Creating data subsets in R

Say you have a tab-delimited file called tally.tsv with n rows, and you only want to work with the subset of those rows whose row sums meet some condition.

Here's how to do it within R:

#your tsv file has a header row and the identifiers for each row are in the first column
#sep="\t" splits fields on tabs only, so values containing spaces are not broken apart
data <- read.table("tally.tsv", header=TRUE, row.names=1, sep="\t")
#sum up each row
rs <- rowSums(data)
#if you are only interested in rows where the sum is greater than 50
use <- (rs > 50)
#check to see how many satisfy your condition
table(use)
#use
#FALSE  TRUE
#12331 22622
#store the rows of interest in another variable
data_subset <- data[use,]
#check to see if the number of rows is equal to 22622
nrow(data_subset)
#[1] 22622
#to use two conditions, combine them with the element-wise & operator
#use <- (rs > 4000 & rs < 1000000)
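
If you don't have tally.tsv to hand, the same steps can be tried on a small made-up table. This is only a sketch: the column names, row identifiers, values, and the 50/1000 thresholds below are invented for illustration and are not from the original file.

```r
# Toy stand-in for tally.tsv (hypothetical values, not the real data)
data <- data.frame(
  sample_a = c(10, 40, 0, 3000),
  sample_b = c(5, 20, 1, 2500),
  row.names = c("id1", "id2", "id3", "id4")
)

# Sum up each row; rowSums returns a numeric vector named by row
rs <- rowSums(data)

# Single condition: keep rows whose sum is greater than 50
use <- rs > 50
# drop = FALSE keeps the result a data frame even if only one column exists
data_subset <- data[use, , drop = FALSE]

# Two conditions combined with the element-wise & operator
use_range <- rs > 50 & rs < 1000
data_range <- data[use_range, , drop = FALSE]

# Show which rows survived each filter
print(rownames(data_subset))
print(rownames(data_range))
```

Because use is a logical vector with one entry per row, sum(use) gives the same count as the TRUE column of table(use), which makes for a quick check against nrow(data_subset).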



This work is licensed under a Creative Commons Attribution 4.0 International License.