Say you have a tab-delimited file called tally.tsv with n rows, and you only want to work with a subset of those rows, chosen by the sum of each row.
Here’s how to do it within R:
#your tsv file has a header row and the identifiers for each row are in the first column
data <- read.table("tally.tsv", header=TRUE, row.names=1)

#sum up each row
rs <- rowSums(data)

#if you are only interested in rows where the sum is greater than 50
use <- (rs > 50)

#check to see how many satisfy your condition
table(use)
#use
#FALSE  TRUE
#12331 22622

#store the rows you are interested in another variable
data_subset <- data[use,]

#check to see if the number of rows is equal to 22622
nrow(data_subset)
# 22622

#to use two conditions
#use <- (rs > 4000 & rs < 1000000)
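If you want to try the same steps without tally.tsv, here is a minimal sketch on a small made-up data frame. The row names, column names, counts, and the thresholds in the two-condition filter are all hypothetical and chosen only so the result is easy to check by hand.

```r
# Hypothetical toy data standing in for tally.tsv (all values are made up)
data <- data.frame(
  sample_a = c(10, 40, 0, 30),
  sample_b = c(5, 20, 1, 25),
  row.names = c("gene1", "gene2", "gene3", "gene4")
)

# Row sums: gene1 = 15, gene2 = 60, gene3 = 1, gene4 = 55
rs <- rowSums(data)

# Logical vector marking the rows whose sum is greater than 50
use <- rs > 50

# Keep only those rows; drop = FALSE keeps a data frame even with one column
data_subset <- data[use, , drop = FALSE]
rownames(data_subset)
# "gene2" "gene4"

# Two conditions combined with the vectorised & operator
use_both <- rs > 10 & rs < 58
names(rs)[use_both]
# "gene1" "gene4"
```

Note that rowSums() expects every column to be numeric, which is why the row identifiers are kept as row names rather than as a regular column.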
This work is licensed under a Creative Commons
Attribution 4.0 International License.