Say you have a tab-delimited file called tally.tsv with n rows, and you only want to work with the subset of those rows whose row sums meet some condition.
Here's how to do it within R:
```r
# your tsv file has a header row and the identifiers for each row are in the first column
data <- read.table("tally.tsv", header=TRUE, sep="\t", row.names=1)

# sum up each row
rs <- rowSums(data)

# if you are only interested in rows where the sum is greater than 50
use <- (rs > 50)

# check to see how many rows satisfy your condition
table(use)
# use
# FALSE  TRUE
# 12331 22622

# store the rows you are interested in as a new variable
data_subset <- data[use,]

# check that the number of rows is equal to 22622
nrow(data_subset)
# [1] 22622

# to use two conditions
# use <- (rs > 4000 & rs < 1000000)
```
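The same steps can be wrapped in a small helper and tried on toy data before running them on the real file. This is a minimal sketch. The function name `filter_by_row_sum` and the toy counts are hypothetical and do not come from tally.tsv. It also assumes every column is numeric, which `rowSums()` requires.

```r
# Hypothetical helper: keep rows whose row sum lies strictly between min_sum and max_sum
filter_by_row_sum <- function(df, min_sum = -Inf, max_sum = Inf) {
  rs <- rowSums(df)
  keep <- rs > min_sum & rs < max_sum
  # which() turns the logical vector into row indices and drops any NA row sums;
  # drop = FALSE keeps the result a data frame even when df has a single column
  df[which(keep), , drop = FALSE]
}

# Toy counts (made-up values) standing in for tally.tsv
toy <- data.frame(
  sample1 = c(10, 40, 3, 100),
  sample2 = c(5, 20, 1, 300),
  row.names = c("geneA", "geneB", "geneC", "geneD")
)

# row sums: geneA 15, geneB 60, geneC 4, geneD 400
rowSums(toy)

# one condition (sum > 50): keeps geneB and geneD
over_50 <- filter_by_row_sum(toy, min_sum = 50)

# two conditions (10 < sum < 100): keeps geneA and geneB
between <- filter_by_row_sum(toy, min_sum = 10, max_sum = 100)
```

Using `which()` instead of the raw logical vector avoids rows of `NA`s if a row contains missing values. `drop = FALSE` matters because `df[rows, ]` on a one-column data frame would otherwise return a plain vector.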

This work is licensed under a Creative Commons Attribution 4.0 International License.