Sequence composition and random forests

Updated: 2013 November 28th. The sequence composition, or nucleotide composition, at transcription start sites (TSSs) of mRNAs is biased, i.e. certain nucleotides are preferred. Here I examine the sequence composition at the TSSs of sequences in the NCBI Reference Sequence Database, also known as RefSeq, and use random forests to see if it's possible to train...
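As a rough illustration of what "sequence composition at a position" means, here is a minimal Python sketch that tallies nucleotide frequencies at one aligned position. The sequences, the `tss_index` value, and the `position_composition` helper are all made up for illustration; they are not RefSeq data and not the code used in the post.

```python
from collections import Counter

# Hypothetical toy sequences, aligned so that index 2 plays the role of the TSS.
seqs = ["CCATG", "TCAGG", "GTACC", "CCGTT"]
tss_index = 2  # assumed position of the TSS within each toy sequence


def position_composition(sequences, index):
    """Return the fraction of each nucleotide observed at one aligned position."""
    counts = Counter(seq[index] for seq in sequences)
    total = sum(counts.values())
    return {nt: counts.get(nt, 0) / total for nt in "ACGT"}


composition = position_composition(seqs, tss_index)
print(composition)  # {'A': 0.75, 'C': 0.0, 'G': 0.25, 'T': 0.0}
```

A composition that departs strongly from uniform at a given position, as in this toy case, is the kind of bias the post refers to.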

Combinations and permutations in R

Time to get another concept under my belt: combinations and permutations. While I'm at it, I will examine combinations and permutations in R. As you may recall from school, a combination does not take order into account, whereas a permutation does. Using the example from my favourite website as of late, mathsisfun.com: A fruit...
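To make the distinction concrete, here is a small sketch in Python (used here only for illustration; the post itself works in R). The `fruits` list is a made-up example and is not taken from the post or from mathsisfun.com.

```python
from itertools import combinations, permutations
from math import comb, perm

# Hypothetical items chosen only for illustration.
fruits = ["apple", "banana", "grape"]

# Combinations: order is ignored, so (apple, banana) and (banana, apple) count once.
pairs_unordered = list(combinations(fruits, 2))

# Permutations: order matters, so both orderings are listed separately.
pairs_ordered = list(permutations(fruits, 2))

print(len(pairs_unordered), comb(3, 2))  # 3 3
print(len(pairs_ordered), perm(3, 2))    # 6 6
```

Choosing 2 of 3 items gives 3 combinations but 6 permutations, because each unordered pair can be arranged in 2 ways.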

Creating a coverage plot in R

Disclaimer (2015 August 5th): as pointed out in this comment thread below, this post created a density plot rather than a coverage plot. I have written a new post that uses BEDTools to calculate the coverage and R to produce an actual coverage plot. I've recently discovered GitHub Gist, so for this post I'm going...

Getting started with Git

Git is a distributed version control and source code management (SCM) system with an emphasis on speed. What's version control? Version control is a system that records changes to a file or a set of files over time so that you can recall specific versions later. Here's an example: check out this tweet and the...

Handling big data in R

All credit goes to this post, so be sure to check it out! I'm simply following some of the tips from that post on handling big data in R. For this post, I will use a file that has 17,868,785 rows and 158 columns, which is quite big. Here's the size of this file:

Calculus

I remember studying calculus in school, and there were so many concepts that never clicked. I could solve the equations, find derivatives, work out the area under the curve, etc., but I didn't see how calculus is actually applied. I'm revisiting calculus now because I've been taking part in a biostatistics...