10,000 monthly visitors, apparently

I created davetang.org on the 24th of April 2009 just for the sake of buying a domain with my name in it. Realising that I was (and still am) paying for a service, I decided to actually make use of my web space. It really started to become handy when I decided to pursue a PhD in April 2010; initially I just dumped everything I was learning onto my wiki and this blog. Four years after creating davetang.org, I'm getting ~10,000 unique visitors to my domain each month according to AWStats, which my web hosting company has conveniently provided.

The dip around April, May and June in 2012 was due to the removal of AWStats by my web host. I guess enough people complained, so they brought it back. And if this trend continues, I will need to pay for more bandwidth.

I'm guessing the increase in visitors each month is due to the growing interest in RNA sequencing. Google Trends (formerly known as Google Insights for Search) shows growing web interest in the term RNA-Seq.

Of course, I am also (in my opinion) writing better and more useful posts. To celebrate, I changed the header image to a photo I took while playing around with long exposure times. I took the photo near Tokyo Station in Japan; it looks cool so I thought I'd make it the header. I also changed the background to black because it's easier on the eyes.

Setting up Windows for bioinformatics

Updated 2014 May 20th: I created this post on the 19th of August in 2011 when I had just purchased yet another computer (Core i7 2600, 8 GB of RAM and a GTX460) and was setting it up for doing some bioinformatics. I updated this page again around mid 2013 when I bought another laptop (Asus N56V, Core i7 3630QM with 12 GB of RAM) and was setting it up for work again. I'm updating this page again because I'm seeing some increase in traffic to this page (and thankfully not because I had purchased another computer!).

I use Windows on all of my computers. Using just Windows for bioinformatics is not impossible but it's really just easier to have access to a Linux operating system. In the case of my desktop PC, I have a dual boot setup (Ubuntu and Windows 8) and for my laptop, which came pre-installed with Windows 8, making it a pain to set up Ubuntu, I use VirtualBox to have access to Linux.

Each time I set up a new computer, I install the programs below:

PuTTY: SSH client
Xming X Server: X Window System Server for Windows
WordWeb: handy dictionary program, which looks up any word you highlight
Launchy: a keystroke program, for quick access to programs
7zip: general purpose zip program
R for Windows
RStudio
Opera: still one of my favourite web browsers and email client
Avast: antivirus
ActivePerl
Cygwin: Unix-like environment and command-line tools on Windows
Dropbox: cloud file sharing program
VirtualBox: virtualisation software

My Linux distribution of choice is Ubuntu, and using VirtualBox you can have Ubuntu installed inside your Windows installation. Below I outline a list of must-have Linux bioinformatics tools for those working in the field of genomics and transcriptomics.

Ubuntu

Download Ubuntu (I would recommend Ubuntu 12.04 LTS). After installing VirtualBox and Ubuntu, here's what I installed immediately:

#VirtualBox guest additions:
sudo ./VBoxLinuxAdditions.run

#zlib development headers (for bwa)
sudo apt-get install zlib1g-dev

#download bwa: http://sourceforge.net/projects/bio-bwa/files/
tar -xjf bwa-0.7.5a.tar.bz2
cd bwa-0.7.5a/
make

#install ncurses (for samtools)
sudo apt-get install ncurses-dev

#download SAMTools: http://sourceforge.net/projects/samtools/files/
tar -xjf samtools-0.1.19.tar.bz2
cd samtools-0.1.19/
make

#for BEDTools2 (git is needed for the clone below)
sudo apt-get install build-essential g++ git
git clone https://github.com/arq5x/bedtools2.git
cd bedtools2
make clean; make all

#FASTX-Toolkit: http://hannonlab.cshl.edu/fastx_toolkit/download.html
wget http://hannonlab.cshl.edu/fastx_toolkit/libgtextutils-0.6.1.tar.bz2
tar -xjf libgtextutils-0.6.1.tar.bz2 
cd libgtextutils-0.6.1/
./configure
make
make check
sudo make install

wget http://hannonlab.cshl.edu/fastx_toolkit/fastx_toolkit-0.0.13.2.tar.bz2
tar -xjf fastx_toolkit-0.0.13.2.tar.bz2
cd fastx_toolkit-0.0.13.2/
./configure
make
sudo make install

#git
sudo apt-get install git-core
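
After all of that, it can be handy to check which tools the shell can actually find. Below is a minimal sketch; check_tools is a helper name I made up for illustration and is not part of any of the packages above. Note that running make for bwa, samtools and bedtools2 leaves the binaries in their build directories, so they will show up as missing until those directories are added to your PATH.

```shell
# Hypothetical helper (not part of any package above):
# report which of the given commands can be found on the current PATH.
# Prints one line per command and returns non-zero if any are missing.
check_tools() {
    local missing=0
    local tool
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            printf 'found    %s -> %s\n' "$tool" "$(command -v "$tool")"
        else
            printf 'missing  %s\n' "$tool"
            missing=1
        fi
    done
    return "$missing"
}
```

For example, assuming the tarballs were unpacked in your home directory (adjust the paths if not), you could run export PATH="$HOME/bwa-0.7.5a:$HOME/samtools-0.1.19:$HOME/bedtools2/bin:$PATH" and then check_tools bwa samtools bedtools fastx_trimmer git.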

To share folders between VirtualBox and Windows, use Devices/Shared Folders. Afterwards, add your user to the vboxsf group (log out and back in for the change to take effect):

sudo adduser "$(whoami)" vboxsf
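
To confirm that the group change was recorded, something like the sketch below should work. user_in_group is a name I made up for illustration. When given a user name, id looks the user up in the group database, so the new group shows up straight after adduser, even though a session that is already running only picks it up at the next login.

```shell
# Hypothetical helper: return success if the given user belongs to the given group.
# `id -nG USER` prints that user's group names separated by spaces.
user_in_group() {
    local user="$1" group="$2"
    id -nG "$user" | tr ' ' '\n' | grep -qx -- "$group"
}
```

Usage would be along the lines of user_in_group "$(whoami)" vboxsf && echo "ok". With the auto-mount option ticked in the shared folder settings, the shares should then be readable under /media/sf_ followed by the share name.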

See also my post on installing R on Ubuntu.

Others

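#download your favourite genome
#hg19 for me
wget -O hg19.tar.gz http://hgdownload.soe.ucsc.edu/goldenPath/hg19/bigZips/chromFa.tar.gz
#extract the per-chromosome FASTA files
tar -xzf hg19.tar.gz
#remove the random, unplaced and alternate haplotype sequences
rm *random.fa
rm chrUn_gl0002*
rm *hap*.fa
#concatenate the remaining chromosomes in version sort order
for file in $(ls chr*.fa | sort -V); do echo "$file"; cat "$file" >> hg19.fa; done
rm chr*.fa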

#download blat and make it executable
wget http://hgdownload.cse.ucsc.edu/admin/exe/linux.x86_64/blat/blat
chmod +x blat
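
A quick sanity check on the concatenated genome is to count and list the sequence headers. This is only a sketch, and the two function names are my own rather than anything from a package:

```shell
# Hypothetical helpers for inspecting a FASTA file.

# Count the sequence records (header lines start with ">").
count_fasta_records() {
    grep -c '^>' "$1"
}

# Print the sequence names in file order, without the leading ">".
list_fasta_names() {
    grep '^>' "$1" | sed 's/^>//'
}
```

If only chr1 to chr22, chrX, chrY and chrM are left after the rm commands above, I would expect count_fasta_records hg19.fa to report 25, and list_fasta_names hg19.fa to show the chromosomes in the order they were concatenated.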

Conclusions

Most personal computers available these days are powerful enough to run a virtualised installation of an operating system. In my humble opinion, if you're setting up Windows for bioinformatics, the easiest thing to do is just to install VirtualBox and Ubuntu, and install the bioinformatics programs in that Ubuntu instance.