Setting up Windows for bioinformatics

Please refer to Setting up Windows for bioinformatics in 2019.

I use Windows on all of my computers. Doing bioinformatics on Windows alone is not impossible, but it's much easier with access to a Linux operating system. My desktop PC has a dual-boot setup (Ubuntu and Windows 8). My laptop came pre-installed with Windows 8, which made it a pain to set up Ubuntu, so I use VirtualBox there to have access to Linux.

Each time I set up a new computer, I install the programs below:

PuTTY: SSH client
Xming X Server: X Window System Server for Windows
WordWeb: handy dictionary program, which looks up any word you highlight
Launchy: a keystroke program, for quick access to programs
7-Zip: general-purpose zip program
R for Windows
RStudio
Opera: still one of my favourite web browsers and email clients
Avast: antivirus
ActivePerl
Cygwin: Unix-like environment and command-line tools for Windows
Dropbox: cloud file sharing program
VirtualBox: virtualisation software

My Linux distribution of choice is Ubuntu, and with VirtualBox you can run Ubuntu inside your Windows installation. Below is a list of must-have Linux bioinformatics tools for those working in genomics and transcriptomics.

Ubuntu

Download Ubuntu (I would recommend Ubuntu 12.04 LTS). After installing VirtualBox and Ubuntu, here's what I installed immediately:

#VirtualBox guest additions:
sudo ./VBoxLinuxAdditions.run

#zlib development headers (for bwa)
sudo apt-get install zlib1g-dev

#download bwa: http://sourceforge.net/projects/bio-bwa/files/
tar -xjf bwa-0.7.5a.tar.bz2
cd bwa-0.7.5a/
make

#install ncurses (for samtools)
sudo apt-get install ncurses-dev

#download SAMTools: http://sourceforge.net/projects/samtools/files/
tar -xjf samtools-0.1.19.tar.bz2
cd samtools-0.1.19/
make

#for BEDTools2
sudo apt-get install build-essential g++
git clone https://github.com/arq5x/bedtools2.git
cd bedtools2
make clean; make all

#FASTX-Toolkit: http://hannonlab.cshl.edu/fastx_toolkit/download.html
wget http://hannonlab.cshl.edu/fastx_toolkit/libgtextutils-0.6.1.tar.bz2
tar -xjf libgtextutils-0.6.1.tar.bz2 
cd libgtextutils-0.6.1/
./configure
make
make check
sudo make install

wget http://hannonlab.cshl.edu/fastx_toolkit/fastx_toolkit-0.0.13.2.tar.bz2
tar -xjf fastx_toolkit-0.0.13.2.tar.bz2
cd fastx_toolkit-0.0.13.2/
./configure
make
sudo make install

#git (needed for the BEDTools2 clone above)
sudo apt-get install git
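
The bwa and samtools builds above stop at make, so their binaries stay in the source directories unless you add those directories to your PATH. As a rough sketch (not part of the original setup), the function below reports which commands the shell can currently find. The function name check_tools is made up for this example, and the tool list is an assumption based on the programs built above.

```shell
# Report which of the given commands can be found on PATH.
# Prints one "found: NAME" or "missing: NAME" line per command
# and returns 1 if at least one command is missing.
check_tools() {
    ct_status=0
    for ct_tool in "$@"; do
        if command -v "$ct_tool" >/dev/null 2>&1; then
            echo "found: $ct_tool"
        else
            echo "missing: $ct_tool"
            ct_status=1
        fi
    done
    return "$ct_status"
}

# Tool names are assumptions based on the builds above; adjust as needed.
check_tools bwa samtools bedtools fastx_trimmer git || echo "Some tools are not on PATH yet."
```

Anything reported as missing either was not built or lives in a directory that is not on your PATH.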

To share folders between VirtualBox and Windows, use Devices/Shared Folders. Afterwards, add your user to the vboxsf group:

sudo adduser `whoami` vboxsf
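
Group changes only apply to new login sessions, so the shared folder may stay inaccessible until you log out and back in. A small sketch for checking whether the current session already has the group; the helper name in_group is hypothetical.

```shell
# Return 0 if the current session is a member of the named group, 1 otherwise.
# "id -nG" without a user name lists the groups of the running session,
# so this reflects what the shell can actually use right now.
in_group() {
    id -nG | tr ' ' '\n' | grep -qxF "$1"
}

if in_group vboxsf; then
    echo "vboxsf membership is active"
else
    echo "not in vboxsf yet (log out and back in after adding yourself)"
fi
```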

See also my post on installing R on Ubuntu.

Others

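#download your favourite genome
#hg19 for me
wget -O hg19.tar.gz http://hgdownload.soe.ucsc.edu/goldenPath/hg19/bigZips/chromFa.tar.gz
#extract the per-chromosome FASTA files
tar -xzf hg19.tar.gz
#remove the random, unplaced and haplotype sequences
rm *random.fa
rm chrUn_gl0002*
rm *hap*.fa
#concatenate the remaining chromosomes in version order (chr1, chr2, ..., chr10, ...)
for file in `ls *.fa | sort -k1V`; do echo "$file"; cat "$file" >> hg19.fa; done
rm chr*.fa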
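
The concatenation loop above relies on version sorting so that chr2 comes before chr10. The sketch below shows that ordering on toy files in a temporary directory. The sequences are made up, toy.fa is a hypothetical file name, and sort -V assumes GNU sort, as shipped with Ubuntu.

```shell
# Work in a throwaway directory with toy FASTA files (made-up sequences).
workdir=$(mktemp -d)
cd "$workdir" || exit 1
for chr in chr10 chr2 chr1 chrX; do
    printf '>%s\nACGT\n' "$chr" > "$chr.fa"
done

# Concatenate in version order, so that chr2 comes before chr10.
for file in $(printf '%s\n' chr*.fa | sort -V); do
    cat "$file" >> toy.fa
done

# List the sequence names in file order, then count them.
grep '^>' toy.fa
grep -c '^>' toy.fa
```

Running the same two grep commands on hg19.fa is a quick way to confirm that every chromosome made it into the combined file, in the order you expect.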

#download blat
wget http://hgdownload.cse.ucsc.edu/admin/exe/linux.x86_64/blat/blat
#make the downloaded binary executable
chmod +x blat

Conclusions

Most personal computers available these days are powerful enough to run a virtualised installation of an operating system. In my humble opinion, if you're setting up Windows for bioinformatics, the easiest thing to do is to install VirtualBox and Ubuntu, and then install the bioinformatics programs in that Ubuntu instance.

This work is licensed under a Creative Commons Attribution 4.0 International License.
Comments
  1. Hi Dave,

    Thank you, this is really useful!

    Great blog, btw 🙂

    Cheers,
    Debora

  2. Hi Dave,

    Love the website. I’m just getting into bioinformatics in an amateur capacity and was wondering if you could give me some advice. I have a fairly modern Windows desktop with an i7 and 32 GB of RAM. When installing Linux in a virtualized setting, what sort of performance hit do you usually encounter? Is it possible to make use of large amounts of RAM when running in VirtualBox? I understand that this is an old post, but I’d greatly appreciate some help with this.

    Best wishes

    1. Hi David,

      I think overall it worked well; I don’t remember running into any performance issues. That being said, I never ran any heavy computational jobs on my virtual machines; I always used the compute servers at work for that.

      Have a look at Docker (https://www.docker.com/) too. It’s like a lightweight virtual machine that you can work and develop in, and share with people. I’ve been meaning to write a blog post on Docker and bioinformatics but haven’t had time yet.

      Have fun,

      Dave
