Setting up Windows for bioinformatics in 2019

This is an update on my original post Setting up Windows for bioinformatics that I wrote in 2011. I had switched over to the Mac operating system (Mac OS X) for work when my HP laptop was replaced with a MacBook Air sometime in 2012. A few years later, I wiped my Windows installation from my home desktop computer and replaced it with Ubuntu. Since then, my interaction with the Windows operating system has been minimal. Recently, I re-installed Windows and have been quite impressed with its usability. This post is on how I set up my Windows desktop for bioinformatics in 2019.

Before I begin, I’d just like to briefly share my experience of re-installing Windows because I ran into several problems. Firstly, I no longer have the Windows 7 installation disk that came with my desktop. I had previously purchased a Windows 8 key, so I thought I would download an ISO and create my own installation disk. However, I later found out that I had only purchased an “upgrade key”, which meant that I could not use it with a fresh installation. While I no longer have my Windows 7 installation disk, I still have the key since it is on a sticker that is stuck on my desktop. However, the Windows 7 installer did not accept this as a valid key for whatever reason. I was about to give up but later found out that I can install and use Windows 10 without a key. Apparently you can still use Windows 10 with no restrictions (fetch updates, etc.) even without activation; the only downsides are that you cannot personalise your settings and that there’s a watermark on the bottom right of your screen that reminds you to activate Windows. I still find it incredible and surprising that Microsoft have allowed this. Though if I still had my Windows 8 installation, I could have updated to 10 for free anyway.

One of the main reasons for installing Windows 10 was that I wanted to check out the Ubuntu integration with Windows. I remember reading about it a while ago when it was first introduced and thought this was a smart move by Microsoft; this is very handy for people who still want to use Windows and run bioinformatic tools, which are usually only available on a Linux system. For desktop and laptop computers, Windows is still the dominant operating system. The numbers for Windows are a bit lower for visitors to my blog; for visitors in the last 30 days, 52.9% use Windows, 33.4% use Mac, and 9% use Linux. Hence I thought this post may be of interest to many of my visitors.

Installing Ubuntu on Windows 10 is extremely easy; all you have to do is go to the Microsoft Store and look for Ubuntu. You can follow this guide if you need a bit more information.

Once installed you can start Ubuntu from the search bar. If you highlight text from the Ubuntu window, and right-click, the text is saved to the clipboard.

# the version of Ubuntu
cat /etc/os-release
NAME="Ubuntu"
VERSION="18.04.1 LTS (Bionic Beaver)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 18.04.1 LTS"
VERSION_ID="18.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=bionic
UBUNTU_CODENAME=bionic

# all my cores are recognised
cat /proc/cpuinfo | grep processor
processor       : 0
processor       : 1
processor       : 2
processor       : 3
processor       : 4
processor       : 5
processor       : 6
processor       : 7
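
If you only want the number of cores rather than one line per core, the same information can be counted directly. This is a small sketch that assumes a Linux environment with /proc/cpuinfo and GNU coreutils, which is what the Ubuntu install above provides; on the machine shown above both commands should agree with the listing.

```shell
# count the "processor" entries instead of listing them one per line
grep -c '^processor' /proc/cpuinfo

# nproc (GNU coreutils) reports the processors available to the current process
nproc
```

The resulting number is handy for setting thread options in tools that accept them.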

The Windows filesystem is automatically mounted, so you can access all your Windows files on Ubuntu.

df -h
Filesystem      Size  Used Avail Use% Mounted on
rootfs          923G  224G  699G  25% /
none            923G  224G  699G  25% /dev
none            923G  224G  699G  25% /run
none            923G  224G  699G  25% /run/lock
none            923G  224G  699G  25% /run/shm
none            923G  224G  699G  25% /run/user
C:              923G  224G  699G  25% /mnt/c
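
Since the C: drive appears under /mnt/c, a Windows path maps onto an Ubuntu path by lower-casing the drive letter and flipping the slashes. The helper below is a hypothetical illustration of that mapping (the function name win_to_wsl and the example path are my own, not part of Ubuntu or Windows); recent builds of the Windows Subsystem for Linux also ship a wslpath utility that does this conversion for you.

```shell
# hypothetical helper: convert a Windows path such as C:\Users\Dave\Documents
# to the path where it is mounted inside Ubuntu on Windows
win_to_wsl() {
    win_path="$1"
    drive="${win_path%%:*}"   # drive letter before the colon
    rest="${win_path#*:}"     # everything after the colon
    drive="$(printf '%s' "$drive" | tr 'A-Z' 'a-z')"
    rest="$(printf '%s' "$rest" | tr '\\' '/')"   # backslashes to forward slashes
    printf '/mnt/%s%s\n' "$drive" "$rest"
}

# single quotes keep the backslashes literal
win_to_wsl 'C:\Users\Dave\Documents'
```

The printed path can be passed straight to cd or ls inside Ubuntu.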

You have read/write permissions to the Windows mount in Ubuntu, so you can write output files to the mount. If you wanted to do the opposite and access files from Ubuntu on Windows, you can navigate to this location (replacing Dave, unless you are also Dave):

C:\Users\Dave\AppData\Local\Packages\CanonicalGroupLimited.UbuntuonWindows_79rhkp1fndgsc\LocalState\rootfs

The string “79rhkp1fndgsc” may be different on your computer; I’m not sure. I’ve set up a shortcut to this location, so I can easily access Ubuntu files from my Windows File Explorer.

Since this is a fresh installation of Ubuntu it does not come with many utilities and bioinformatic tools; I even had to install unzip.

# refresh the package lists first; a fresh install may not have them yet
sudo apt update
sudo apt install unzip

For installing bioinformatic tools, I recommend using Conda, which is a packaging tool and installer. Many of the popular bioinformatic tools can be installed using Conda. If you want more information, I have some notes on Conda on my Wiki site. I recommend using Miniconda over Anaconda, since Anaconda comes prepackaged with too many packages. To install Miniconda on Ubuntu, simply download a shell script, run it, and follow the instructions.

wget -c https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh
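
Because this step runs a downloaded script, it can be worth comparing the file's SHA-256 hash against a published value before running it. The sketch below is one way to do that; the function name verify_sha256 is hypothetical and the expected hash is something you would need to look up yourself (I am assuming the download site lists hashes for its installers).

```shell
# hypothetical helper: succeed only if the file's SHA-256 hash matches
verify_sha256() {
    file="$1"
    expected="$2"
    actual="$(sha256sum "$file" | cut -d ' ' -f 1)"
    [ "$actual" = "$expected" ]
}

# example usage, where EXPECTED_HASH is a placeholder for the published hash:
#   verify_sha256 Miniconda3-latest-Linux-x86_64.sh EXPECTED_HASH \
#       && bash Miniconda3-latest-Linux-x86_64.sh
```

With this pattern the installer only runs when the hashes match.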

Installing BWA is a breeze and you don’t have to worry about compiling the program yourself; all dependencies are taken care of.

conda install -c bioconda bwa

bwa

Program: bwa (alignment via Burrows-Wheeler transformation)
Version: 0.7.17-r1188
Contact: Heng Li <lh3@sanger.ac.uk>

Usage:   bwa <command> [options]

Command: index         index sequences in the FASTA format
         mem           BWA-MEM algorithm
         fastmap       identify super-maximal exact matches
         pemerge       merge overlapping paired ends (EXPERIMENTAL)
         aln           gapped/ungapped alignment
         samse         generate alignment (single ended)
         sampe         generate alignment (paired ended)
         bwasw         BWA-SW for long queries

         shm           manage indices in shared memory
         fa2pac        convert FASTA to PAC format
         pac2bwt       generate BWT from PAC
         pac2bwtgen    alternative algorithm for generating BWT
         bwtupdate     update .bwt to the new format
         bwt2sa        generate SA from BWT and Occ

Note: To use BWA, you need to first index the genome with `bwa index'.
      There are three alignment algorithms in BWA: `mem', `bwasw', and
      `aln/samse/sampe'. If you are not sure which to use, try `bwa mem'
      first. Please `man ./bwa.1' for the manual.
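
After installing a few tools this way, a quick check that they are all on your PATH can save some confusion later. This is a minimal sketch; the function name check_tools and the list of tools are only examples.

```shell
# hypothetical helper: report which of the named tools are on the PATH
# returns 0 if all were found and 1 if any are missing
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found: $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# example list; replace with whatever your analysis needs
check_tools bwa samtools bcftools || echo "install the missing tools with conda"
```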

Another cool thing about Conda is environments, which are isolated workspaces. Imagine buying a brand new laptop that has nothing installed. You gradually install the tools that you use for your analyses and save all your analysis code onto said laptop. After you have finished, you can give that laptop to someone else and they will have the exact environment you used to carry out your work. Conda environments are similar, except that you don’t have to give your computer away. You can simply save the environment that you were working in and share it. Environments are also handy for providing a list of packages that need to be installed for a particular analysis.
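
To make this a bit more concrete, here is roughly what an environment file looks like. The file name, environment name, and package list below are made up for illustration and are not the contents of any particular repository.

```shell
# write an illustrative environment file (hypothetical name and packages)
cat > example_environment.yml <<'EOF'
name: example_env
channels:
  - conda-forge
  - bioconda
  - defaults
dependencies:
  - bwa
  - samtools
  - bcftools
EOF

# with Conda installed, the environment could then be created with:
#   conda env create -f example_environment.yml
# and an environment you are already working in can be saved with:
#   conda env export > environment.yml
```

Sharing that one small file is the equivalent of handing over the laptop.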

For example, I have created an environment file in my learning VCF GitHub repository. If you clone this repository and use Conda to create the environment, you should be able to carry out the same analysis. I had performed the analysis on my MacBook Pro and was able to replicate the results on my Windows 10 machine using Ubuntu.

# create a copy of the code repository
git clone https://github.com/davetang/learning_vcf_file.git

# change directories
cd learning_vcf_file

# create an environment and install the necessary programs
# this may take some time depending on your internet speed
conda env create -f environment.yml

# activate the environment
# (Conda 4.4 and later recommend `conda activate` over the older `source activate`)
conda activate learning_vcf

# change into the analysis directory
cd analysis

# you will need to download GATK if you want to call variants using GATK
# this step may take some time depending on your internet speed
wget -c https://github.com/broadinstitute/gatk/releases/download/4.1.1.0/gatk-4.1.1.0.zip
unzip gatk-4.1.1.0.zip

# the analysis may take some time depending on your computer
./run.sh

The run.sh script generates some random sequences and calls variants using BCFtools, FreeBayes, and GATK.
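
If you are curious what generating a random sequence can look like in the shell, here is a small sketch using awk. It is not the code used by run.sh; the function name random_fasta and its arguments are my own, and the seed is passed explicitly so that a run can be repeated.

```shell
# hypothetical helper: print a FASTA record containing a random DNA sequence
# usage: random_fasta NAME LENGTH SEED
random_fasta() {
    awk -v name="$1" -v n="$2" -v seed="$3" 'BEGIN {
        srand(seed)                      # fixed seed for repeatable output
        split("A C G T", base, " ")
        printf ">%s\n", name
        for (i = 1; i <= n; i++) {
            # rand() is in [0, 1), so the index is always between 1 and 4
            printf "%s", base[int(rand() * 4) + 1]
        }
        printf "\n"
    }'
}

# write a 60 bp random sequence to a file
random_fasta random_seq 60 31 > random_seq.fa
```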

Summary

It is definitely much easier to set up a computer using Windows 10 for bioinformatic analyses in 2019! The main reason is the native support of Ubuntu by Windows. Previously I recommended using VirtualBox, which worked, but was not as straightforward to install and use. With the advent of Conda, installing and managing bioinformatic tools has also become much easier. Another advantage of using Conda is the environment support. I showed an example where I created an environment that I used to call variants from some randomly generated sequences.

One more thing; if you use R, I would highly recommend using RStudio and keeping track of your analyses using R Markdown. Even if you don’t use R, I still recommend using RStudio and R Markdown; I regularly use R Markdown in RStudio to write down technical notes that have nothing to do with R. The Vim keybinding support by RStudio is one major reason.

As a final note, my desktop computer is around 8 years old now and Windows 10 works surprisingly well. My computer starts up in less than a minute; if you have used older versions of Windows, you will remember that the start-up speeds were notoriously bad. Despite mainly using my MacBook Pro for work these days, I can see myself going back to Windows!

Conclusion: use Ubuntu, use Conda, use RStudio, and use R Markdown.

This work is licensed under a Creative Commons Attribution 4.0 International License.
2 comments
  1. Hi Dave,
    Amazing post as always! I have a slightly different question. I have been dual-booting my laptops and workstations, with most of my bioinformatics performed on Ubuntu and Microsoft Office related stuff on Windows. Since Microsoft Office is available on macOS, is it wise to just upgrade to a MacBook? How would you compare Linux vs macOS for bioinformatics?

    Thanks,
    Jyoti

  2. Great posts! I also found the new windows terminal (preview) very useful. You can find it in the microsoft store for free. I just started trying it, and feel it is even better than the Mac terminal.
