This is an update on my original post Setting up Windows for bioinformatics that I wrote in 2011. I had switched over to the Mac operating system (Mac OS X) for work when my HP laptop was replaced with a MacBook Air sometime in 2012. A few years later, I wiped my Windows installation from my home desktop computer and replaced it with Ubuntu. Since then, my interaction with the Windows operating system has been minimal. Recently, I re-installed Windows and have been quite impressed with its usability. This post is on how I set up my Windows desktop for bioinformatics in 2019.
Before I begin, I’d just like to briefly share my experience of re-installing Windows because I ran into several problems. Firstly, I no longer have the Windows 7 installation disk that came with my desktop. I had previously purchased a Windows 8 key, so I thought I would download an ISO and create my own installation disk. However, I later found out that I had only purchased an “upgrade key”, which meant that I could not use it with a fresh installation. While I no longer have my Windows 7 installation disk, I still have the key since it is on a sticker that is stuck on my desktop. However, the Windows 7 installer did not accept this as a valid key for whatever reason. I was about to give up but later found out that I can install and use Windows 10 without a key. Apparently you can still use Windows 10 with no restrictions (fetch updates, etc.) even without activation; the only downsides are that you cannot personalise your settings and that there’s a watermark on the bottom right of your screen that reminds you to activate Windows. I still find it incredible and surprising that Microsoft have allowed this. Though if I still had my Windows 8 installation, I could have updated to 10 for free anyway.
One of the main reasons for installing Windows 10 was that I wanted to check out the Ubuntu integration with Windows. I remember reading about it a while ago when it was first introduced and thought this was a smart move by Microsoft; this is very handy for people who still want to use Windows and run bioinformatic tools, which are usually only available on a Linux system. For desktop and laptop computers, Windows is still the dominant operating system. The numbers for Windows are a bit lower for visitors to my blog; for visitors in the last 30 days, 52.9% use Windows, 33.4% use Mac, and 9% use Linux. Hence I thought this post may be of interest to many of my visitors.
Installing Ubuntu on Windows 10 is extremely easy; all you have to do is go to the Microsoft Store and look for Ubuntu. You can follow this guide if you need a bit more information.
Once installed you can start Ubuntu from the search bar. If you highlight text from the Ubuntu window, and right-click, the text is saved to the clipboard.
# the version of Ubuntu
cat /etc/os-release
NAME="Ubuntu"
VERSION="18.04.1 LTS (Bionic Beaver)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 18.04.1 LTS"
VERSION_ID="18.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=bionic
UBUNTU_CODENAME=bionic

# all my cores are recognised
cat /proc/cpuinfo | grep processor
processor : 0
processor : 1
processor : 2
processor : 3
processor : 4
processor : 5
processor : 6
processor : 7
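If you only want the number of cores rather than the full listing, the following is a minimal sketch. It assumes a Linux system where /proc/cpuinfo lists one "processor" line per logical core and where nproc from GNU coreutils is installed, as it is on a stock Ubuntu.

```shell
# count the "processor" lines in /proc/cpuinfo
cores_cpuinfo=$(grep -c '^processor' /proc/cpuinfo)

# nproc reports the number of processing units available to this process
cores_nproc=$(nproc)

echo "cpuinfo: ${cores_cpuinfo}, nproc: ${cores_nproc}"
```

The two numbers usually agree, but nproc can be lower if the process has been restricted to a subset of the cores.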
The Windows filesystem is automatically mounted, so you can access all your Windows files on Ubuntu.
df -h
Filesystem      Size  Used Avail Use% Mounted on
rootfs          923G  224G  699G  25% /
none            923G  224G  699G  25% /dev
none            923G  224G  699G  25% /run
none            923G  224G  699G  25% /run/lock
none            923G  224G  699G  25% /run/shm
none            923G  224G  699G  25% /run/user
C:              923G  224G  699G  25% /mnt/c
You have read/write permissions to the Windows mount in Ubuntu, so you can write output files to the mount. If you want to have a file shared across Windows and Ubuntu and need to preserve its file permissions in Ubuntu (such as for an SSH private key), you will need to edit or create a wsl.conf file inside /etc.
sudo vi /etc/wsl.conf
Add these two lines inside the conf file.
[automount]
options = "metadata"
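If you would rather not edit the file by hand, the sketch below adds the same two lines from the shell. The function name ensure_metadata is my own, and the sketch assumes the target file does not already contain an [automount] section with other options; it is demonstrated on a scratch file rather than on /etc/wsl.conf itself.

```shell
# append the [automount] metadata option to a config file unless it is already set
# (ensure_metadata is a hypothetical helper name, not part of WSL)
ensure_metadata() {
  conf="$1"
  if grep -qs '^options *= *"metadata"' "$conf"; then
    echo "metadata option already set in $conf"
  else
    printf '[automount]\noptions = "metadata"\n' >> "$conf"
    echo "metadata option added to $conf"
  fi
}

# try it on a scratch file first
scratch=$(mktemp)
ensure_metadata "$scratch"
ensure_metadata "$scratch"   # the second call changes nothing
cat "$scratch"
```

To apply it for real, the same printf output could be piped through sudo tee -a /etc/wsl.conf, since writing to /etc requires root.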
Finally, restart WSL:
1. Press the Win Key + R, which will bring up a Windows Run box
2. Enter services.msc and hit enter
3. Look for LxssManager and restart this service
The next time you start Ubuntu, file permissions should be persistent.
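One way to confirm that permissions now stick is to set a mode and read it back. This is only a sketch: the key path is a made-up example, keeps_mode_600 is a hypothetical helper name, and stat -c is the GNU form that ships with Ubuntu.

```shell
# succeed only if the file reports mode 600 after chmod
keeps_mode_600() {
  chmod 600 "$1" && [ "$(stat -c '%a' "$1")" = "600" ]
}

# hypothetical key location on the Windows mount; replace with your own
key="/mnt/c/Users/Dave/.ssh/id_rsa"

if [ -e "$key" ]; then
  if keeps_mode_600 "$key"; then
    echo "permissions persist on $key"
  else
    echo "permissions were not preserved; check /etc/wsl.conf"
  fi
else
  echo "no file at $key; edit the path first"
fi
```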
If you wanted to do the opposite and access files from Ubuntu on Windows, you can navigate to this location (replacing Dave, unless you are also Dave):
I’m not sure whether the string “79rhkp1fndgsc” will be the same on your computer. I’ve set up a shortcut to this location, so I can easily access Ubuntu files from my Windows File Explorer.
Since this is a fresh installation of Ubuntu it does not come with many utilities and bioinformatic tools; I even had to install unzip.
sudo apt install unzip
For installing bioinformatic tools, I recommend using Conda, which is a packaging tool and installer. Many of the popular bioinformatic tools can be installed using Conda. If you want more information, I have some notes on Conda on my Wiki site. I recommend using Miniconda over Anaconda, since Anaconda comes prepackaged with too many packages. To install Miniconda on Ubuntu, simply download a shell script, run it, and follow the instructions.
wget -c https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh
Installing BWA is a breeze and you don’t have to worry about compiling the program yourself; all dependencies are taken care of.
conda install -c bioconda bwa

bwa

Program: bwa (alignment via Burrows-Wheeler transformation)
Version: 0.7.17-r1188
Contact: Heng Li <firstname.lastname@example.org>

Usage:   bwa <command> [options]

Command: index         index sequences in the FASTA format
         mem           BWA-MEM algorithm
         fastmap       identify super-maximal exact matches
         pemerge       merge overlapping paired ends (EXPERIMENTAL)
         aln           gapped/ungapped alignment
         samse         generate alignment (single ended)
         sampe         generate alignment (paired ended)
         bwasw         BWA-SW for long queries

         shm           manage indices in shared memory
         fa2pac        convert FASTA to PAC format
         pac2bwt       generate BWT from PAC
         pac2bwtgen    alternative algorithm for generating BWT
         bwtupdate     update .bwt to the new format
         bwt2sa        generate SA from BWT and Occ

Note: To use BWA, you need to first index the genome with `bwa index'.
      There are three alignment algorithms in BWA: `mem', `bwasw', and
      `aln/samse/sampe'. If you are not sure which to use, try `bwa mem'
      first. Please `man ./bwa.1' for the manual.
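As the note in the usage message says, you index first and then try bwa mem. Here is a small smoke test along those lines. The reference and read are toy sequences I made up for illustration, the file names are arbitrary, and the substring syntax assumes bash; the alignment step is skipped if bwa is not on your PATH.

```shell
# work in a throwaway directory
workdir=$(mktemp -d)

# a made-up 64 bp reference and a 40 bp read taken from inside it
ref_seq="ACGTTGCAAGCTTGGATCCGATTACAGGCTTAACCGGTTAGCATCGATCGGATATCCGGAATTC"
read_seq=${ref_seq:10:40}

# build a dummy quality string with the same length as the read
qual=$(printf '%s' "$read_seq" | tr 'ACGT' 'IIII')

printf '>toy_ref\n%s\n' "$ref_seq" > "$workdir/ref.fa"
printf '@toy_read\n%s\n+\n%s\n' "$read_seq" "$qual" > "$workdir/reads.fq"

if command -v bwa >/dev/null 2>&1; then
  # index the reference, then align the read with BWA-MEM
  bwa index "$workdir/ref.fa"
  bwa mem "$workdir/ref.fa" "$workdir/reads.fq" > "$workdir/aln.sam"
  echo "alignment written to $workdir/aln.sam"
else
  echo "bwa not found; install it with Conda first"
fi
```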
Another cool thing about Conda is environments, which are isolated workspaces. Imagine buying a brand new laptop that has nothing installed. You gradually install the tools that you use for your analyses and save all your analysis code onto said laptop. After you have finished, you can give that laptop to someone else and they will have the exact environment you used to carry out your work. Conda environments are similar, except that you don’t have to give your computer away. You can simply save the environment that you were working in and share it. Environments are also handy for providing a list of packages that need to be installed for a particular analysis.
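To give a rough idea of what a shared environment looks like, the sketch below writes a small environment file. The environment name, file name, and package list are made up for illustration and are not taken from any of my repositories.

```shell
# write a hypothetical environment file into a scratch directory
envdir=$(mktemp -d)
cat > "$envdir/example_environment.yml" <<'EOF'
name: example_env
channels:
  - conda-forge
  - bioconda
  - defaults
dependencies:
  - bwa
  - samtools
EOF

cat "$envdir/example_environment.yml"

# someone else could then recreate the environment with:
#   conda env create -f example_environment.yml
# and an existing, activated environment can be saved with:
#   conda env export > environment.yml
```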
For example, I have created an environment file in my learning VCF GitHub repository. If you clone this repository and use Conda to create the environment, you should be able to carry out the same analysis. I had performed the analysis on my MacBook Pro and was able to replicate the results on my Windows 10 machine using Ubuntu.
# create a copy of the code repository
git clone https://github.com/davetang/learning_vcf_file.git

# change directories
cd learning_vcf_file

# create an environment and install the necessary programs
# this may take some time depending on your internet speed
conda env create -f environment.yml

# activate the environment
source activate learning_vcf

# change into the analysis directory
cd analysis

# you will need to download GATK if you want to call variants using GATK
# this step may take some time depending on your internet speed
wget -c https://github.com/broadinstitute/gatk/releases/download/188.8.131.52/gatk-184.108.40.206.zip
unzip gatk-220.127.116.11.zip

# the analysis may take some time depending on your computer
./run.sh
The run.sh script generates some random sequences and calls variants using BCFtools, FreeBayes, and GATK.
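Before running a script like this, it can help to check that the tools it needs are actually on your PATH after activating the environment. This is a minimal sketch that assumes the executables are named bcftools and freebayes; GATK is left out because it is downloaded separately.

```shell
# count how many of the expected tools cannot be found
missing=0
for tool in bcftools freebayes; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "found: $tool"
  else
    echo "missing: $tool"
    missing=$((missing + 1))
  fi
done
echo "$missing tool(s) missing"
```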
It is definitely much easier to set up a computer using Windows 10 for bioinformatic analyses in 2019! The main reason is the native support of Ubuntu by Windows. Previously I recommended using VirtualBox, which worked, but was not as straightforward to install and use. With the advent of Conda, installing and managing bioinformatic tools has also become much easier. Another advantage of using Conda is the environment support. I showed an example where I created an environment that I used to call variants from some randomly generated sequences.
One more thing; if you use R, I would highly recommend using RStudio and keeping track of your analyses using R Markdown. Even if you don’t use R, I still recommend using RStudio and R Markdown; I regularly use R Markdown in RStudio to write down technical notes that have nothing to do with R. The Vim keybinding support by RStudio is one major reason.
As a final note, my desktop computer is around 8 years old now and Windows 10 works surprisingly well. My computer starts up in less than a minute; if you have used Windows before, you will remember that start-up speeds were notoriously bad. Despite mainly using my MacBook Pro for work these days, I can see myself going back to Windows!
Conclusion: use Ubuntu, use Conda, use RStudio, and use R Markdown.
This work is licensed under a Creative Commons Attribution 4.0 International License.