A small list of command line tips

Updated: 2014 May 14th; added even more tips

I'm in the middle of writing papers and my thesis, so I've been quite busy. However, I wanted to write a quick blog post as an outlet. So here's a list of random command line tips off the top of my head (GNU bash, version 4.1.2(1)-release); I hope that there's at least one tip in this list you didn't know about beforehand.

The find tool is extremely useful; some uses include:

#in all Perl files
#execute a grep quietly
#and look for human_id
#then report which file contains a match
find . -name '*.pl' -exec grep -q 'human_id' {} \; -print

#find broken symbolic links and delete them
find -L . -type l -delete
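
Since -delete can't be undone, it may be worth previewing the matches first. Here's a minimal sketch of a dry run, assuming a throwaway directory; demo_dir and the link names are made up for illustration:

```shell
# demo_dir and the file names below are hypothetical, created only for this demo
demo_dir=$(mktemp -d)
touch "$demo_dir/target.txt"
ln -s "$demo_dir/target.txt" "$demo_dir/good_link"
ln -s "$demo_dir/missing.txt" "$demo_dir/broken_link"

# dry run: with -L, only links whose target is missing still test as type l
broken=$(find -L "$demo_dir" -type l -print)
echo "$broken"

# once the list looks right, swap -print for -delete
```

Only broken_link should be listed; good_link is followed to its target and is treated as a regular file.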

Randomly shuffle lines using shuf:

for i in {1..10}; do echo $i; done | shuf
9
10
5
2
8
3
4
6
1
7
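
shuf can also generate and sample a range by itself, which avoids the loop. A small sketch using the -i (input range) and -n (number of lines) options of GNU shuf; the variable names are only for illustration:

```shell
# shuffle the integers 1 to 10 without a for loop
shuffled=$(shuf -i 1-10)
echo "$shuffled"

# draw 3 of the 10 numbers at random, without replacement
sample=$(shuf -i 1-10 -n 3)
echo "$sample"

# sanity check: sorting the shuffled numbers gives back 1 to 10
resorted=$(echo "$shuffled" | sort -n | paste -sd' ' -)
echo "$resorted"
```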

Use the -B and -A parameters of grep to print the lines before and after your matched line.

#echo out a bunch of lines
echo -e "3\n2\n1\nA\n1\n2\n3"
3
2
1
A
1
2
3

#show me two lines before and two lines after A
echo -e "3\n2\n1\nA\n1\n2\n3" | grep -B2 -A2 A
2
1
A
1
2

#also use -E with grep for extended regular expressions
#Basic vs Extended Regular Expressions
#       In basic regular expressions the meta-characters ?, +, {, |, (, and ) lose their special meaning; instead use the backslashed versions \?, \+, \{, \|, \(, and \).
#
#       Traditional egrep did not support the { meta-character, and some egrep implementations support \{ instead, so portable scripts should avoid { in grep -E patterns and should use [{] to match a literal {.
#
#       GNU grep -E attempts to support traditional usage by assuming that { is not special if it would be the start of an invalid interval specification.  For example, the command grep -E '{1' searches for the two-character string {1 instead of reporting a syntax error in the regular expression.  POSIX.2 allows this behavior as an extension, but portable scripts should avoid it.
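
To see the basic versus extended difference in practice, here's a small sketch that writes the same pattern both ways; the input strings are made up. It also shows -C, which is shorthand for giving -B and -A the same number:

```shell
# extended syntax: ( ) and { } are special without backslashes
extended=$(printf 'ab\nabab\nababab\n' | grep -E '^(ab){2}$')

# basic syntax: the same pattern needs the backslashed versions
basic=$(printf 'ab\nabab\nababab\n' | grep '^\(ab\)\{2\}$')

echo "$extended"   # abab
echo "$basic"      # abab

# -C2 is equivalent to -B2 -A2
context=$(printf '3\n2\n1\nA\n1\n2\n3\n' | grep -C2 A)
echo "$context"
```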

Use watch to run a command repeatedly (every 2 seconds by default):

#monitor the contents of a folder
#this can be useful for monitoring output files
watch ls -lt

#monitor the statuses of UGE hosts
watch qhost

#other monitoring tips
#use "tail -f" to monitor a file if it is growing
tail -f /var/log/apache/access.log

#check out the 10 most recent logins on a server
last | head

Use Bash's parameter expansion to split words or refer to a specific letter in a word:

#syntax ${parameter:offset:length}
word=todayisagooddaytodie

#length of word is ${#parameter}
last_letter=`expr ${#word} - 1`
for((i=0; i<${#word}; i++))
do
   printf "%s" "${word:$i:1}"
   if [ $i -eq $last_letter ]
      then printf "\n"
      else printf " "
   fi
done

#or just use sed
echo $word | sed 's/\(.\)/\1 /g'
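
The ${parameter:offset:length} syntax is also handy outside of loops. A minimal sketch, assuming Bash (this is not portable to plain sh); the variable names other than word are made up:

```shell
word=todayisagooddaytodie

first_five=${word:0:5}    # offset 0, length 5
from_fifteen=${word:15}   # no length: everything from offset 15 onwards
last_three=${word: -3}    # negative offset counts from the end; the space before - is required
length=${#word}

echo "$first_five $from_fifteen $last_three $length"   # today todie die 20
```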

Set the setgid bit on a directory so that all new files and directories inherit its group:

# now all directories and files created inside /home/dtang
# will have the same group as /home/dtang
# (the owner is still the user who creates them)
chmod g+s /home/dtang
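
To check that the bit was actually set, the -g test or ls -ld can be used. A small sketch on a scratch directory (shared_dir is a hypothetical name, and I'm assuming a Linux system where you are a member of the directory's group):

```shell
# shared_dir is a hypothetical scratch directory used only for this check
shared_dir=$(mktemp -d)
chmod g+s "$shared_dir"

# [ -g ] is true when the setgid bit is set
if [ -g "$shared_dir" ]; then
    setgid_status="set"
else
    setgid_status="not set"
fi
echo "$setgid_status"

# ls -ld shows the bit as an s (or S) in the group permissions
ls -ld "$shared_dir"
```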

Using paste to transform data (see this blog post):

cat test.txt 
#one
#two
#three
#four
#five
#six
#seven
#eight

#transform the 8 lines into a 2 by 4 table
paste - - - - < test.txt
#one     two     three   four
#five    six     seven   eight

#or 4 by 2
paste - - < test.txt
#one     two
#three   four
#five    six
#seven   eight
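
paste can also use a delimiter other than a tab. A short sketch with the -d and -s options, using printf in place of test.txt so it runs on its own:

```shell
# join pairs of lines with a comma instead of a tab
pairs=$(printf 'one\ntwo\nthree\nfour\n' | paste -d, - -)
echo "$pairs"

# -s joins all lines into a single row
row=$(printf 'one\ntwo\nthree\nfour\n' | paste -sd, -)
echo "$row"
```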

To make directories and sub-directories that don't exist use mkdir -p:

#this doesn't work
mkdir test/test
mkdir: cannot create directory 'test/test': No such file or directory

#this works
mkdir -p test/test
ls test
test

#and use braces to create three subdirectories at once
#no spaces after the commas!
mkdir -p test/{one,two,three}
ls test
one  three  two

#list only directories
ls -d */
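
Braces can be combined, and echo is a safe way to preview what they expand to. A minimal sketch, assuming Bash; base and the project layout are made up for illustration:

```shell
# base is a hypothetical scratch directory
base=$(mktemp -d)

# preview what the braces expand to before creating anything
echo project/{raw,processed}/{2013,2014}

# each combination becomes its own path, so this creates four leaf directories
mkdir -p "$base"/project/{raw,processed}/{2013,2014}
leaf_count=$(find "$base/project" -mindepth 2 -type d | wc -l | tr -d ' ')
echo "$leaf_count"
```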

Use bc to do quick calculations. In the past I used expr to do simple arithmetic; however, expr only does integer arithmetic, whereas bc -l handles decimals:

#8734 divided by 44
#using expr
expr 8734 / 44
#198

#using bc -l
bc -l<<<8734/44
198.50000000000000000000
#I told you it was more precise than expr
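
The expr result is truncated because expr only works with integers. The shell's built-in arithmetic expansion behaves the same way, and together with the remainder it shows where the .5 from bc went. A minimal sketch:

```shell
# integer division, same result as expr 8734 / 44
quotient=$(( 8734 / 44 ))
# the remainder that integer division throws away
remainder=$(( 8734 % 44 ))

# 22 out of 44 is the 0.5 that expr drops
echo "$quotient remainder $remainder"
```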

Use readline to navigate around in the command line. For example, I always use ctrl+w to delete one word, ctrl+l to clear the screen, alt+b and alt+f to move back and forward one word, ctrl+a to move to the start of the line (or ctrl+a a if you are using GNU screen), and ctrl+e to move to the end of the line. A word of warning though: ctrl+w is also the shortcut to close a tab in Chrome and Opera (my favourite web browsers). Occasionally, I've closed tabs by accident because I thought the active window was the terminal. But not to worry, ctrl+shift+t re-opens the last closed tab in both Chrome and Opera.

My favourite GNU application of all time is screen. I highly recommend using screen. Did I mention it was my favourite tool of all time? One cool trick with screen is to use split screens. Use ctrl+a + S for a horizontal split, ctrl+a + | for a vertical split, ctrl+a + X to remove the split window, and ctrl+a + tab to navigate between split windows. If you have a big physical screen, you can have four concurrent mini-screens.

Redirect the standard error stream to standard output using 2>&1. Some programs output their usage to the standard error stream, such as intersectBed. To be able to read the usage, we can redirect STDERR to STDOUT and pipe it to less:

#we can scroll through the usage
intersectBed 2>&1 | less

#combine with grep -A
#to see what the parameter -r is
intersectBed 2>&1 | grep -A2 "\s-r\s"
        -r      Require that the fraction overlap be reciprocal for A and B.
                - In other words, if -f is 0.90 and -r is used, this requires
                  that B overlap 90% of A and A _also_ overlaps 90% of B.
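
If you don't have intersectBed installed, the effect of 2>&1 can be seen with a small stand-in. In this sketch show_usage is a hypothetical function that, like many tools, prints its usage to STDERR:

```shell
# show_usage is a hypothetical stand-in for a tool that prints its usage to STDERR
show_usage() {
    echo "usage: demo [-r]" >&2
}

# STDERR never enters the pipe (it is discarded here to keep the demo quiet),
# so grep counts 0 matching lines
without_redirect=$(show_usage 2>/dev/null | grep -c usage || true)

# with 2>&1, STDERR is merged into STDOUT and grep sees the line
with_redirect=$(show_usage 2>&1 | grep -c usage)

echo "$without_redirect $with_redirect"   # 0 1
```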

To pipe output from one program to the next as well as saving a copy, use tee:

#save output from 1 to 10 loop using tee
#then stream to Perl, which will add the numbers up
for i in {1..10}; do echo $i; done | tee one_to_ten.txt | perl -nle '$a+=$_; END {print $a}'
55
cat one_to_ten.txt
1
2
3
4
5
6
7
8
9
10
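
The same pattern works with any downstream command. Here's a sketch where awk stands in for Perl and seq for the loop; out_file is a made-up temporary file:

```shell
# out_file is a hypothetical temporary file
out_file=$(mktemp)

# tee writes a copy to out_file and passes the stream on to awk
total=$(seq 1 10 | tee "$out_file" | awk '{ sum += $1 } END { print sum }')
saved_lines=$(wc -l < "$out_file" | tr -d ' ')

echo "$total $saved_lines"   # 55 10
```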

Show tabs in a file:

# -t = -vT = --show-nonprinting + --show-tabs
cat -t file.tsv | grep --color "\^I"

To read and write to the same file, use sponge:

#clone repository
git clone git://git.kitenet.net/moreutils
#compile
cd moreutils
gcc sponge.c -o sponge
#I've prepared a test file, called test.txt
cat test.txt
#xyz

#this doesn't work
cat test.txt | sed 's/xyz/abc/' > test.txt
#empty
cat test.txt

#using sponge
cat test.txt
#xyz
#read test.txt, substitute, and write back to same file
cat test.txt | sed 's/xyz/abc/' | sponge test.txt
#voila
cat test.txt
abc
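
The plain redirect fails because the shell opens (and truncates) test.txt for writing before cat has had a chance to read it. If sponge isn't available, a temporary file does a similar job. A minimal sketch of that alternative (this is not sponge itself); the file names are made up:

```shell
# work_file and tmp_out are hypothetical temporary files
work_file=$(mktemp)
tmp_out=$(mktemp)
echo xyz > "$work_file"

# write to a different file first, then replace the original only if sed succeeded
sed 's/xyz/abc/' "$work_file" > "$tmp_out" && mv "$tmp_out" "$work_file"

result=$(cat "$work_file")
echo "$result"   # abc
```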

Start vim without loading .vimrc:

vim -u NONE

Quickly switch between two directories by using cd -:

cd /etc
#switch back to the previous directory
cd -

This should be common knowledge but if you work with a lot of tables, use cut to cut out columns:

#echo out some 2 x 3 table
echo -e "chr1\t1\t2\nchr1\t2\t3"
chr1    1       2
chr1    2       3

#the default delimiter is a tab
#cut out the first two columns
echo -e "chr1\t1\t2\nchr1\t2\t3" | cut -f1,2
chr1    1
chr1    2
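
cut isn't limited to tabs or to listing every column. A short sketch with the -d option and an open-ended range; the input rows are made up:

```shell
# -d sets the delimiter, here a comma instead of the default tab
first_and_third=$(echo 'chr1,1,2' | cut -d, -f1,3)
echo "$first_and_third"   # chr1,2

# a range with an open end: everything from the second column onwards
from_second=$(printf 'chr1\t1\t2\n' | cut -f2-)
echo "$from_second"
```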

Use sort and uniq -c to create a tally:

#echo out a list of numbers
echo -e "11\n3\n1\n2\n2\n4\n6\n1\n1"
11
3
1
2
2
4
6
1
1

#create a tally of the numbers
#there are three 1's, two 2's, etc.
echo -e "11\n3\n1\n2\n2\n4\n6\n1\n1" | sort | uniq -c | sort -k1rn
      3 1
      2 2
      1 11
      1 3
      1 4
      1 6

#you can combine cut, sort and uniq -c
#to create quick summaries of columns
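
As a sketch of that combination, here is a tally of the first column of a made-up three-column table. The final awk step is my own addition and only tidies up the padding that uniq -c adds:

```shell
# a made-up three-column table; the first column is the chromosome
table=$(printf 'chr1\t1\t2\nchr2\t5\t6\nchr1\t2\t3\n')

# cut out the first column, sort it so duplicates are adjacent, then count them
tally=$(echo "$table" | cut -f1 | sort | uniq -c | awk '{ print $2 ": " $1 }')
echo "$tally"
```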

Sorting chromosomes alpha-numerically by using sort -k1,1V:

echo -e "chrY\nchr10\nchr2\nchr1\nchrM" | sort -k1,1V
chr1
chr2
chr10
chrM
chrY

Sorting by scientific notation:

# not sorted
echo -e "10e-10\n10e-13\n10e-7"
10e-10
10e-13
10e-7

# not sorted properly
echo -e "10e-10\n10e-13\n10e-7" | sort -n
10e-10
10e-13
10e-7

# sorted from smallest to largest
echo -e "10e-10\n10e-13\n10e-7" | sort -g
10e-13
10e-10
10e-7

# sorted from largest to smallest
echo -e "10e-10\n10e-13\n10e-7" | sort -gr
10e-7
10e-10
10e-13

For getting quick statistics at the command line, use the filo package:

for i in {1..10}; do echo $i; done | stats
Total lines:            10
Sum of lines:           55
Ari. Mean:              5.5
Geo. Mean:              4.52872868811677
Median:                 5.5
Mode:                   1 (N=1)
Anti-Mode:              1 (N=1)
Minimum:                1
Maximum:                10
Variance:               8.25
StdDev:                 2.87228132326901

If you use Perl, use perl -le to run a one-liner on the command line. The -e enables Perl code to be executed on the command line. The -l adds a newline to everything you print.

#print out 1 to 100
perl -le 'for(1..100){ print $_ }'

If you want to pipe to Perl, use -n; this saves you the trouble of having to type while(<>){ ... }. See perldoc perlrun for more information.

echo hi | perl -nle 's/hi/bye/; print'
#bye

#I use this a lot to print out line numbers
#learn about Perl's special variables at
#http://www.kichwa.com/quik_ref/spec_variables.html
echo -e 'a\nb\nc\nd\ne\nf' | perl -nle 'print "line $.: $_"'
line 1: a
line 2: b
line 3: c
line 4: d
line 5: e
line 6: f

Run R -e to use R from the command line:

#find the number of combinations without replacement
R -e 'choose(100,2)'

#as suggested in the comments
#to keep R quiet, i.e., turn off the welcome message
R --quiet -e 'choose(100,2)'

#which can be used with vanilla mode
#The command-line option --vanilla implies 
#--no-site-file, --no-init-file, --no-environ and (except for R CMD) --no-restore
R --vanilla --quiet -e 'choose(100,2)'

The convert program from ImageMagick is absolutely amazing for manipulating images at the command line.

If you need to rearrange columns of a file, use awk:

echo -e '3\t4\t1\t2' | awk 'BEGIN {OFS="\t"} {print $3, $4, $1, $2}'
1       2       3       4
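
Setting both separators explicitly keeps fields that contain spaces intact. A sketch, assuming a made-up row whose first field contains a space; -F sets the input separator and -v OFS sets the output separator before any line is read:

```shell
# without -F'\t', awk would split "gene A" into two fields
reordered=$(printf 'gene A\t10\t20\n' | awk -F'\t' -v OFS='\t' '{ print $2, $3, $1 }')
echo "$reordered"
```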

If you want to print out the middle of a file, use sed. In the past, I used a combination of head and tail:

#I want lines 3 to 7
#using head and tail
for i in {1..10}; do echo $i; done | head -7 | tail -5
3
4
5
6
7

#using sed
for i in {1..10}; do echo $i; done | sed -n '3,7p'
3
4
5
6
7
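
On a large file, sed will keep reading to the end even after line 7 has been printed. A small sketch that adds the q command so sed stops early; seq stands in for the loop:

```shell
# print lines 3 to 7, then quit at line 7 instead of reading the rest of the input
middle=$(seq 1 10 | sed -n '3,7p;7q')
echo "$middle"
```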

Use rsync to copy files from one directory to another without copying files that already exist. This is really useful, for example, when copying Bioconductor packages from an older R installation (e.g. R-3.0.2) to a newer R installation (e.g. R-3.0.3).

# where org/ is the existing directory
# and dup is where you want to copy the files
rsync --ignore-existing -r -v org/ dup

# copy Bioconductor packages from an older R installation
rsync --ignore-existing -r -v ~/src/R-3.0.2/library/ ~/src/R-3.0.3/library

# then open up R and run the below to update the packages
# update.packages(checkBuilt = TRUE, ask = FALSE)
# source("https://bioconductor.org/biocLite.R")
# biocLite()

Submit data to an HTML form with the POST method and save the response:

#http://www.commandlinefu.com/commands/view/2681/submit-data-to-a-html-form-with-post-method-and-save-the-response
curl -sd 'rid=value&submit=SUBMIT' <URL> > out.html

Need to work out which day of the week you were born on? Use cal:

cal -y 1983
#I was born on a Thursday!

#probably this is more useful
#prints out a monthly calendar for the current year
cal -y

Check out http://www.explainshell.com/ to have shell commands and parameters explained.

Use GNU parallel to speed up your work.

See Stephen Turner's useful bash one-liners for bioinformatics.

See Heng Li's UNIX Toolbox.

I'd better get back to writing! I'll update this list periodically.




This work is licensed under a Creative Commons Attribution 4.0 International License.
Comments

  1. If I may add one, I'd suggest
    awk -F $'\t' '{print NF}' "$@" | uniq -c;
    to quickly show the dimensions of a tab-separated table.
