Calculating Pearson correlation using Perl

My modification of code which is originally available here. Probably easier to understand the original code. I altered the code so that I could use an anonymous 2d array and with strictures, so that I could plug it into my own code.

Comments are included in the code below to assist use.

#!/usr/bin/perl

use strict;
use warnings;

#make up some matrix for demonstration purposes
my $x = [];
#generate 10 row
for (my $i=1;$i<11;++$i){
   #generate 2 columns, where column 1 = $i and column 2 = $i
   for (my $j=1;$j<3;++$j){
      $x->[$i][$j]= $i;
   }
}
#end matrix code

my $correlation = correlation($x);
#Since the values in the columns are identical in $x, the correlation will be 1
print "$correlation\n";

#to use this code, remove the matrix code above
#generate an anonymous 2D array where $x->[1] is the row
#$x->[1][1] is the value in row 1 column 1 and $x->[1][2] is the value of row 1 column 2
#once you build the entire array, pass it to the correlation subroutine as above
#my $corrleation = correlation($x)

#if you want to see what's inside $x use the code below
#for (my $i = 1; $i < scalar(@{$x}); ++$i){
#   my $line_to_print = '';
#   for (my $j = 1; $j < scalar(@{$x->[$i]}); ++$j){
#      $line_to_print .= "$x->[$i]->[$j]\t";
#   }
#   $line_to_print =~ s/\t$//;
#   print "$line_to_print\n";
#}

sub mean {
   my ($x)=@_;
   my $num = scalar(@{$x}) - 1;
   my $sum_x = '0';
   my $sum_y = '0';
   for (my $i = 1; $i < scalar(@{$x}); ++$i){
      $sum_x += $x->[$i][1];
      $sum_y += $x->[$i][2];
   }
   my $mu_x = $sum_x / $num;
   my $mu_y = $sum_y / $num;
   return($mu_x,$mu_y);
}

### ss = sum of squared deviations to the mean
sub ss {
   my ($x,$mean_x,$mean_y,$one,$two)=@_;
   my $sum = '0';
   for (my $i=1;$i<scalar(@{$x});++$i){
     $sum += ($x->[$i][$one]-$mean_x)*($x->[$i][$two]-$mean_y);
   }
   return $sum;
}

sub correlation {
   my ($x) = @_;
   my ($mean_x,$mean_y) = mean($x);
   my $ssxx=ss($x,$mean_x,$mean_y,1,1);
   my $ssyy=ss($x,$mean_x,$mean_y,2,2);
   my $ssxy=ss($x,$mean_x,$mean_y,1,2);
   my $correl=correl($ssxx,$ssyy,$ssxy);
   my $xcorrel=sprintf("%.4f",$correl);
   return($xcorrel);

}

sub correl {
   my($ssxx,$ssyy,$ssxy)=@_;
   my $sign=$ssxy/abs($ssxy);
   my $correl=$sign*sqrt($ssxy*$ssxy/($ssxx*$ssyy));
   return $correl;
}

Genome scan for 6mer frequency

Split the genome into 6 bp windows and calculate the 6 mer frequencies.

#!/usr/bin/perl

use strict;
use warnings;

my $usage = "Usage $0 <infile.fa>\n";
my $infile = shift or die $usage;

my $seq = '';
open(IN,'<',$infile) || die "Could not open $infile: $!\n";
while(<IN>){
   chomp;
   next if (/^>/);
   $seq .= $_;
}
close(IN);

my %six_mer = ();
my $length = length($seq);
foreach my $frame (0 .. 5){
   my $frame_length = $length - $frame;
   #print "Frame length: $frame_length\n";
   my $mod = $frame_length % 6;
   #print "Mod: $mod\n";
   $frame_length -= $mod;
   #print "Frame length: $frame_length\n";
   for (my $i = $frame; $frame<$frame_length; $frame=$frame+6){
      my $mer = substr($seq,$frame,6);
      #print "$mer\n";
      if (exists $six_mer{$mer}){
         $six_mer{$mer}++;
      } else {
         $six_mer{$mer} = '1';
      }
   }
}

print "$infile\n";
foreach my $mer (sort {$six_mer{$b} <=> $six_mer{$a} } keys %six_mer){
   print "$mer: $six_mer{$mer}\n";
}
print "\n";

exit(0);

Scanning chr6 of hg19:

NNNNNN: 3719950
aaaaaa: 373380
tttttt: 372667
TTTTTT: 184768
AAAAAA: 182652
aaaaat: 143055
attttt: 142671
ATTTTT: 133646
TTTAAA: 133284
AAAAAT: 133130
TATTTT: 130672
AAAATA: 129572
TTTTAA: 129570
TTAAAA: 129177
aaaata: 123528
tatttt: 123134
atatat: 119872
...

Hierarchical clustering with p-values

The code, which allowed me to use the Spearman's rank correlation coefficient, was kindly provided to me by the developer of pvclust.

Firstly download the unofficial package or just source it from my DropBox account. Start up R and follow:

#load the package
#if you are having trouble sourcing from https run
#install.packages('devtools')
#library(devtools)
#source_url("https://dl.dropboxusercontent.com/u/15251811/pvclust/pvclust.R")
#source_url("https://dl.dropboxusercontent.com/u/15251811/pvclust/pvclust-internal.R")

source("https://dl.dropboxusercontent.com/u/15251811/pvclust/pvclust.R")
source("https://dl.dropboxusercontent.com/u/15251811/pvclust/pvclust-internal.R")

#use a test dataset from DESeq
#install DESeq if necessary
source("http://bioconductor.org/biocLite.R")
biocLite("DESeq")
#load DESeq
library("DESeq")
example_file <- system.file ("extra/TagSeqExample.tab", package="DESeq")
data <- read.delim(example_file, header=T, row.names="gene")
head(data)
           T1a T1b  T2  T3  N1  N2
Gene_00001   0   0   2   0   0   1
Gene_00002  20   8  12   5  19  26
Gene_00003   3   0   2   0   0   0
Gene_00004  75  84 241 149 271 257
Gene_00005  10  16   4   0   4  10
Gene_00006 129 126 451 223 243 149

# Define a distance function

spearman <- function(x, ...) {
    x <- as.matrix(x)
    res <- as.dist(1 - cor(x, method = "spearman", use = "everything"))
    res <- as.dist(res)
    attr(res, "method") <- "spearman"
    return(res)
}

result <- pvclust(data, method.dist=spearman, nboot=100)

result

Cluster method: average
Distance      : spearman

Estimates on edges:

  au bp se.au se.bp v c pchi
1  1  1     0     0 0 0    0
2  1  1     0     0 0 0    0
3  1  1     0     0 0 0    0
4  1  1     0     0 0 0    0
5  1  1     0     0 0 0    0

#pvclust classed object
class(result)
[1] "pvclust"

names(result)
[1] "hclust" "edges"  "count"  "msfit"  "nboot"  "r"      "store"

plot(result)
pvrect(result, alpha=0.95)

pvclust_exampleHierarchical clustering with p-values.

Pooling technical replicates in edgeR

This post is very old and should just be ignored. But if you came across this, here's a thread on the Bioconductor mailing list that may be relevant

4 libraries each with technical replicates and 2 conditions. Technical replicates are the same samples performed identically. First treat technical replicates separately:

library(edgeR)
targets <- read.delim(file = "targets.txt", stringsAsFactors = F)
d <- readDGE(targets, comment.char="#")
d <- estimateCommonDisp(d)

d$samples$lib.size
#[1] 258680 733173 334908 687753 321877 320562 582971 539963 570570 484404
#[11] 408124 467288

d$common.lib.size
#[1] 453057

>colSums(d$pseudo.alt)
#1        2        3        4        5        6        7        8
#451962.0 453242.9 452553.6 454109.2 451432.0 451713.3 455116.5 453668.7
#9       10       11       12
#453444.9 453177.2 453137.0 453183.9

d$common.dispersion
#[1] 0.4648173

sqrt(d$common.dispersion)
#[1] 0.6817751

de.com <- exactTest(d)

options(digits=4)

topTags(de.com)

detags.com <- rownames(topTags(de.com)$table)

sum(de.com$table$p.value < 0.01)
#[1] 1981

top1981 < topTags(de.com, n=1981)

sum(top1981$table$logFC > 0)
#[1] 1123

sum(top1981$table$logFC &lt; 0)
#[1] 858

sum(p.adjust(de.com$table$p.value,method="BH") < 0.05)
#[1] 335

Now pooling everything together, so that we are comparing 2 x 2 and following the same pipeline:

targets = read.delim(file = "targets2.txt", stringsAsFactors=F)

d <- estimateCommonDisp(d)
d$samples$lib.size
#[1] 1326914 1330453 1693521 1359966

d$common.lib.size
#[1] 1420006

colSums(d$pseudo.alt)
#1       2       3       4
#1420267 1419859 1420363 1420132

d$common.dispersion
#[1] 0.2876606

sqrt(d$common.dispersion)

de.com <- exactTest(d)

options(digits=4)

topTags(de.com)

detags.com = rownames(topTags(de.com)$table)

sum(de.com$table$p.value < 0.01)
#[1] 1024

top1024 = topTags(de.com, n=1024)

sum(top1024$table$logFC > 0)
#[1] 795

&gt;sum(top1024$table$logFC < 0)
#[1] 229

sum(p.adjust(de.com$table$p.value,method="BH") < 0.05)
#[1] 89

None of the 89 from the adjusted set in the pooled analysis overlap the 335 from the unpooled analysis.

Using blat to map short RNAs

Updated on 2013 November 5th to include mapping of piRNAs

I still use blat as my multi-purpose alignment tool despite it being developed over 10 years ago. For those needing a simple introduction to blat, see my using blat post. Now I was wondering if blat could aligned short RNAs; the definition of short RNAs is generally <200nt. So here I try aligned miRNAs, which are in the size range of 20-24nt and piRNAs, which are in the size range of 26-31nt.

Mapping human miRNAs using blat

Firstly let's download a set of mature miRNA sequences from miRBase:

#release 20
wget ftp://mirbase.org/pub/mirbase/CURRENT/mature.fa.gz
zcat mature.fa.gz | grep ">" | wc
  30424  152120 1678688
#how many human miRNAs?
zcat mature.fa.gz | grep Homo | wc
   2578   12890  135904
#save human miRNAs
zcat mature.fa.gz | grep -A1 Homo > mature_human.fa
cat mature_human.fa | tr U T > temp
mv temp mature_human.fa

Now let's blat using the default settings to the assembled hg19 chromosomes:

#usage:
#   blat database query [-ooc=11.ooc] output.psl
blat ~/genome/hg19.fa mature_human.fa mature_human.psl
#Loaded 3095693983 letters in 25 sequences
#Searched 55887 bases in 2578 sequences
#no results

The default settings don't map any of the miRNAs. Let's use a stepSize of 1, and no thresholds on the score and identity:

#using GNU parallel to speed things up: http://www.gnu.org/software/parallel/
parallel blat -stepSize=1 -minScore=0 -minIdentity=0 ~/genome/hg19_chrom/chr{}.fa mature_human.fa mature_human_{}.psl ::: {1..22} X Y M
#how many miRNAs were mapped?
cat *.psl | awk '{print $10}' | grep hsa | sort -u | wc
#   2578    2578   38305

So we could map all the miRNAs using the maximum number of overlapping tiles and no score threshold. I've written a script that converts the psl output into a bed file. In addition the script only outputs the best scoring matches:

Let's use the script to output the best scoring matches in bed format:

cat *.psl > mature_human.psl
psl_to_bed_best_score.pl mature_human.psl mature_human_best_score.bed
#Stored 2578 entries
cat mature_human_best_score.bed | wc
  17884  214608 1392785

#how many uniquely mapped miRNAs?
cat mature_human_best_score.bed | cut -f4 | sort | uniq -c | awk '$1==1 {print}' | wc
   2147    4294   49215

#how many multimapping positions for the most highly multimapped miRNA?
cat mature_human_best_score.bed | cut -f4 | sort | uniq -c | sort -k1rn | head
    768 hsa-miR-1273f
    768 hsa-miR-5096
    768 hsa-miR-619-5p
    757 hsa-miR-1273g-3p
    722 hsa-miR-1273h-5p
    612 hsa-miR-5095
    591 hsa-miR-5585-3p
    584 hsa-miR-548ap-3p
    510 hsa-miR-548ar-5p
    507 hsa-miR-548ay-5p

Let's compare our uniquely mapped miRNAs with the genomic coordinates provided by miRBase:

wget ftp://mirbase.org/pub/mirbase/CURRENT/genomes/hsa.gff3
##gff-version 3
##date 2013-10-1
#
# Chromosomal coordinates of Homo sapiens microRNAs
# microRNAs:               miRBase v20
# genome-build-id:         GRCh37.p5
# genome-build-accession:  NCBI_Assembly:GCA_000001405.6
#
# Hairpin precursor sequences have type "miRNA_primary_transcript".
# Note, these sequences do not represent the full primary transcript,
# rather a predicted stem-loop portion that includes the precursor
# miRNA. Mature sequences have type "miRNA".
#
chr1    .       miRNA_primary_transcript        17369   17436   .       -    .  ID=MI0022705;Alias=MI0022705;Name=hsa-mir-6859-1
chr1    .       miRNA   17409   17431   .       -       .       ID=MIMAT0027618;Alias=MIMAT0027618;Name=hsa-miR-6859-5p;Derives_from=MI0022705
chr1    .       miRNA   17369   17391   .       -       .       ID=MIMAT0027619;Alias=MIMAT0027619;Name=hsa-miR-6859-3p;Derives_from=MI0022705
chr1    .       miRNA_primary_transcript        30366   30503   .       +    .  ID=MI0006363;Alias=MI0006363;Name=hsa-mir-1302-2
chr1    .       miRNA   30438   30458   .       +       .       ID=MIMAT0005890;Alias=MIMAT0005890;Name=hsa-miR-1302;Derives_from=MI0006363
chr1    .       miRNA_primary_transcript        567705  567793  .       -    .  ID=MI0022558;Alias=MI0022558;Name=hsa-mir-6723
chr1    .       miRNA   567762  567783  .       -       .       ID=MIMAT0025855;Alias=MIMAT0025855;Name=hsa-miR-6723-5p;Derives_from=MI0022558

And by using this script:

#!/bin/env perl

use strict;
use warnings;

my $gff = 'hsa.gff3';
my $bed = 'mature_human_best_score.bed';

chomp(my $unique = `cat mature_human_best_score.bed | cut -f4 | sort | uniq -c | awk '\$1==1 {print}'`);
#      1 hsa-miR-98-5p

my %unique = ();
my @line = split(/\n/,$unique);
foreach my $entry (@line){
   if ($entry =~ /(hsa.*)$/){
      my $mirna = $1;
      $unique{$mirna} = 1;
   } else {
      die "Could not parse $entry\n";
   }
}

my %bed = ();
open(BED,'<',$bed) || die "Could not open $bed: $!\n";
while(<BED>){
   chomp;
   my ($chr, $start, $end, $name, @etc) = split(/\t/);
   next unless exists $unique{$name};
   $start += 1;
   $bed{$name}{'CHR'} = $chr;
   $bed{$name}{'START'} = $start;
   $bed{$name}{'END'} = $end;
}
close(BED);

open(GFF,'<',$gff) || die "Could not open $gff: $!\n";
while(<GFF>){
   chomp;
   next if /^#/;
   #chr1    .       miRNA   17409   17431   .       -       .       ID=MIMAT0027618;Alias=MIMAT0027618;Name=hsa-miR-6859-5p;Derives_from=MI0022705
   my ($chr, $source, $type, $start, $end, $score, $strand, $phase, $id) = split(/\t/);
   my $mirna = '';

   my @id_split = split(/;/,$id);
   foreach my $split (@id_split){
      if ($split =~ /Name=(.*)$/){
         $mirna = $1;
      }
   }
   next unless exists $bed{$mirna};

   if ($chr eq $bed{$mirna}{'CHR'} && $start == $bed{$mirna}{'START'} && $end == $bed{$mirna}{'END'}){
      print "Same\n";
   } else {
      print "Different: $mirna mapped: $bed{$mirna}{'CHR'}:$bed{$mirna}{'START'}-$bed{$mirna}{'END'} gff: $chr:$start-$end\n";
   }
}
close(GFF);

exit(0);

And here are the differently mapped miRNAs:

compare.pl | grep Diff
Different: hsa-miR-8070 mapped: chr11:11804631-11804652 gff: chr11:11804738-11804759
Different: hsa-miR-4487 mapped: chr11:47422535-47422553 gff: chr11:47422574-47422592
Different: hsa-miR-4441 mapped: chr2:240007394-240007410 gff: chr2:240007524-240007540
Different: hsa-miR-297 mapped: chr8:49860914-49860934 gff: chr4:111781780-111781800
Different: hsa-miR-3142 mapped: chr5:159901415-159901436 gff: chr5:159901457-159901478
Different: hsa-miR-450a-5p mapped: chrX:133674423-133674444 gff: chrX:133674594-133674615

So of the 2,147 uniquely mapped miRNAs using blat, 6 of them mapped differently to the coordinates provided by miRBase.

Mapping human piRNAs using blat

I will use human piRNAs generated from this post. Let's map these piRNAs in the same manner:

#using GNU parallel to speed things up: http://www.gnu.org/software/parallel/
parallel blat -stepSize=1 -minScore=0 -minIdentity=0 ~/genome/hg19_chrom/chr{}.fa human_pirna.fa human_pirna_{}.psl ::: {1..22} X Y M
#how many piRNAs were mapped?
cat human_pirna*.psl > human_pirna.psl
cat human_pirna.psl | awk '{print $10}' | grep -P "n\d+" | sort -u | wc
#  32152   32152  289368

cat human_pirna.fa | grep ">" | wc
# 32152  675404 3543949

psl_to_bed_best_score.pl human_pirna.psl human_pirna_best_score.bed

#how many uniquely mapped piRNAs?
cat human_pirna_best_score.bed | cut -f4 | sort | uniq -c | awk '$1==1 {print}' | wc
#  25960   51920  415360

#how many were mapped in total?
cat human_pirna_best_score.bed | cut -f4 | sort -u | wc
#  32152   32152  257216

#percentage of uniquely mapped?
bc -l<<<25960/32152*100
#80.74147797959691465500

Conclusions

Using a stepSize of 1 and no thresholds on the scoring allowed the mapping of miRNAs and piRNAs using blat. The majority of uniquely mapped miRNAs from miRBase 20 were mapped to the location assigned by miRBase except for 6 exceptions. So it would seem that one could use blat for mapping short RNAs.

Of the RefSeq’s that have CpG islands 1,000 bp upstream, what are the GO terms - part 2?

Using GO.db and GOstats, I obtained the gene list with bona fide CpG islands upstream and conducted a GO enrichment analysis. The choice of the gene universe is again all RefSeq gene models. Enriched Biological Processes include:

1 primary metabolic process
2 branched chain family amino acid metabolic process
3 regulation of nucleobase, nucleoside, nucleotide and nucleic acid metabolic process
4 regulation of transcription, DNA-dependent
5 nitrogen compound metabolic process
6 cellular metabolic process
7 mitotic cell cycle checkpoint
8 nucleobase, nucleoside and nucleotide metabolic process
9 cAMP biosynthetic process
10 G-protein signaling, coupled to cAMP nucleotide second messenger
11 chromatin organization
12 cellular biopolymer metabolic process

Molecular functions:

1 sequence-specific DNA binding
2 transcription factor activity

Cellular components:

Term
1 nucleus
2 membrane-bounded organelle
3 chromosomal part
4 cell
5 intracellular part

Transcription factors more likely to have CpG islands 1,000 bp upstream, and not overlapping the 5' UTR?

Gene Ontology enrichment analysis

Updated: 2013 November 23rd

There are many tools available for conducting a gene ontology enrichment analysis. Some online tools that I've heard of are DAVID, PANTHER and GOrilla but I have never actually used them. Within Bioconductor there's GOstats, topGO and goseq (and probably some more). Here I'll test out GOstats and topGO.

Firstly download and install the packages if you haven't already:

source("http://bioconductor.org/biocLite.R")
#A set of annotation maps describing the entire Gene Ontology
biocLite("GO.db")
biocLite("topGO")
biocLite("GOstats")

Now let's create a toy example where my test set of genes are those that are associated with the GO term GO:0007411 (axon guidance). Let's fetch all these genes:

#install if you haven't already
source("http://bioconductor.org/biocLite.R")
biocLite("org.Hs.eg.db")
biocLite("GO.db")
library(org.Hs.eg.db)
library(GO.db)

#for reference's sake
#how to get GO terms from an Entrez gene id following the manual
#http://bioconductor.org/packages/release/data/annotation/manuals/org.Hs.eg.db/man/org.Hs.eg.db.pdf

#org.Hs.egGO is an R object that provides
#mappings between entrez gene identifers and the GO
#identifers that they are directly associated with

entrez_object <- org.Hs.egGO
# Get the entrez gene identifiers that are mapped to a GO ID
mapped_genes <- mappedkeys(entrez_object)
# Convert to a list
entrez_to_go <- as.list(entrez_object[mapped_genes])
#http://www.ncbi.nlm.nih.gov/gene/?term=1
#entrez gene id 1 is A1BG
entrez_to_go[[1]]
$`GO:0008150`
$`GO:0008150`$GOID
[1] "GO:0008150"

$`GO:0008150`$Evidence
[1] "ND"

$`GO:0008150`$Ontology
[1] "BP"


$`GO:0005576`
$`GO:0005576`$GOID
[1] "GO:0005576"

$`GO:0005576`$Evidence
[1] "IDA"

$`GO:0005576`$Ontology
[1] "CC"


$`GO:0003674`
$`GO:0003674`$GOID
[1] "GO:0003674"

$`GO:0003674`$Evidence
[1] "ND"

$`GO:0003674`$Ontology
[1] "MF"

#map GO terms to Entrez gene ids

go_object <- as.list(org.Hs.egGO2EG)
#test
go_object[1]
$`GO:0000002`
    TAS     TAS     ISS     IMP     NAS     IMP     IMP
  "291"  "1890"  "4205"  "4358"  "9361" "10000" "92667"
#same
go_object['GO:0000002']
$`GO:0000002`
    TAS     TAS     ISS     IMP     NAS     IMP     IMP
  "291"  "1890"  "4205"  "4358"  "9361" "10000" "92667"

#GO:0000002 -> mitochondrial genome maintenance
#entrez gene id 291 -> SLC25A4
#which if you look up has the GO term mitochondrial genome maintenance

#now let's get all the entrez gene ids for
#GO:0007411 (axon guidance)

axon_gene <- go_object['GO:0007411']
length(unlist(axon_gene, use.names=F))
[1] 330

length(unique(unlist(axon_gene, use.names=F)))
[1] 319
axon_gene <- unique(unlist(axon_gene, use.names=F))
head(axon_gene)
[1] "25"  "27"  "60"  "71"  "160" "161"

Now let's perform the enrichment analysis:

library("GO.db")
library("GOstats")

#as the universal list, I will use all the genes with GO terms

universe <- mapped_genes
length(axon_gene)
[1] 319
length(universe)
[1] 18105

params <- new('GOHyperGParams',
              geneIds=axon_gene,
              universeGeneIds=universe,
              ontology='BP',
              pvalueCutoff=0.001,
              conditional=F,
              testDirection='over',
              annotation="org.Hs.eg.db"
             )
hgOver <- hyperGTest(params)

hgOver
Gene to GO BP  test for over-representation 
4207 GO BP ids tested (997 have p < 0.001)
Selected gene set size: 319 
    Gene universe size: 14844 
    Annotation package: org.Hs.eg

result <- summary(hgOver)
head(result,20)
       GOBPID Pvalue OddsRatio  ExpCount Count Size
1  GO:0000902      0       Inf 21.060361   319  980
2  GO:0000904      0       Inf 15.215036   319  708
3  GO:0006928      0       Inf 30.945837   319 1440
4  GO:0006935      0       Inf 13.173471   319  613
5  GO:0007409      0       Inf 10.938494   319  509
6  GO:0007411      0       Inf  7.757949   319  361
7  GO:0009605      0       Inf 29.226624   319 1360
8  GO:0022008      0       Inf 25.723727   319 1197
9  GO:0030030      0       Inf 21.060361   319  980
10 GO:0030182      0       Inf 22.414241   319 1043
11 GO:0031175      0       Inf 15.601859   319  726
12 GO:0032989      0       Inf 22.435732   319 1044
13 GO:0032990      0       Inf 15.386958   319  716
14 GO:0040011      0       Inf 28.001684   319 1303
15 GO:0042330      0       Inf 13.173471   319  613
16 GO:0048468      0       Inf 33.847009   319 1575
17 GO:0048666      0       Inf 18.030248   319  839
18 GO:0048667      0       Inf 12.163433   319  566
19 GO:0048699      0       Inf 24.240905   319 1128
20 GO:0048812      0       Inf 12.378335   319  576
                                                    Term
1                                     cell morphogenesis
2         cell morphogenesis involved in differentiation
3                            cellular component movement
4                                             chemotaxis
5                                           axonogenesis
6                                          axon guidance
7                          response to external stimulus
8                                           neurogenesis
9                           cell projection organization
10                                neuron differentiation
11                         neuron projection development
12                      cellular component morphogenesis
13                               cell part morphogenesis
14                                            locomotion
15                                                 taxis
16                                      cell development
17                                    neuron development
18 cell morphogenesis involved in neuron differentiation
19                                 generation of neurons
20                       neuron projection morphogenesis

So as expected we get a bunch of GO terms associated to neurons and of our original GO term, axon guidance.

What if my gene list are not Entrez gene ids?

Use biomaRt. For this example I will show how you can go from Ensembl gene IDs to Entrez gene IDs:

#install if necessary
source("http://bioconductor.org/biocLite.R")
biocLite("biomaRt")
 
library("biomaRt")
ensembl <- useMart("ensembl",dataset="hsapiens_gene_ensembl")

my_chr <- c(1:22, 'M', 'X', 'Y')
my_ensembl_gene <- getBM(attributes='ensembl_gene_id',
                    filters = 'chromosome_name',
                    values = my_chr,
                    mart = ensembl)

#how many entries
length(my_ensembl_gene)
[1] 52322

five_ensembl <- my_ensembl_gene[1:5,]

five_ensembl_to_entrez <- getBM(attributes=c('ensembl_gene_id', 'entrezgene'), filters = 'ensembl_gene_id', values = five_ensembl, mart = ensembl)
five_ensembl_to_entrez
  ensembl_gene_id entrezgene
1 ENSG00000036549      26009
2 ENSG00000037637      54455
3 ENSG00000221126         NA
4 ENSG00000225011         NA
5 ENSG00000228176         NA

#out of interest how many entrez gene ids?
my_entrez_gene <- getBM(attributes='entrezgene',
                    filters = 'chromosome_name',
                    values = my_chr,
                    mart = ensembl)

length(my_entrez_gene[,1])
[1] 24231

#get some more info on the entrez_gene
my_attribute <- c('entrezgene',
                  'hgnc_symbol',
                  'chromosome_name',
                  'start_position',
                  'end_position',
                  'strand')

my_entrez_gene_info  <- getBM(attributes=my_attribute,
                        filters = c('entrezgene', 'chromosome_name'),
                        values = list(entrezgene=my_entrez_gene$entrezgene, chromosome_name=my_chr),
                        mart = ensembl)

head(my_entrez_gene_info)
  entrezgene hgnc_symbol chromosome_name start_position end_position strand
1     266919   LINC00307              21       31581469     31584101     -1
2       1652         DDT              22       24313554     24322660     -1
3  100422891     MIR4327              21       31747612     31747696     -1
4     115761       ARL11              13       50202435     50208008      1
5      64863      METTL4              18        2537524      2571508     -1
6       3704        ITPA              20        3189514      3204516      1
dim(my_entrez_gene_info)
[1] 26232     6

#entrez gene id on different chromosome locations
my_entrez_gene_info[my_entrez_gene_info$entrezgene=='728369',]
    entrezgene hgnc_symbol chromosome_name start_position end_position strand
306     728369    USP17L24               4        9326891      9328483      1
316     728369    USP17L25               4        9331637      9333229      1
327     728369    USP17L26               4        9336384      9337976      1
337     728369    USP17L27               4        9345874      9347466      1
345     728369    USP17L28               4        9350619      9352211      1
353     728369    USP17L29               4        9355364      9356956      1
363     728369    USP17L30               4        9364855      9366447      1

To learn more about the missing Entrez ID values from the Ensembl conversion see this useful post on BioStars and my post on converting gene identifiers.

Further reading

org.Hs.eg.db reference manual

GOstats vignette.