OBO
The OBO flat file format (version 1.2) is described here -> http://www.geneontology.org/GO.format.obo-1_2.shtml
Comments start with an exclamation mark (!). A tag-value line has the general form:
<tag>: <value> {<trailing modifiers>} ! <comment>
The tag name is always a string. The value is always a string, but the value string may require special parsing depending on the tag with which it is associated.
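A minimal sketch of splitting one tag-value line into its four parts. The sub name parse_tag_value is hypothetical, and it assumes that a `{...}` block only appears as the trailing modifiers and that a `!` preceded by a backslash is an escaped literal rather than the start of a comment. Tag-specific parsing of the value is left to the caller.

```perl
use strict;
use warnings;

# Hypothetical helper: split "<tag>: <value> {<trailing modifiers>} ! <comment>".
# Assumes '{' only opens the trailing-modifier block and '\!' is an escaped '!'.
sub parse_tag_value {
    my ($line) = @_;
    my ($tag, $rest) = $line =~ /^([^:]+):\s*(.*)$/
        or return;                               # not a tag-value line
    my ($modifiers, $comment);
    # the first unescaped '!' starts the comment
    if ($rest =~ s/\s*(?<!\\)!\s*(.*)$//) {
        $comment = $1;
    }
    # a {...} block at the end of what remains holds the trailing modifiers
    if ($rest =~ s/\s*\{(.*)\}\s*$//) {
        $modifiers = $1;
    }
    return { tag => $tag, value => $rest,
             modifiers => $modifiers, comment => $comment };
}

my $parsed = parse_tag_value('relationship: has_color TEST:0000002 ! blue');
print "$parsed->{tag} => $parsed->{value} ($parsed->{comment})\n";
# prints: relationship => has_color TEST:0000002 (blue)
```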
At present, every OBO stanza begins with an id tag.
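A minimal sketch of grouping lines into stanzas and checking that each stanza opens with an id tag. The sub read_stanzas is hypothetical; it skips header lines before the first stanza, blank lines and comment-only lines, and keeps each tag's raw value (comments included) without further parsing.

```perl
use strict;
use warnings;

# Hypothetical helper: split OBO text into stanzas of [tag, raw value] pairs
# and warn when a stanza does not begin with an id tag.
sub read_stanzas {
    my ($text) = @_;
    my (@stanzas, $current);
    for my $line (split /\n/, $text) {
        next if $line =~ /^\s*(?:!.*)?$/;        # blank or comment-only line
        if ($line =~ /^\[(\w+)\]/) {             # e.g. [Term], [Typedef]
            $current = { type => $1, tags => [] };
            push @stanzas, $current;
        } elsif ($current && $line =~ /^([^:]+):\s*(.*)$/) {
            push @{ $current->{tags} }, [$1, $2];
        }                                        # header lines are ignored
    }
    for my $s (@stanzas) {
        my $first = $s->{tags}[0];
        warn "[$s->{type}] stanza does not begin with an id tag\n"
            unless $first && $first->[0] eq 'id';
    }
    return @stanzas;
}

my @stanzas = read_stanzas("format-version: 1.2\n\n[Term]\nid: TEST:0000001\nname: car\n");
print scalar(@stanzas), " stanza(s), first is [$stanzas[0]{type}]\n";
```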
is_a: This tag describes a subclassing relationship between one term and another. The value is the id of the term of which this term is a subclass. A term may have any number of is_a relationships.
intersection_of: This tag indicates that this term is equivalent to the intersection of several other terms. The value is either a term id, or a relationship type id, a space, and a term id.
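Since the intersection_of value comes in two shapes, a parser has to tell them apart by counting the space-separated fields. A minimal sketch, with a hypothetical sub name; the sample values are made up for illustration (structured-car.obo itself has no intersection_of tags), showing how "blue car" could be written as the intersection of car and has_color blue.

```perl
use strict;
use warnings;

# Hypothetical helper: interpret an intersection_of value, which is either
# a term id, or a relationship type id, a space, and a term id.
sub parse_intersection_of {
    my ($value) = @_;
    $value =~ s/\s*!.*$//;                       # drop a trailing comment
    my @parts = split ' ', $value;
    return { term => $parts[0] }                        if @parts == 1;
    return { relation => $parts[0], term => $parts[1] } if @parts == 2;
    die "Unexpected intersection_of value: $value\n";
}

# hypothetical values, e.g. for a term "blue car"
for my $v ('TEST:0000001 ! car', 'has_color TEST:0000002 ! blue') {
    my $p = parse_intersection_of($v);
    print defined $p->{relation} ? "$p->{relation} $p->{term}\n" : "$p->{term}\n";
}
```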
relationship: This tag describes a typed relationship between this term and another term. The value of this tag should be the relationship type id, and then the id of the target term. The relationship type name must be a relationship type name as defined in a typedef tag stanza. The [Typedef] must either occur in a document in the current parse batch, or in a file imported via an import header tag. If the relationship type name is undefined, a parse error will be generated. If the id of the target term cannot be resolved by the end of parsing the current batch of files, this tag describes a "dangling reference"; see the parser requirements section for information about how a parser may handle dangling references. If a relationship is specified for a term with an is_obsolete value of true, a parse error will be generated.
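The three rules above (relationship type must be a defined [Typedef], unresolved targets are dangling references, relationships on obsolete terms are errors) can be checked once the whole batch has been loaded. A minimal sketch; the sub check_relationships and its input layout are hypothetical, and it only reports dangling references rather than deciding how a parser should handle them.

```perl
use strict;
use warnings;

# Hypothetical data layout:
#   $terms    : term id => { obsolete => 0|1, relationships => [[type id, target id], ...] }
#   $typedefs : typedef id => 1 for every [Typedef] seen in the parse batch
# Returns two array refs: parse errors and dangling-reference warnings.
sub check_relationships {
    my ($terms, $typedefs) = @_;
    my (@errors, @dangling);
    for my $id (sort keys %$terms) {
        my $term = $terms->{$id};
        for my $rel (@{ $term->{relationships} }) {
            my ($type, $target) = @$rel;
            push @errors, "$id: undefined relationship type '$type'"
                unless $typedefs->{$type};
            push @errors, "$id: relationship on an obsolete term"
                if $term->{obsolete};
            push @dangling, "$id: dangling reference to $target"
                unless exists $terms->{$target};
        }
    }
    return (\@errors, \@dangling);
}

# "blue car" from structured-car.obo: its has_color relationship passes all checks
my %typedefs = (has_color => 1, has_make => 1);
my %terms = (
    'TEST:0000002' => { obsolete => 0, relationships => [] },
    'TEST:0000003' => { obsolete => 0,
                        relationships => [['has_color', 'TEST:0000002']] },
);
my ($errors, $dangling) = check_relationships(\%terms, \%typedefs);
printf "%d errors, %d dangling references\n", scalar @$errors, scalar @$dangling;
```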
OBO-Edit
Download here -> http://oboedit.org/
Download Graphviz here -> http://www.graphviz.org/
Once OBO-Edit is installed, you will find example files in the test_resources folder. Let's take a look at one of them, structured-car.obo:
format-version: 1.2
date: 10:05:2011 11:25
saved-by: nomi
auto-generated-by: OBO-Edit 2.1-beta13
default-namespace: file:/Users/nomi/Documents/workspace/OBO-Edit/test_resources/car.obo

[Term]
id: TEST:0000001
name: car
synonym: "automobile" EXACT []

[Term]
id: TEST:0000002
name: blue

[Term]
id: TEST:0000003
name: blue car
is_a: TEST:0000001 ! car
relationship: has_color TEST:0000002 ! blue

[Term]
id: TEST:0000004
name: blue VW
is_a: TEST:0000005 ! VW
relationship: has_color TEST:0000002 ! blue

[Term]
id: TEST:0000005
name: VW
is_a: TEST:0000001 ! car

[Term]
id: TEST:0000006
name: automobile
is_obsolete: true

[Typedef]
id: has_color
name: has_color

[Typedef]
id: has_make
name: has_make
At the top level are the terms TEST:0000001 (car) and TEST:0000002 (blue), which have no parents. TEST:0000003 (blue car) is_a TEST:0000001 (car) and has the relationship has_color to TEST:0000002 (blue). So in the is_a hierarchy the parent is TEST:0000001 and the child is TEST:0000003.
TEST:0000005 (VW) is_a TEST:0000001 (car), and TEST:0000004 (blue VW) is_a TEST:0000005 (VW). That gives two levels: TEST:0000004 -> TEST:0000005 -> TEST:0000001.
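The two-level chain can be recovered by following is_a edges upwards. A minimal sketch, assuming the edges have already been collected into a hash of child id to parent ids (filled in by hand here from structured-car.obo); the ancestors sub is hypothetical. It walks breadth-first and skips terms it has already visited, because in a DAG a term can be reached through more than one path; for a single chain like this one the result reads as the path.

```perl
use strict;
use warnings;

# is_a edges from structured-car.obo: child id => [parent ids]
my %is_a = (
    'TEST:0000003' => ['TEST:0000001'],
    'TEST:0000004' => ['TEST:0000005'],
    'TEST:0000005' => ['TEST:0000001'],
);

# Hypothetical helper: every ancestor of $id, breadth-first over is_a edges.
sub ancestors {
    my ($id, $edges) = @_;
    my (@found, %seen);
    my @queue = @{ $edges->{$id} || [] };
    while (my $parent = shift @queue) {
        next if $seen{$parent}++;                # already visited via another path
        push @found, $parent;
        push @queue, @{ $edges->{$parent} || [] };
    }
    return @found;
}

print join(' -> ', 'TEST:0000004', ancestors('TEST:0000004', \%is_a)), "\n";
# prints: TEST:0000004 -> TEST:0000005 -> TEST:0000001
```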
Perl parser
#!/usr/bin/env perl
use strict;
use warnings;
use Getopt::Std;
use Data::Dumper;

my %opts = ();
getopts('f:h', \%opts);
# show usage when -h is given or no input file was supplied
if ($opts{'h'} || !$opts{'f'}) {
    usage();
}

sub usage {
    print STDERR <<EOF;
Program: parse_obo.pl (parses an OBO file)
Version: 0.0.1
Usage:   $0 -f infile
where    -f the name of the obo file
         -h this helpful usage message
EOF
    exit;
}

my @terms     = ();   # [id, name] for every non-obsolete term
my @obsos     = ();   # [id, name] for every obsolete term
my @synonyms  = ();   # [id, synonym or alt_id, alt_id or '', 1 if alt_id]
my @parents   = ();   # [child id, parent id, is_a or relationship type]
my %id2idx    = ();
my %children  = ();   # parent id => { child id => count }
my @maps      = qw(TERM OBSOLETE PARENTS CHILDREN ANCESTOR OFFSPRING);
my %mapcounts = ();

open(IN, '<', $opts{'f'}) || die "Could not open $opts{f}: $!\n";
while (<IN>) {
    chomp;
    if (/^\[Term\]/) {
        my $id     = '';
        my $term   = '';
        my $obso   = 0;
        my @altids = ();
        my @syns   = ();
        my @rels   = ();
        # read tag-value lines until the blank line that ends the stanza
        while (<IN>) {
            chomp;
            last if (/^\s*$/);
            if (/^id: (\S+)/) {
                $id = $1;
            } elsif (/^name: (.*)$/) {
                $term = $1;
            } elsif (/^is_obsolete:\s+true/) {
                $obso = 1;
            } elsif (/^alt_id: (.*)$/) {
                push(@altids, $1);
            } elsif (/^synonym: (.*)$/) {
                my $line = $1;
                # e.g. "automobile" EXACT [] yields automobile
                my ($syn) = $line =~ /^"(.*)"\s+\S+.*\[.*\]/;
                if (defined $syn) {
                    push(@syns, $syn);
                } else {
                    print STDERR "Unexpected line: $line\n";
                }
            } elsif (/^is_a: (\S+)/) {
                push(@rels, [$1, 'is_a']);
            } elsif (/^relationship: (\S+) (\S+)/) {
                # store as [target id, relationship type]
                push(@rels, [$2, $1]);
            }
        }
        if ($term eq '') {
            print STDERR "name is blank for $id\n";
            next;
        }
        if (!$obso) {
            push(@terms, [$id, $term]);
            foreach my $altid (@altids) {
                push(@synonyms, [$id, $altid, $altid, 1]);
            }
            foreach my $syn (@syns) {
                push(@synonyms, [$id, $syn, "", 0]);
            }
            foreach my $rel (@rels) {
                push(@parents, [$id, @$rel]);
                $children{$rel->[0]}->{$id}++;
            }
        } else {
            push(@obsos, [$id, $term]);
        }
    }
}
close(IN);

print Dumper(\@parents);
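The script fills %children (parent id => { child id => count }) without using it yet, while @maps lists an OFFSPRING map. One possible use, sketched under assumptions: collect every descendant of a term by walking %children downwards. The offspring sub is hypothetical and not part of parse_obo.pl, and the hash below is worked out by hand from what the parser should build for structured-car.obo. Because the parser stores both is_a and relationship edges, blue (TEST:0000002) gets TEST:0000003 and TEST:0000004 as offspring through has_color.

```perl
use strict;
use warnings;

# %children as parse_obo.pl should build it from structured-car.obo (by hand)
my %children = (
    'TEST:0000001' => { 'TEST:0000003' => 1, 'TEST:0000005' => 1 },
    'TEST:0000002' => { 'TEST:0000003' => 1, 'TEST:0000004' => 1 },
    'TEST:0000005' => { 'TEST:0000004' => 1 },
);

# Hypothetical helper: all descendants of $id, following %children downwards.
sub offspring {
    my ($id, $children) = @_;
    my (@found, %seen);
    my @queue = sort keys %{ $children->{$id} || {} };
    while (@queue) {
        my $child = shift @queue;
        next if $seen{$child}++;                 # reachable via several parents
        push @found, $child;
        push @queue, sort keys %{ $children->{$child} || {} };
    }
    return @found;
}

print "offspring of car: ", join(', ', offspring('TEST:0000001', \%children)), "\n";
```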