Updated: 2014 March 14th
From the Wikipedia article:
Chromatin Interaction Analysis by Paired-End Tag Sequencing (ChIA-PET) is a technique that incorporates chromatin immunoprecipitation (ChIP)-based enrichment, chromatin proximity ligation, Paired-End Tags, and High-throughput sequencing to determine de novo long-range chromatin interactions genome-wide.
Let's get started with the ENCODE ChIA-PET dataset by downloading the BED files, which contain the interactions:
wget http://hgdownload.cse.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeGisChiaPet/wgEncodeGisChiaPetK562Pol2InteractionsRep1.bed.gz
wget http://hgdownload.cse.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeGisChiaPet/wgEncodeGisChiaPetK562Pol2InteractionsRep2.bed.gz
#get others if you want
wget http://hgdownload.cse.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeGisChiaPet/wgEncodeGisChiaPetHct116Pol2InteractionsRep1.bed.gz
wget http://hgdownload.cse.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeGisChiaPet/wgEncodeGisChiaPetHelas3Pol2InteractionsRep1.bed.gz
wget http://hgdownload.cse.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeGisChiaPet/wgEncodeGisChiaPetK562CtcfInteractionsRep1.bed.gz
wget http://hgdownload.cse.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeGisChiaPet/wgEncodeGisChiaPetMcf7CtcfInteractionsRep1.bed.gz
wget http://hgdownload.cse.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeGisChiaPet/wgEncodeGisChiaPetMcf7CtcfInteractionsRep2.bed.gz
wget http://hgdownload.cse.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeGisChiaPet/wgEncodeGisChiaPetMcf7EraaInteractionsRep1.bed.gz
wget http://hgdownload.cse.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeGisChiaPet/wgEncodeGisChiaPetMcf7EraaInteractionsRep2.bed.gz
wget http://hgdownload.cse.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeGisChiaPet/wgEncodeGisChiaPetMcf7EraaInteractionsRep3.bed.gz
wget http://hgdownload.cse.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeGisChiaPet/wgEncodeGisChiaPetMcf7Pol2InteractionsRep1.bed.gz
wget http://hgdownload.cse.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeGisChiaPet/wgEncodeGisChiaPetMcf7Pol2InteractionsRep2.bed.gz
wget http://hgdownload.cse.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeGisChiaPet/wgEncodeGisChiaPetMcf7Pol2InteractionsRep3.bed.gz
wget http://hgdownload.cse.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeGisChiaPet/wgEncodeGisChiaPetMcf7Pol2InteractionsRep4.bed.gz
wget http://hgdownload.cse.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeGisChiaPet/wgEncodeGisChiaPetNb4Pol2InteractionsRep1.bed.gz
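Since every URL shares the same prefix, the list can also be generated in a loop. This is only a sketch: the helper name chiapet_url is made up for illustration, and the wget call is left commented out so the snippet just prints the URLs it would fetch.

```shell
#!/usr/bin/env bash
# Common prefix of all the ENCODE ChIA-PET interaction files listed above
base_url="http://hgdownload.cse.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeGisChiaPet"

# Hypothetical helper (not part of any tool): build the full URL for one sample name
chiapet_url() {
  printf '%s/wgEncodeGisChiaPet%s.bed.gz\n' "$base_url" "$1"
}

# Sample names taken from the file names above; extend the list as needed
for sample in K562Pol2InteractionsRep1 K562Pol2InteractionsRep2 K562CtcfInteractionsRep1; do
  url="$(chiapet_url "$sample")"
  echo "$url"
  # Uncomment to actually download the file:
  # wget "$url"
done
```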
For a preview and description of these interaction bed files, have a look at the table schema, which has the definition:
ChIA-PET Chromatin Interaction PET clusters: Two different genomic regions in the chromatin are genomically far from each other or in different chromosomes, but are spatially close to each other in the nucleus and interact with each other for regulatory functions. BED12 format is used to represent the data.
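To make the layout concrete, here is a small sketch that pulls apart the name column (column 4) of one record, which is where both interacting regions are stored. The record is the example line quoted in the Perl script in this post. The helper name split_interaction is made up, and my reading of the number after the comma as a PET count is an assumption; check the table schema to confirm.

```shell
#!/usr/bin/env bash
# Example record (the same one quoted as a comment in the Perl script in this post)
record='chr1 875400 877590 chr1:875400..877590-chr8:126185955..126188744,2 200 . 875400 877590 255,0,0 1 2190 0'

# Hypothetical helper: print the first region, the second region and the trailing number
# of the name field (assumed, not verified, to be the PET count)
split_interaction() {
  awk '{
    split($4, parts, ",")          # parts[1] = "regionA-regionB", parts[2] = trailing number
    split(parts[1], regions, "-")  # regions[1] = regionA, regions[2] = regionB
    print regions[1], regions[2], parts[2]
  }'
}

printf '%s\n' "$record" | split_interaction
```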
One cool way of visualising these chromatin interactions is with Circos (instructions for installing circos). Here's a simple Perl script that parses the BED12 files and writes them in a format readable by Circos:
#!/usr/bin/env perl

use strict;
use warnings;

my $usage = "Usage: $0 <infile.bed>\n";
my $infile = shift or die $usage;

#names of interactions already printed
my %check = ();

open(IN,'<', $infile) || die "Could not open $infile: $!\n";
while(<IN>){
   chomp;
   #chr1 875400 877590 chr1:875400..877590-chr8:126185955..126188744,2 200 . 875400 877590 255,0,0 1 2190 0
   my($chr, $start, $end, $name, @rest) = split();
   if ($name =~ /^(chr[0-9xym]+):(\d+)\.\.(\d+)-(chr[0-9xym]+):(\d+)\.\.(\d+),\d+/i){
      #inter-chromosomal interactions are on two lines
      #skip the duplicated line
      if (exists $check{$name}){
         next;
      } else {
         $check{$name} = 1;
      }
      my $chr_first = $1;
      my $start_first = $2;
      my $end_first = $3;
      my $chr_second = $4;
      my $start_second = $5;
      my $end_second = $6;
      #Circos names human chromosomes hs1, hs2, ...
      $chr_second =~ s/chr/hs/;
      $chr_first =~ s/chr/hs/;
      $chr_first = lc($chr_first);
      $chr_second = lc($chr_second);
      print join (" ", $chr_first, $start_first, $end_first, $chr_second, $start_second, $end_second),"\n";
   } else {
      die "Could not parse $name\n";
   }
}
close(IN);

exit(0);
Now decompress the file and execute the script:
gunzip wgEncodeGisChiaPetK562Pol2InteractionsRep1.bed.gz
./to_circos.pl wgEncodeGisChiaPetK562Pol2InteractionsRep1.bed > wgEncodeGisChiaPetK562Pol2InteractionsRep1.link
I have these three files prepared for Circos (ideogram.conf, ticks.conf and test.conf). See my getting started with Circos post to get more information on these configuration files.
#ideogram.conf
#(the cat line is the command used to print the file; it is not part of the file)
cat ideogram.conf
<ideogram>
<spacing>
default = 0.005r
</spacing>
# Ideogram position, fill and outline
radius           = 0.90r
thickness        = 20p
fill             = yes
stroke_color     = dgrey
stroke_thickness = 2p
# Minimum definition for ideogram labels.
show_label       = yes
# see etc/fonts.conf for list of font names
label_font       = default
label_radius     = dims(image,radius) - 60p
label_size       = 30
label_parallel   = yes
</ideogram>

#ticks.conf
#(again, the cat line is a command, not part of the file)
cat ticks.conf
show_ticks       = yes
show_tick_labels = yes
<ticks>
radius           = 1r
color            = black
thickness        = 2p
# the tick label is derived by multiplying the tick position
# by 'multiplier' and casting it in 'format':
#
# sprintf(format,position*multiplier)
#
multiplier       = 1e-6
# %d   - integer
# %f   - float
# %.1f - float with one decimal
# %.2f - float with two decimals
#
# for other formats, see http://perldoc.perl.org/functions/sprintf.html
format           = %d
<tick>
spacing        = 5u
size           = 10p
</tick>
<tick>
spacing        = 25u
size           = 15p
show_label     = yes
label_size     = 20p
label_offset   = 10p
format         = %d
</tick>
</ticks>

#test.conf
#(the cat | grep line is a command that prints the file without comments and blank lines)
cat test.conf | grep -v "^#" | grep -v "^$"
karyotype = data/karyotype/karyotype.human.txt
chromosomes_units = 1000000
<links>
<link>
file          = wgEncodeGisChiaPetK562Pol2InteractionsRep1.link
radius        = 0.8r
bezier_radius = 0r
color         = black_a4
thickness     = 2
<rules>
<rule>
condition     = var(intrachr)
show          = no
</rule>
<rule>
condition     = 1
color         = eval(var(chr2))
flow          = continue
</rule>
<rule>
condition     = to(hs1)
radius2       = 0.99r
</rule>
</rules>
</link>
</links>
<<include ideogram.conf>>
<<include ticks.conf>>
<image>
<<include etc/image.conf>>
</image>
<<include etc/colors_fonts_patterns.conf>>
<<include etc/housekeeping.conf>>

#now run Circos assuming you have the link file in the same directory as the conf files
bin/circos -conf test.conf
If everything worked perfectly, you should get this image:
There's a whole new level of complexity when we take the spatial organisation of chromosomes into account as well.
Intra-chromosomal
To focus on chromosome 1 and show only long-range interactions (over 1 Mb):
#how many intra-chromosomal interactions on chromosome 1
cat wgEncodeGisChiaPetK562Pol2InteractionsRep1.link | awk '$1=="hs1" && $4=="hs1" && $5-$2>1000000 {print}' | wc -l
164
cat wgEncodeGisChiaPetK562Pol2InteractionsRep1.link | awk '$1=="hs1" && $4=="hs1" && $5-$2>1000000 {print}' > chr1_to_chr1.link
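The awk filter above only counts a link when the second region starts more than 1 Mb downstream of the first. Whether the second region can ever come before the first in these files is something I have not checked, so the sketch below takes the absolute distance to be safe. The function name long_range_links and the example records are made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: keep links where both ends are on the given chromosome and the
# two start positions are more than min_dist apart, in either orientation
long_range_links() {
  local chrom="$1" min_dist="$2"
  awk -v chrom="$chrom" -v min_dist="$min_dist" '
    $1 == chrom && $4 == chrom {
      d = $5 - $2
      if (d < 0) d = -d            # absolute distance between the two start positions
      if (d > min_dist) print
    }'
}

# Made-up link records (not real data), in the same six-column format as the .link file
cat <<'EOF' | long_range_links hs1 1000000
hs1 100 200 hs1 2000000 2000100
hs1 5000000 5000100 hs1 1000 1100
hs1 100 200 hs1 5000 5100
hs1 100 200 hs8 2000000 2000100
EOF
```

On the real data this would be run as long_range_links hs1 1000000 < wgEncodeGisChiaPetK562Pol2InteractionsRep1.link > chr1_to_chr1.link.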
The ticks.conf and ideogram.conf are the same. Here's what the test.conf file looks like:
cat test.conf
karyotype = data/karyotype/karyotype.human.txt
chromosomes_units = 1000000
chromosomes_display_default = no
chromosomes = hs1
<links>
<link>
file          = chr1_to_chr1.link
radius        = 0.8r
bezier_radius = 0r
color         = black_a4
thickness     = 2
<rules>
<rule>
condition     = var(intrachr)
show          = yes
</rule>
<rule>
condition     = 1
color         = eval(var(chr2))
flow          = continue
</rule>
<rule>
condition     = to(hs1)
radius2       = 0.99r
</rule>
</rules>
</link>
</links>
<<include ideogram.conf>>
<<include ticks.conf>>
<image>
<<include etc/image.conf>>
</image>
<<include etc/colors_fonts_patterns.conf>>
<<include etc/housekeeping.conf>>

#run circos
circos -conf test.conf
Long range intra-chromosomal interactions on chromosome one.
Conclusions
I've shown a way of visualising the ChIA-PET dataset, but not how to actually use it. One way I intend to use this dataset is to modify the Perl script above to produce a BED file containing the genomic loci that interact with other loci. Then, just to get an idea of what these regions encompass, I would intersect them with a genome annotation file.
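As a rough sketch of that idea, and working from the .link file rather than modifying the Perl script, each link can be split into two BED lines that share an identifier. The function name links_to_bed and the interaction_N identifiers are made up for illustration. Note that the Perl script lower-cases chromosome names, so chrX would come out as chrx here.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write one BED line per interacting region, tagging both regions
# of a link with the same made-up identifier (interaction_<line number>)
links_to_bed() {
  awk 'BEGIN { OFS = "\t" } {
    a = $1; b = $4
    sub(/^hs/, "chr", a)           # convert Circos-style names back to chr names
    sub(/^hs/, "chr", b)
    id = "interaction_" NR
    print a, $2, $3, id
    print b, $5, $6, id
  }'
}

# One link record, built from the example line quoted in the Perl script
printf 'hs1 875400 877590 hs8 126185955 126188744\n' | links_to_bed

# The resulting file could then be intersected with an annotation, for example with
# bedtools (file names here are placeholders):
# bedtools intersect -a interacting_loci.bed -b annotation.bed -wa -wb
```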

This work is licensed under a Creative Commons
Attribution 4.0 International License.
Hi, nice post :)
I think there is a small problem in your script.
In the bed file, an interaction is repeated on two lines if it is inter-chromosomal, but written on a single line if it is intra-chromosomal.
So you'll end up with a lot of duplicated interactions.
You can just add a test: if both regions are on the same chromosome, go to the next line; otherwise skip the next line.
Hi Nadhi,
You’re right; I was a bit sloppy there.
I adjusted the code to print out only one line for inter-chromosomal interactions.
Thanks for letting me know 🙂
Cheers,
Dave
Hi Dave,
I solved the problem…thanks for your help.
However, I would like to know how to change the colour to red, or any other bright colour, in the script for detecting inter-chromosomal interactions.
regards,
Amit.
Hi Dave,
Thank you for the post, I am facing a problem which is as follows;
I used the first script to convert .bed file to .link file and after that I created the three files ideogram.conf(line 1 to 24), ticks.conf(line 25 to 67) and test.conf(line 69 to 103). After this I ran the command circos -conf test.conf but I got an error which was like this;
debuggroup summary 0.13s welcome to circos v0.64 2 May 2013
debuggroup summary 0.13s loading configuration from file test.conf
debuggroup summary 0.13s found conf file test.conf
*** CIRCOS ERROR ***
CONFIGURATION FILE ERROR
Error parsing the configuration file. You used an <<include FILE>> directive,
but the FILE could not be found. This FILE is interpreted relative to the
configuration file in which the <<include>> directive is used. Circos lookd
for the file in these directories
/etc/circos
.
./etc
/usr/bin/etc
/usr/bin/../etc
/usr/bin/..
/usr/bin
The Config::General module reported the error
Config::General The file “etc/image.conf” does not exist within ConfigPath:
/etc/circos…./etc./usr/bin/etc./usr/bin/../etc./usr/bin/…/usr/bin! at
/usr/share/perl5/Circos/Configuration.pm line 707.
If you are having trouble debugging this error, use this tutorial to learn how
to use the debugging facility
http://www.circos.ca/tutorials/lessons/configuration/debugging
If you’re still stumped, get support in the Circos Google Group
http://groups.google.com/group/circos-data-visualization
Stack trace:
at /usr/share/perl5/Circos/Error.pm line 354.
Circos::Error::fatal_error(‘configuration’, ‘cannot_find_include’, ‘/etc/circos\x{a}.\x{a}./etc\x{a}/usr/bin/etc\x{a}/usr/bin/../etc\x{a}/usr/bin/..\x{a}…’, ‘Config::General The file “etc/image.conf” does not exist with…’) called at /usr/share/perl5/Circos/Configuration.pm line 719
Circos::Configuration::loadconfiguration(‘test.conf’) called at /usr/share/perl5/Circos.pm line 197
Circos::run(‘Circos’, ‘configfile’, ‘test.conf’) called at /usr/bin/circos line 300
Please can you tell me where I went wrong. I could run circos using the file in example folder.
Hi Akash,
it seems that Circos could not find some file.
Error parsing the configuration file. You used an <<include FILE>> directive,
but the FILE could not be found. This FILE is interpreted relative to the
configuration file in which the <<include>> directive is used.
I’m not sure but try looking for unmatched <>‘s in your configuration file.
Hi Akash,
Don’t know whether you have solved the problem or not. I met the same problem the first time I tried. I think you also misunderstood the codes provided above. The contents of “ideogram.conf” actually should be line 3-24, “ticks.conf” line 28-67 and “test.conf” line 70-106. That “cat ×××” should be a command run by Dave. lol
Trying to run circos but getting this error, Can you help me with this Dave?
circos -conf test.conf (after running this command)
*** CIRCOS ERROR ***
CONFIGURATION FILE ERROR
…error text from [error/configuration.missing.txt] could not be read…
I’m getting the same error and can’t solve it.