Chromatin state

From Dave's wiki

ENCODE chromatin states

http://www.ncbi.nlm.nih.gov/pubmed/21441907

The study profiled nine human cell types:

#Cell line information: see http://genome.ucsc.edu/ENCODE/cellTypes.html
H1ES - H1 human embryonic stem cells
K562 - an immortalized cell line produced from a female patient with chronic myelogenous leukemia (CML)
GM12878 - a lymphoblastoid cell line produced from the blood of a female donor with northern and western European ancestry by EBV transformation
HepG2 - a cell line derived from a male patient with liver carcinoma
HUVEC - human umbilical vein endothelial cells with a normal karyotype
HSMM - skeletal muscle myoblasts from the mesoderm lineage and muscle tissue with a normal karyotype
NHLF - lung fibroblasts from the endoderm lineage and lung tissue with a normal karyotype
NHEK - epidermal keratinocytes from the ectoderm lineage and skin with a normal karyotype
HMEC - mammary epithelial cells from the ectoderm lineage and breast tissue with a normal karyotype
fullpath=http://hgdownload.cse.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeBroadHmm/
wget ${fullpath}wgEncodeBroadHmmGm12878HMM.bed.gz
wget ${fullpath}wgEncodeBroadHmmH1hescHMM.bed.gz
wget ${fullpath}wgEncodeBroadHmmHepg2HMM.bed.gz
wget ${fullpath}wgEncodeBroadHmmHmecHMM.bed.gz
wget ${fullpath}wgEncodeBroadHmmHsmmHMM.bed.gz
wget ${fullpath}wgEncodeBroadHmmHuvecHMM.bed.gz
wget ${fullpath}wgEncodeBroadHmmK562HMM.bed.gz
wget ${fullpath}wgEncodeBroadHmmNhekHMM.bed.gz
wget ${fullpath}wgEncodeBroadHmmNhlfHMM.bed.gz
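The nine downloads above differ only in the cell type part of the file name, so they can be generated in a loop. The following is a minimal sketch; the function name `broad_hmm_urls` is made up for illustration, and the cell type labels are copied from the file names above.

```shell
# Base URL as given above.
fullpath=http://hgdownload.cse.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeBroadHmm/

# Print one download URL per cell type (hypothetical helper, not part of any tool).
broad_hmm_urls() {
  for cell in Gm12878 H1hesc Hepg2 Hmec Hsmm Huvec K562 Nhek Nhlf; do
    printf '%swgEncodeBroadHmm%sHMM.bed.gz\n' "$fullpath" "$cell"
  done
}

# To actually download (skipping files that already exist), uncomment:
# broad_hmm_urls | xargs -n 1 wget -nc
```

Printing the URLs first makes it easy to inspect the list before fetching anything.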
#the different states
zcat wg*.bed.gz | cut -f4 | sort -un
1_Active_Promoter
2_Weak_Promoter
3_Poised_Promoter
4_Strong_Enhancer
5_Strong_Enhancer
6_Weak_Enhancer
7_Weak_Enhancer
8_Insulator
9_Txn_Transition
10_Txn_Elongation
11_Weak_Txn
12_Repressed
13_Heterochrom/lo
14_Repetitive/CNV
15_Repetitive/CNV
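Beyond listing the state names, it can be useful to see how many segments carry each state. A minimal sketch, assuming the state name is in column 4 as in the files above; the helper name `count_states` is hypothetical.

```shell
# Read BED lines on stdin and print "state<TAB>count", ordered by state number.
count_states() {
  cut -f4 | sort | uniq -c | awk '{print $2 "\t" $1}' | sort -n
}

# Usage on the downloaded files (uncomment to run):
# zcat wg*.bed.gz | count_states
```

The final `sort -n` relies on every state name starting with its number, as in the list above.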
#how many segments (lines) in each bed file
for file in *.gz; do echo "$file"; zcat "$file" | wc -l; done
wgEncodeBroadHmmGm12878HMM.bed.gz
571339
wgEncodeBroadHmmH1hescHMM.bed.gz
619061
wgEncodeBroadHmmHepg2HMM.bed.gz
546343
wgEncodeBroadHmmHmecHMM.bed.gz
609251
wgEncodeBroadHmmHsmmHMM.bed.gz
638969
wgEncodeBroadHmmHuvecHMM.bed.gz
549915
wgEncodeBroadHmmK562HMM.bed.gz
622257
wgEncodeBroadHmmNhekHMM.bed.gz
628266
wgEncodeBroadHmmNhlfHMM.bed.gz
641016
#number of strong enhancer segments, summed across all nine cell lines
zcat wg*.bed.gz | cut -f4 | grep Strong_Enhancer | wc
574810  574810 10346580
#number of weak enhancer segments, summed across all nine cell lines
zcat wg*.bed.gz | cut -f4 | grep Weak_Enhancer | wc
1680951 1680951 26895216
#total number of enhancer segments, summed across all nine cell lines
#sanity check: this should equal strong + weak (574810 + 1680951)
zcat wg*.bed.gz | cut -f4 | grep -i enhancer | wc
2255761 2255761 37241796
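The sanity check can be done with shell arithmetic instead of by eye. This sketch uses the three line counts reported above; the variable names are arbitrary.

```shell
# Line counts copied from the wc output above.
strong=574810
weak=1680951
total=2255761

# Strong and weak enhancer segments should add up to the total.
if [ $((strong + weak)) -eq "$total" ]; then
  echo "counts agree"
else
  echo "counts differ"
fi
```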
#store all the enhancer regions
zcat wg*.bed.gz | grep -i enhancer > enhancer.bed
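Note that enhancer.bed pools all nine files, so the cell type of each segment is lost. One possible way to keep it is to append the cell type, taken from the file name, as an extra column. This is a sketch under that assumption; `enhancers_with_cell` and the output name `enhancer_by_cell.bed` are hypothetical, and `gzip -dc` is used as a portable equivalent of `zcat`.

```shell
# Print the enhancer rows of one file with the cell type appended as the last column.
enhancers_with_cell() {
  file=$1
  cell=$(basename "$file")
  cell=${cell#wgEncodeBroadHmm}   # strip the common prefix
  cell=${cell%HMM.bed.gz}         # strip the common suffix
  gzip -dc "$file" |
    awk -v cell="$cell" 'BEGIN { FS = OFS = "\t" } $4 ~ /Enhancer/ { print $0, cell }'
}

# Usage on the downloaded files (uncomment to run):
# for f in wg*.bed.gz; do enhancers_with_cell "$f"; done > enhancer_by_cell.bed
```

Matching on column 4 only, rather than on the whole line, avoids accidental hits in other fields.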