Computer analysis of co-localization of transcription factor binding sites in genome by ChIP-seq data
https://doi.org/10.18699/VJ16.194
Abstract
Statistical features of the distribution of transcription factor (TF) binding sites in the mouse genome, as determined by ChIP-seq experiments in embryonic stem cells, were analyzed. Clusters containing binding sites for four or more different transcription factors were identified in the mouse genome, and their locations relative to gene regulatory regions were described. Two types of site co-localization were found: clusters containing binding sites for Oct4, Nanog, and Sox2, located in distal regions, and clusters containing binding sites for n-Myc and c-Myc, located mainly in the promoter regions of mouse genes. Analysis of new ChIP-seq data on the binding of transcription factors Nr5a2 and Tbx3 in the same cell type confirmed the division of binding-site clusters into two types: those that contain binding sites of pluripotency regulators (Oct4, Nanog, and others) and those that do not. A computer program was developed for statistical processing of gene locations and chromatin domains; it analyzes experimental ChIP-seq data on site localization in the mouse and human genomes. Using this program, positional preferences of different types of transcription factor binding sites were revealed, and the distances between the nearest groups of Oct4, Nanog, and Sox2 binding sites and groups of n-Myc and c-Myc binding sites were calculated. The presence of binding-site nucleotide motifs in the regions identified by ChIP-seq was assessed, and the motifs were refined. The presence of motifs was shown to correlate with ChIP-seq binding intensity. Computer methods were developed for estimating the clustering of binding sites of different transcription factors in new ChIP-seq data. The programs are available from the authors upon request.
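The cluster definition used in the abstract (four or more different transcription factors bound close together) can be illustrated with a minimal sketch. This is not the authors' program. The `Peak` record, the function names, the 100 bp `max_gap` threshold, the single-linkage grouping rule, and the toy coordinates are all assumptions made for illustration. The abstract fixes only the minimum of four different factors and the split into clusters that do or do not contain pluripotency regulators.

```python
from collections import namedtuple

# Hypothetical record for one ChIP-seq peak summit: chromosome, position (bp), factor name.
Peak = namedtuple("Peak", ["chrom", "pos", "tf"])

# Pluripotency regulators named in the abstract (the abstract adds "and others").
PLURIPOTENCY_FACTORS = {"Oct4", "Nanog", "Sox2"}


def find_multi_tf_clusters(peaks, max_gap=100, min_factors=4):
    """Chain neighbouring peaks on the same chromosome that lie within
    max_gap bp of each other, and keep chains that contain at least
    min_factors distinct factors. max_gap is an assumed threshold."""
    clusters = []
    current = []
    for peak in sorted(peaks, key=lambda p: (p.chrom, p.pos)):
        starts_new_chain = current and (
            peak.chrom != current[-1].chrom
            or peak.pos - current[-1].pos > max_gap
        )
        if starts_new_chain:
            if len({p.tf for p in current}) >= min_factors:
                clusters.append(current)
            current = []
        current.append(peak)
    # Flush the last chain.
    if current and len({p.tf for p in current}) >= min_factors:
        clusters.append(current)
    return clusters


def cluster_type(cluster):
    """Label a cluster by whether it contains any pluripotency regulator."""
    factors = {p.tf for p in cluster}
    return "pluripotency" if factors & PLURIPOTENCY_FACTORS else "other"


# Toy data, invented for illustration only.
peaks = [
    Peak("chr1", 1000, "Oct4"), Peak("chr1", 1040, "Nanog"),
    Peak("chr1", 1080, "Sox2"), Peak("chr1", 1120, "Smad1"),
    Peak("chr1", 5000, "n-Myc"), Peak("chr1", 5030, "c-Myc"),
    Peak("chr1", 5060, "E2f1"), Peak("chr1", 5090, "Zfx"),
    Peak("chr2", 5100, "Oct4"),
]

for cluster in find_multi_tf_clusters(peaks):
    print(cluster[0].chrom, cluster[0].pos, cluster[-1].pos, cluster_type(cluster))
```

Sorting once and chaining neighbours keeps the pass linear after the sort. A fixed-width sliding window would be an equally plausible reading of "cluster" and could give different groupings on dense regions.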
About the Authors
A. I. Dergilev
Russian Federation
Novosibirsk, Russia
A. M. Spitsina
Russian Federation
Novosibirsk, Russia
I. V. Chadaeva
Russian Federation
Novosibirsk, Russia
A. V. Svichkarev
Russian Federation
Novosibirsk, Russia
Saint-Petersburg, Russia
F. M. Naumenko
Russian Federation
Novosibirsk, Russia
E. V. Kulakova
Russian Federation
Novosibirsk, Russia
E. R. Galieva
Russian Federation
Novosibirsk, Russia
E. E. Vityaev
Russian Federation
Novosibirsk, Russia
M. Chen
China
Hangzhou, China
Y. L. Orlov
Russian Federation
Novosibirsk, Russia