MetArea: a software package for analysis of the mutually exclusive occurrence in pairs of motifs of transcription factor binding sites based on ChIP-seq data
https://doi.org/10.18699/vjgb-24-90
Abstract
ChIP-seq technology, which is based on chromatin immunoprecipitation (ChIP), allows mapping a set of genomic loci (peaks) containing binding sites (BS) for the investigated (target) transcription factor (TF). A TF may recognize several structurally different BS motifs. The multiprotein complex mapped in a ChIP-seq experiment includes target and other “partner” TFs linked by protein-protein interactions. Not all these TFs bind to DNA directly. Therefore, both target and partner TFs recognize enriched BS motifs in peaks. A de novo search approach is used to search for enriched TF BS motifs in ChIP-seq data. For a pair of enriched BS motifs of TFs, the co-occurrence or mutually exclusive occurrence can be detected from a set of peaks: the co-occurrence reflects a more frequent occurrence of two motifs in the same peaks, while the mutually exclusive means their more frequent detection in different peaks. We propose the MetArea software package to identify pairs of TF BS motifs with the mutually exclusive occurrence in ChIP-seq data. MetArea was designed to predict the structural diversity of BS motifs of the same TFs, and the functional relation of BS motifs of different TFs. The functional relation of the motifs of the two distinct TFs presumes that they are interchangeable as part of a multiprotein complex that uses the BS of these TFs to bind directly to DNA in different peaks. MetArea calculates the estimates of recognition performance pAUPRC (partial area under the Precision–Recall curve) for each of the two input single motifs, identifies the “joint” motif, and computes the performance for it too. The goal of the analysis is to find pairs of single motifs A and B for which the accuracy of the joint A&B motif is higher than those of both single motifs.
Keywords
About the Authors
V. G. LevitskyRussian Federation
Novosibirsk
A. V. Tsukanov
Russian Federation
Novosibirsk
T. I. Merkulova
Russian Federation
Novosibirsk
References
1. Ambrosini G., Vorontsov I., Penzar D., Groux R., Forne O., Nikolaeva D.D., Ballester B., Grau J., Grosse I., Makeev V., Kulakovskiy I., Buche P. Insights gained from a comprehensive all-against-all transcription factor binding motif benchmarking study. Genome Biol. 2020;21:114. doi 10.1186/s13059-020-01996-3
2. Amoutzias G.D., Robertson D.L., Van de Peer Y., Oliver S.G. Choose your partners: dimerization in eukaryotic transcription factors. Trends Biochem. Sci. 2008;33(5):220-229. doi 10.1016/j.tibs.2008.02.002
3. Bailey T.L. STREME: accurate and versatile sequence motif discovery. Bioinformatics. 2021;37:2834-2840. doi 10.1093/bioinformatics/btab203
4. Biswas A., Narlikar L. A universal framework for detecting cis-regulatory diversity in DNA regions. Genome Res. 2021;31(9):1646-1662. doi 10.1101/gr.274563.120
5. Chen Y., Chi P., Rockowitz S., Iaquinta P.J., Shamu T., Shukla S., Gao D., Sirota I., Carver B.S., Wongvipat J., Scher H.I., Zheng D., Sawyers C.L. ETS factors reprogram the androgen receptor cistrome and prime prostate tumorigenesis in response to PTEN loss. Nat. Med. 2013;19(8):1023-1029. doi 10.1038/nm.3216
6. Davis J., Goadrich M. The relationship between Precision-Recall and ROC curves. In: Proceedings of the 23rd International Conference on Machine Learning. New York: Assoc. for Computing Machinery, 2006;233-240. doi 10.1145/1143844.1143874
7. D’haeseleer P. What are DNA sequence motifs? Nat. Biotechnol. 2006; 24(4):423-425. doi 10.1038/nbt0406-423
8. Garber M., Yosef N., Goren A., Raychowdhury R., Thielke A., Guttman M., Robinson J., Minie B., Chevrier N., Itzhaki Z., Blecher-Gonen R., Bornstein C., Amann-Zalcenstein D., Weiner A., Friedrich D., Meldrim J., Ram O., Cheng C., Gnirke A., Fisher S., Friedman N., Wong B., Bernstein B.E., Nusbaum C., Hacohen N., Regev A., Amit I. A high-throughput chromatin immunoprecipitation approach reveals principles of dynamic gene regulation in mammals. Mol. Cell. 2012;47(5):810-822. doi 10.1016/j.molcel.2012.07.030
9. Georgakopoulos-Soares I., Deng C., Agarwal V., Chan C.S.Y., Zhao J., Inoue F., Ahituv N. Transcription factor binding site orientation and order are major drivers of gene regulatory activity. Nat. Commun. 2023;14:2333. doi 10.1038/s41467-023-37960-5
10. Gupta S., Stamatoyannopolous J.A., Bailey T.L., Noble W.S. Quantifying similarity between motifs. Genome Biol. 2007;8(2):R24. doi 10.1186/gb-2007-8-2-r24
11. Hess D.A., Strelau K.M., Karki A., Jiang M., Azevedo-Pouly A.C., Lee A.H., Deering T.G., Hoang C.Q., MacDonald R.J., Konieczny S.F. MIST1 links secretion and stress as both target and regulator of the unfolded protein response. Mol. Cell. Biol. 2016;36(23): 2931-2944. doi 10.1128/MCB.00366-16
12. Hu G., Dong X., Gong S., Song Y., Hutchins A.P., Yao H. Systematic screening of CTCF binding partners identifies that BHLHE40 regulates CTCF genome-wide distribution and long-range chromatin interactions. Nucleic Acids Res. 2020;48(17):9606-9620. doi 10.1093/nar/gkaa705
13. Johnson D.S., Mortazavi A., Myers R.M., Wold B. Genome-wide mapping of in vivo protein-DNA interactions. Science. 2007;316(5830): 1497-1502. doi 10.1126/science.1141319
14. Keilwagen J., Posch S., Grau J. Accurate prediction of cell type-specific transcription factor binding. Genome Biol. 2019;20(1):9. doi 10.1186/s13059-018-1614-y
15. Kel O.V., Romaschenko A.G., Kel A.E., Wingender E., Kolchanov N.A. A compilation of composite regulatory elements affecting gene transcription in vertebrates. Nucleic Acids Res. 1995;23(20):4097-4103. doi 10.1093/nar/23.20.4097
16. Kolmykov S., Yevshin I., Kulyashov M., Sharipov R., Kondrakhin Y., Makeev V.J., Kulakovskiy I.V., Kel A., Kolpakov F. GTRD: an integrated view of transcription regulation. Nucleic Acids Res. 2021; 49(D1):D104-D111. doi 10.1093/nar/gkaa1057
17. Lambert S.A., Jolma A., Campitelli L.F., Das P.K., Yin Y., Albu M., Chen X., Taipale J., Hughes T.R., Weirauch M.T. The human transcription factors. Cell. 2018;172(4):650-665. doi 10.1016/j.cell.2018.01.029
18. Levitsky V.G., Ignatieva E.V., Ananko E.A., Turnaev I.I., Merkulova T.I., Kolchanov N.A., Hodgman T.C. Effective transcription factor binding site prediction using a combination of optimization, a genetic algorithm and discriminant analysis to capture distant interactions. BMC Bioinformatics. 2007;8(1):481. doi 10.1186/1471-2105-8-481
19. Levitsky V., Zemlyanskaya E., Oshchepkov D., Podkolodnaya O., Ignatieva E., Grosse I., Mironova V., Merkulova T. A single ChIPseq dataset is sufficient for comprehensive analysis of motifs cooccurrence with MCOT package. Nucleic Acids Res. 2019;47:e139. doi 10.1093/nar/gkz800
20. Levitsky V., Oshchepkov D., Zemlyanskaya E., Merkulova T. Asymmetric conservation within pairs of co-occurred motifs mediates weak direct binding of transcription factors in ChIP-Seq data. Int. J. Mol. Sci. 2020;21(17):E6023. doi 10.3390/ijms21176023
21. Levitsky V.G., Tsukanov A.V. MetArea tool for predicting structural variability and cooperative binding of transcription factors in ChIPseq data. In: 14th International Conference on Bioinformatics of Genome Regulation and Structure/Systems Biology (BGRS/SB-2024). 2024;136-138. doi 10.18699/bgrs2024-1.2-17
22. Mitra S., Biswas A., Narlikar L. DIVERSITY in binding, regulation, and evolution revealed from high-throughput ChIP. PLoS Comput. Biol. 2018;14(4):e1006090. doi 10.1371/journal.pcbi.1006090
23. Morgunova E., Taipale J. Structural perspective of cooperative transcription factor binding. Curr. Opin. Struct. Biol. 2017;47:1-8. doi 10.1016/j.sbi.2017.03.006
24. Nagy G., Nagy L. Motif grammar: the basis of the language of gene expression. Comput. Struct. Biotechnol. J. 2020;18:2026-2032. doi 10.1016/j.csbj.2020.07.007
25. Raditsa V.V., Tsukanov A.V., Bogomolov A.G., Levitsky V.G. Genomic background sequences systematically outperform synthetic ones in de novo motif discovery for ChIP-seq data. NAR Genom. Bioinform. 2024;6(3):lqae090. doi 10.1093/nargab/lqae090
26. Rauluseviciute I., Riudavets-Puig R., Blanc-Mathieu R., Castro-Mondragon J.A., Ferenc K., Kumar V., Lemma R.B., Lucas J., Chèneby J., Baranasic D., Khan A., Fornes O., Gundersen S., Johansen M., Hovig E., Lenhard B., Sandelin A., Wasserman W.W., Parcy F., Mathelier A. JASPAR 2024: 20th anniversary of the open-access database of transcription factor binding profiles. Nucleic Acids Res. 2024;52(D1):D174-D182. doi 10.1093/nar/gkad1059
27. Rogers J.M., Waters C.T., Seegar T.C.M., Jarrett S.M., Hallworth A.N., Blacklow S.C., Bulyk M.L. Bispecific forkhead transcription factor FoxN3 recognizes two distinct motifs with different DNA shapes. Mol. Cell. 2019;74(2):245-253.e6. doi 10.1016/j.molcel.2019.01.019
28. Saito T., Rehmsmeier M. The Precision-Recall plot is more informative than the ROC plot when evaluating binary classifiers on imbalanced datasets. PLoS One. 2015;10(3):e0118432. doi 10.1371/journal.pone.0118432
29. Siebert M., Söding J. Bayesian Markov models consistently outperform PWMs at predicting motifs in nucleotide sequences. Nucleic Acids Res. 2016;44:6055-6069. doi 10.1093/nar/gkw521
30. Tognon M., Giugno R., Pinello L. A survey on algorithms to characterize transcription factor binding sites. Brief. Bioinform. 2023;24(3): bbad156. doi 10.1093/bib/bbad156
31. Tsukanov A.V., Levitsky V.G., Merkulova T.I. Application of alternative de novo motif recognition models for analysis of structural heterogeneity of transcription factor binding sites: a case study of FOXA2 binding sites. Vavilov J. Genet. Breed. 2021;25(1):7-17. doi 10.18699/VJ21.002
32. Tsukanov A.V., Mironova V.V., Levitsky V.G. Motif models proposing independent and interdependent impacts of nucleotides are related to high and low affinity transcription factor binding sites in Arabidopsis. Front. Plant Sci. 2022;13:938545. doi 10.3389/fpls.2022.938545
33. Vorontsov I.E., Eliseeva I.A., Zinkevich A., Nikonov M., Abramov S., Boytsov A., Kamenets V., Kasianova A., Kolmykov S., Yevshin I.S., Favorov A., Medvedeva Y.A., Jolma A., Kolpakov F., Makeev V.J., Kulakovskiy I.V. HOCOMOCO in 2024: a rebuild of the curated collection of binding models for human and mouse transcription factors. Nucleic Acids Res. 2024;52(D1):D154-D163. doi 10.1093/nar/gkad1077
34. Wasserman W.W., Sandelin A. Applied bioinformatics for the identification of regulatory elements. Nat. Rev. Genet. 2004;5(4):276-287. doi 10.1038/nrg1315
35. Weirauch M.T., Yang A., Albu M., Cote A.G., Montenegro-Monter A., Drewe P., Najafabadi H.S., Lambert S.A., Mann I., Cook K., Zheng H., Goity A., van Bakel H., Lozano J.C., Galli M., Lew sey M.G., Huang E., Mukherjee T., Chen X., Reece-Hoyes J.S., Govindarajan S., Shaulsky G., Walhout A.J.M., Bouget F.Y., Ratsch G., Larrondo L.F., Ecker J.R., Hughes T.R. Determination and inference of eukaryotic transcription factor sequence specificity. Cell. 2014; 158(6):1431-1443. doi 10.1016/j.cell.2014.08.009
36. Wingender E. Criteria for an updated classification of human transcription factor DNA-binding domains. J. Bioinform. Comput. Biol. 2013;11(1):1340007. doi 10.1142/S0219720013400076
37. Wingender E., Schoeps T., Dönitz J. TFClass: an expandable hierarchical classification of human transcription factors. Nucleic Acids Res. 2013;41(D1):D165-D170. doi 10.1093/nar/gks1123
38. Wingender E., Schoeps T., Haubrock M., Dönitz J. TFClass: a classification of human transcription factors and their rodent orthologs. Nucleic Acids Res. 2015;43(D1):D97-D102. doi 10.1093/nar/gku1064
39. Wingender E., Schoeps T., Haubrock M., Krull M., Dönitz J. TFClass: expanding the classification of human transcription factors to their mammalian orthologs. Nucleic Acids Res. 2018;46(D1):D343-D347. doi 10.1093/nar/gkx987
40. Yang Y.A., Yu J. Current perspectives on FOXA1 regulation of androgen receptor signaling and prostate cancer. Genes Dis. 2015;2(2): 144-151. doi 10.1016/j.gendis.2015.01.003
41. Zeitlinger J. Seven myths of how transcription factors read the cisregulatory code. Curr. Opin. Syst. Biol. 2020;23:22-31. doi 10.1016/j.coisb.2020.08.002
42. Zhang Y., Liu T., Meyer C.A., Eeckhoute J., Johnson D.S., Bernstein B.E., Nussbaum C., Myers R.M., Brown M., Li W., Liu X.S. Model-based Analysis of ChIP-Seq (MACS). Genome Biol. 2008;9: R137. doi 10.1186/gb-2008-9-9-r137