
Vavilov Journal of Genetics and Breeding


Application of alternative de novo motif recognition models for analysis of structural heterogeneity of transcription factor binding sites: a case study of FOXA2 binding sites

https://doi.org/10.18699/VJ21.002

Abstract

The most popular model for searching ChIP-seq data for transcription factor binding sites (TFBS) is the positional weight matrix (PWM). However, this model does not take into account dependencies between nucleotide occurrences at different site positions. Two recently proposed models, BaMM and InMoDe, do account for such dependencies. So far, however, these models have mostly been applied only to compare their recognition accuracy with that of PWMs, and no analysis of the co-prediction and relative positioning of hits of different models within peaks has yet been performed. To close this gap, we propose a pipeline called MultiDeNA. The pipeline includes the stages of training the models, assessing their recognition accuracy, scanning ChIP-seq peaks, and classifying the peaks based on the scan results. We applied the pipeline to 22 ChIP-seq datasets of the TF FOXA2 and considered the PWM, dinucleotide PWM (diPWM), BaMM and InMoDe models. The combination of these four models significantly increased the fraction of recognized peaks compared to that for the PWM model alone: the increase was 26.3 %. The BaMM model made the main contribution to the recognition of sites. Although the major fraction of predicted peaks contained TFBS of different models at coinciding positions, the medians of the fraction of peaks containing predictions of only a single model were 1.08, 0.49, 4.15 and 1.73 % for PWM, diPWM, BaMM and InMoDe, respectively. Thus, FOXA2 BSs were not fully described by any single model, which indicates their heterogeneity. We assume that the BaMM model is the most successful in describing the structure of the FOXA2 BS in the ChIP-seq datasets under study.
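The peak classification step described above can be illustrated with a minimal Python sketch. It is not the MultiDeNA implementation: the function names, the input layout (a mapping from peak ID to per-model lists of hit coordinates) and the toy data are assumptions made purely for illustration. The sketch labels each peak by the set of models that produced at least one hit, then reports the fraction of peaks recognized by any model, by PWM alone as a baseline, and by exactly one model.

```python
from collections import Counter

# Model names taken from the abstract; everything else here is hypothetical.
MODELS = ("PWM", "diPWM", "BaMM", "InMoDe")


def classify_peaks(hits_by_peak):
    """Label each peak with the set of models that have at least one hit in it."""
    return {
        peak: frozenset(m for m in MODELS if hits.get(m))
        for peak, hits in hits_by_peak.items()
    }


def summarize(hits_by_peak):
    """Compute recognition fractions over all peaks (assumes a non-empty input)."""
    classes = classify_peaks(hits_by_peak)
    n = len(classes)
    recognized_any = sum(1 for c in classes.values() if c)
    recognized_pwm = sum(1 for c in classes.values() if "PWM" in c)
    # Peaks whose hits come from exactly one model.
    sole = Counter(next(iter(c)) for c in classes.values() if len(c) == 1)
    return {
        "fraction_any": recognized_any / n,
        "fraction_pwm": recognized_pwm / n,
        "fraction_sole": {m: sole[m] / n for m in MODELS},
    }


# Toy input: hit coordinates (start, end) per model within each peak.
toy_peaks = {
    "peak1": {"PWM": [(10, 22)], "BaMM": [(10, 22)]},
    "peak2": {"BaMM": [(40, 52)]},
    "peak3": {},
    "peak4": {
        "PWM": [(5, 17)],
        "diPWM": [(6, 18)],
        "BaMM": [(5, 17)],
        "InMoDe": [(5, 17)],
    },
}

if __name__ == "__main__":
    print(summarize(toy_peaks))
```

On the toy data, three of the four peaks are recognized by at least one model, two by PWM, and one by BaMM only. A fuller version would also compare hit coordinates across models to separate peaks with coinciding positions from peaks where the models predict different sites.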

About the Authors

A. V. Tsukanov
Institute of Cytology and Genetics of Siberian Branch of the Russian Academy of Sciences
Russian Federation
Novosibirsk


V. G. Levitsky
Institute of Cytology and Genetics of Siberian Branch of the Russian Academy of Sciences; Novosibirsk State University
Russian Federation
Novosibirsk


T. I. Merkulova
Institute of Cytology and Genetics of Siberian Branch of the Russian Academy of Sciences; Novosibirsk State University
Russian Federation
Novosibirsk





This work is licensed under a Creative Commons Attribution 4.0 License.


ISSN 2500-3259 (Online)