Vavilov Journal of Genetics and Breeding

Improvement of a phylogenetic footprinting method for transcription factor binding sites recognition based on the use of bootstrap trials for the analysis of large bacterial genomic data

https://doi.org/10.18699/vjgb-26-10

Abstract

The rapid development of high-throughput sequencing technologies has led to an explosive accumulation of high-quality bacterial genome sequences: their number is approaching three million and continues to grow. This growth gives additional impetus to the development of annotation methods that are designed to exploit such large-scale genomic data and to reach new levels of annotation quality. One such approach is phylogenetic footprinting, which identifies motifs corresponding to transcription factor binding sites in bacterial promoter regions by comparing the regulatory sequences of orthologous genes from related organisms. The continued accumulation of genomic data has driven further development of this approach. It has been found that an excessive number of sequences in the analyzed set reduces the accuracy of phylogenetic footprinting, whereas adding a step that selects sequences for the analyzed set according to their mutual evolutionary distances improves the method’s performance. In this paper, we propose and implement a further step in the development of the phylogenetic footprinting method. The selection step is run multiple times to generate distinct subsamples, the pipeline is run on each subsample, and the results of these runs are analyzed statistically. The proposed approach, implemented in the MotifsOnFly method, improves the robustness of motif recognition. The effectiveness of MotifsOnFly is demonstrated by analyzing the well-annotated promoter of the Escherichia coli ompW gene.
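The repeated-subsampling idea described in the abstract can be illustrated with a minimal sketch. It is not the MotifsOnFly implementation: the greedy distance-based selection rule, the toy k-mer "motif finder", the support statistic, and all names, sequences, and distances below are assumptions made purely for illustration.

```python
import random
from collections import Counter
from itertools import combinations


def select_subsample(ids, dist, min_dist, max_size, rng):
    """Greedy selection in random order: keep a sequence only if it is at
    least min_dist away from every sequence already chosen (assumed rule)."""
    order = list(ids)
    rng.shuffle(order)
    chosen = []
    for sid in order:
        if all(dist[frozenset((sid, c))] >= min_dist for c in chosen):
            chosen.append(sid)
            if len(chosen) == max_size:
                break
    return chosen


def toy_motif_finder(sequences, k):
    """Stand-in for a real motif discovery tool: return the k-mer that is
    present in the largest number of sequences (ties broken alphabetically)."""
    counts = Counter()
    for s in sequences:
        counts.update({s[i:i + k] for i in range(len(s) - k + 1)})
    return min(counts, key=lambda m: (-counts[m], m))


def bootstrap_support(seqs, dist, n_trials=100, min_dist=0.1,
                      max_size=4, k=6, seed=0):
    """Run selection + motif search n_trials times and report the fraction
    of trials in which each motif was returned."""
    rng = random.Random(seed)
    ids = sorted(seqs)
    hits = Counter()
    for _ in range(n_trials):
        sub = select_subsample(ids, dist, min_dist, max_size, rng)
        hits[toy_motif_finder([seqs[i] for i in sub], k)] += 1
    return {motif: n / n_trials for motif, n in hits.items()}


# Made-up promoter fragments that all contain the 6-mer TTGACA.
seqs = {
    "a": "ACGTACTTGACAGGCTAA",
    "b": "ACGTACTTGACAGGCTAT",  # near-duplicate of "a"
    "c": "GGCATGTTGACATCCGGA",
    "d": "CCTAGATTGACACAATGC",
    "e": "TATCGGTTGACAGTTACC",
    "f": "AGGCTCTTGACATGAGCT",
}
# Made-up pairwise evolutionary distances; "a" and "b" are too close
# to be placed in the same subsample.
dist = {frozenset(pair): 0.5 for pair in combinations(seqs, 2)}
dist[frozenset(("a", "b"))] = 0.01

support = bootstrap_support(seqs, dist)
print(support)
```

In this sketch a motif that is returned in most trials would be treated as robust, while one that appears only for particular subsamples would be treated with caution. In practice the toy finder would be replaced by an actual motif discovery tool, and the distances would come from a phylogenetic analysis of the orthologous sequences.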

About the Authors

A. M. Mukhin
Kurchatov Genomic Center of ICG SB RAS; Institute of Cytology and Genetics of the Siberian Branch of the Russian Academy of Sciences; Novosibirsk State University
Novosibirsk, Russian Federation

T. M. Khlebodarova
Kurchatov Genomic Center of ICG SB RAS; Institute of Cytology and Genetics of the Siberian Branch of the Russian Academy of Sciences
Novosibirsk, Russian Federation

D. Yu. Oshchepkov
Kurchatov Genomic Center of ICG SB RAS; Institute of Cytology and Genetics of the Siberian Branch of the Russian Academy of Sciences
Novosibirsk, Russian Federation


This work is licensed under a Creative Commons Attribution 4.0 License.


ISSN 2500-3259 (Online)