
Vavilov Journal of Genetics and Breeding


EFFECT OF FLANKING SEQUENCES ON THE ACCURACY OF THE RECOGNITION OF TRANSCRIPTION FACTOR BINDING SITES

Abstract

In vitro methods have produced new experimental data on protein binding to DNA. These data accumulate in databases and are used both to study the mechanisms that regulate gene expression and to develop computational methods for recognizing binding sites in prokaryotic and eukaryotic genomes. However, it remains unclear to what extent sequences selected in vitro reflect the actual structure of natural transcription factor (TF) binding sites. We used the Kullback-Leibler divergence to compare frequency matrices of TF binding sites built from samples of artificially selected sequences with matrices built from natural sites. The core sequences of natural and artificial sites were highly similar for 80% of the TFs studied. For the remaining 20% of TFs, the binding site sequences selected in vitro had a broader range of permissible significant nucleotides, including nucleotides not found in natural sites. The weight matrix method was used to estimate the optimum length of DNA sequences containing natural binding sites, that is, the length at which the sites are recognized most accurately. For approximately 80% of the TFs studied, the optimum binding site length notably exceeded both the length of the core sequence and the length of the in vitro selected sites. These features of in vitro selected TF binding sites limit their use in developing computational methods for recognizing candidate sites in genomic sequences.
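The two techniques named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the pseudocount of 0.5, the uniform background of 0.25, the log-odds form of the weight matrix, and the toy site samples are all assumptions made for illustration. The sketch builds position frequency matrices from two samples of aligned sites, sums the per-position Kullback-Leibler divergence between them, and scores a sequence with a weight matrix.

```python
import math

BASES = "ACGT"


def frequency_matrix(sites, pseudocount=0.5):
    """Per-position nucleotide frequencies for aligned, equal-length sites.

    The pseudocount (an assumption of this sketch) keeps every frequency
    positive so that logarithms below are always defined.
    """
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("all sites must have the same length")
    matrix = []
    for pos in range(length):
        counts = {base: pseudocount for base in BASES}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        matrix.append({base: counts[base] / total for base in BASES})
    return matrix


def kl_divergence(p_matrix, q_matrix):
    """Sum over positions of D_KL(P_i || Q_i), in bits."""
    return sum(
        p[base] * math.log2(p[base] / q[base])
        for p, q in zip(p_matrix, q_matrix)
        for base in BASES
    )


def weight_matrix(freq_matrix, background=0.25):
    """Log-odds weights against a uniform background (assumed here)."""
    return [
        {base: math.log2(column[base] / background) for base in BASES}
        for column in freq_matrix
    ]


def score(pwm, sequence):
    """Additive weight-matrix score of a sequence of the matrix length."""
    return sum(column[base] for column, base in zip(pwm, sequence))


# Hypothetical toy samples, not data from the study.
natural_sites = ["TGACTCA", "TGAGTCA", "TGACTCA", "TTACTCA"]
in_vitro_sites = ["TGACTCA", "TGAGTCA", "TGACTAA", "TGAATCA"]

natural_freq = frequency_matrix(natural_sites)
in_vitro_freq = frequency_matrix(in_vitro_sites)

print("KL(natural || in vitro):", kl_divergence(natural_freq, in_vitro_freq))
print("Score of TGACTCA:", score(weight_matrix(natural_freq), "TGACTCA"))
```

The Kullback-Leibler divergence is asymmetric, so swapping the two matrices generally gives a different value. To explore the effect of flanking sequences, one could rebuild the matrices from sites extended by a varying number of flanking positions and compare recognition accuracy at each length; the procedure actually used for this estimate is not described in the abstract.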

About the Authors

T. M. Khlebodarova
Institute of Cytology and Genetics SB RAS, Novosibirsk, Russia
Russian Federation


D. Yu. Oshchepkov
Institute of Cytology and Genetics SB RAS, Novosibirsk, Russia
Russian Federation


V. G. Levitsky
Institute of Cytology and Genetics SB RAS, Novosibirsk, Russia; Novosibirsk National Research State University, Novosibirsk, Russia
Russian Federation


O. A. Podkolodnaya
Institute of Cytology and Genetics SB RAS, Novosibirsk, Russia
Russian Federation


E. V. Ignatieva
Institute of Cytology and Genetics SB RAS, Novosibirsk, Russia
Russian Federation


E. A. Ananko
Institute of Cytology and Genetics SB RAS, Novosibirsk, Russia
Russian Federation


I. L. Stepanenko
Institute of Cytology and Genetics SB RAS, Novosibirsk, Russia
Russian Federation


N. A. Kolchanov
Institute of Cytology and Genetics SB RAS, Novosibirsk, Russia; Novosibirsk National Research State University, Novosibirsk, Russia
Russian Federation






This work is licensed under a Creative Commons Attribution 4.0 License.


ISSN 2500-3259 (Online)