Vavilov Journal of Genetics and Breeding

Flanking monomer repeats define lower context complexity of sites containing single nucleotide polymorphisms in the human genome

https://doi.org/10.18699/J15.092

Abstract

We investigated mutation frequency in the human genome using the set of known single nucleotide polymorphisms (SNPs) from the 1000 Genomes Project. We developed and applied novel statistical and computational methods that analyze genetic text on the basis of its complexity. Complexity profiling in a sliding window was applied to SNP-containing sites in the human genome and revealed a local decrease in text complexity at these sites. Analysis of the complexity profiles of SNP-containing sites shows that flanking monomer repeats define the lower context complexity of sites containing SNPs in the human genome. This local decrease in text complexity at SNP-containing sites is confirmed by analysis of polymorphisms in the rat and mouse genomes. We found context differences between coding and regulatory sequences; these differences reflect the complexity of SNP-containing loci. Changes in point mutation frequency have previously been shown for microsatellite-containing sequences. Using enhanced mathematical tools and larger data sets, this work shows enrichment of polytracks and simple sequence repeats in the local genomic surroundings of SNP-containing sites. We found high-frequency oligonucleotides within genomic regions containing SNPs; such oligonucleotides are related to nucleotide polytracks. The presence of poly-A tracks might be associated with an increased probability of DNA double-strand breaks around mutable loci and the subsequent fixation of nucleotide changes. The complexity estimates were computed using a previously developed software tool. This tool allows both (i) complexity estimation of phased samples and (ii) rapid and effective identification of the frequency spectrum of fixed-length oligonucleotides, together with comparison of oligonucleotide frequencies across different samples.
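The sliding-window complexity profiling and the comparison of oligonucleotide frequencies described above can be illustrated with a minimal Python sketch. It is not the authors' program tool. As a stand-in it uses one common formulation of linguistic complexity (the number of distinct k-mers actually present in a window, k = 1..max_k, divided by the maximum number possible), and the function names, window length (20 nt), step and max_k are hypothetical choices made only for illustration.

```python
from collections import Counter


def linguistic_complexity(seq: str, max_k: int = 6) -> float:
    """Observed distinct k-mers / maximum possible distinct k-mers, summed over k = 1..max_k.

    Stand-in measure for illustration; not necessarily the estimator used by the authors.
    """
    n = len(seq)
    observed = 0
    possible = 0
    for k in range(1, max_k + 1):
        if k > n:
            break
        observed += len({seq[i:i + k] for i in range(n - k + 1)})
        # At most 4**k distinct DNA k-mers, and at most n - k + 1 positions to hold them.
        possible += min(4 ** k, n - k + 1)
    return observed / possible if possible else 0.0


def complexity_profile(seq: str, window: int = 20, step: int = 1, max_k: int = 6) -> list[float]:
    """Complexity of every window of length `window`, sliding along `seq` by `step`."""
    return [
        linguistic_complexity(seq[start:start + window], max_k)
        for start in range(0, len(seq) - window + 1, step)
    ]


def kmer_spectrum(seqs, k: int) -> Counter:
    """Pooled counts of all overlapping k-mers (fixed-length oligonucleotides) in a sample."""
    counts = Counter()
    for s in seqs:
        counts.update(s[i:i + k] for i in range(len(s) - k + 1))
    return counts


def kmer_enrichment(sample, background, k: int, pseudo: float = 1.0) -> dict:
    """Relative k-mer frequency in `sample` divided by that in `background` (with pseudocounts)."""
    a = kmer_spectrum(sample, k)
    b = kmer_spectrum(background, k)
    keys = set(a) | set(b)
    total_a = sum(a.values()) + pseudo * len(keys)
    total_b = sum(b.values()) + pseudo * len(keys)
    return {m: ((a[m] + pseudo) / total_a) / ((b[m] + pseudo) / total_b) for m in keys}


if __name__ == "__main__":
    # Hypothetical toy site: a poly-A flank, the variant position (G), then mixed sequence.
    site = "AAAAAAAAAAAAAAAAAAAA" + "G" + "CTGACCTAGTCAGTTGCAAC"
    profile = complexity_profile(site, window=20)
    print(min(profile), max(profile))
    # Compare 3-mer frequencies in the toy SNP flank against a toy background sample.
    ratios = kmer_enrichment([site], ["ACGTTGCAGTCCAGTACGAT"], k=3)
    print(sorted(ratios, key=ratios.get, reverse=True)[:3])
```

In this sketch, low values in the profile flag windows dominated by monomer repeats such as poly-A tracks, and an enrichment ratio above 1 marks oligonucleotides that are over-represented in the SNP-flank sample relative to the background.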

About the Authors

N. S. Safronova
Institute of Cytology and Genetics SB RAS, Novosibirsk, Russia; Novosibirsk State University, Novosibirsk, Russia
Russian Federation


M. P. Ponomarenko
Institute of Cytology and Genetics SB RAS, Novosibirsk, Russia; Novosibirsk State University, Novosibirsk, Russia
Russian Federation


I. I. Abnizova
Sanger Center, Cambridge, UK
Russian Federation


G. V. Orlova
Institute of Cytology and Genetics SB RAS, Novosibirsk, Russia
Russian Federation


I. V. Chadaeva
Institute of Cytology and Genetics SB RAS, Novosibirsk, Russia
Russian Federation


Y. L. Orlov
Institute of Cytology and Genetics SB RAS, Novosibirsk, Russia; Novosibirsk State University, Novosibirsk, Russia
Russian Federation



This work is licensed under a Creative Commons Attribution 4.0 License.


ISSN 2500-3259 (Online)