Computational problems of error analysis of short DNA reads in next-generation sequencing

R. te Boekhorst; F. M. Naumenko; N. G. Orlova; E. R. Galieva; A. M. Spitsina; I. V. Chadaeva; Y. L. Orlov; I. I. Abnizova

doi:10.18699/VJ16.191

Abstract

Next-generation sequencing (NGS) based on short DNA reads contributes substantially to modern genomics, genetics, cell biology, and medicine, particularly to metagenomics, comparative genomics, polymorphism detection, mutation screening, transcriptome profiling, studies of chromatin remodelling, and many other applications. Sequencing is prone to technical errors that can affect scientific conclusions. NGS technologies consist of building a collection of numerous short DNA fragments, called a "library", generating molecular colonies, and then sequencing them in a massively parallel manner. The sequenced fragments are called "reads"; they are assembled into longer strings. These longer sequences, in turn, are assembled into genomes for further analysis. Computational (processing) errors and sequencing failures are errors that arise during the subsequent digital processing of the sequenced samples. This downstream processing includes quality assessment, mapping, assembly, and even the correction of erroneous data. This article reviews computational processing errors and the computational and statistical approaches for detecting them, and provides a glossary of sequencing terminology. It considers the identification of mutations ("variant calling") in sequencing data and quality control of that identification. Variant calling covers local variations, such as single-nucleotide polymorphisms and short insertions and deletions (indels), as well as large-scale variations (inversions, translocations, or large indels). The article also discusses quality control of the raw data and the errors that arise when DNA reads are aligned to a reference genome and during the subsequent alignment/assembly.

Keywords

next-generation sequencing, DNA, sequencing technologies, statistical heterogeneities, genomic polymorphisms, sequencing errors, review

About the authors

R. te Boekhorst

University of Hertfordshire
Hatfield, United Kingdom

F. M. Naumenko

Federal State Autonomous Educational Institution of Higher Education "Novosibirsk National Research State University"
Novosibirsk, Russia

N. G. Orlova

Federal State Autonomous Educational Institution of Higher Education "Novosibirsk National Research State University"; Federal State Budgetary Educational Institution of Higher Professional Education "Novosibirsk State University of Architecture and Civil Engineering (Sibstrin)"
Novosibirsk, Russia

E. R. Galieva

Federal State Autonomous Educational Institution of Higher Education "Novosibirsk National Research State University"; Federal State Budgetary Scientific Institution "Federal Research Center Institute of Cytology and Genetics, Siberian Branch of the Russian Academy of Sciences"
Novosibirsk, Russia

A. M. Spitsina

I. V. Chadaeva

Y. L. Orlov

Federal State Autonomous Educational Institution of Higher Education "Novosibirsk National Research State University"; Federal State Budgetary Scientific Institution "Federal Research Center Institute of Cytology and Genetics, Siberian Branch of the Russian Academy of Sciences"
Novosibirsk, Russia

I. I. Abnizova

Wellcome Trust Sanger Institute
Cambridge, United Kingdom

Content is available under the Creative Commons Attribution 4.0 License.

ISSN 2500-3259 (Online)

Vavilov Journal of Genetics and Breeding