Vavilov Journal of Genetics and Breeding

Computational problems of analysis of short next generation sequencing reads

https://doi.org/10.18699/VJ16.191

Abstract

Short-read next-generation sequencing (NGS) has had a significant impact on modern genomics, genetics, cell biology and medicine, especially on metagenomics, comparative genomics, polymorphism detection, mutation screening, transcriptome profiling, methylation profiling, chromatin remodelling and many other applications. However, NGS data are prone to errors, which complicate scientific conclusions. NGS technologies shear DNA molecules into a collection of numerous small fragments, called a ‘library’, and then sequence these fragments extensively in parallel. The sequenced overlapping fragments, called ‘reads’, are assembled into contiguous strings. The contiguous sequences are in turn assembled into genomes for further analysis. Computational sequencing problems are those arising from the numerical processing of sequenced samples. This processing involves procedures such as quality scoring, mapping/assembly and, surprisingly, error correction of the data. This paper reviews post-processing errors and computational methods to discern them. It also includes a dictionary of sequencing terms. We discuss quality control of raw data and the errors arising at the steps of aligning sequencing reads to a reference genome and of assembly. Finally, this work presents the identification of mutations (‘variant calling’) in sequencing data and its quality control.
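As an illustration of the quality-scoring step mentioned in the abstract, the following minimal Python sketch converts a read's quality string into per-base error probabilities. It is not taken from the paper. It assumes the widely used Phred convention, Q = -10 log10(P), and the Phred+33 ASCII encoding of FASTQ files; the function names and the example quality string are hypothetical.

```python
from typing import List


def phred_to_error_prob(q: int) -> float:
    """Convert a Phred score Q to an error probability P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def decode_qualities(quality_string: str, offset: int = 33) -> List[int]:
    """Decode a FASTQ quality string into integer Phred scores.

    The default offset of 33 (Phred+33) is an assumption; older
    Illumina pipelines used an offset of 64.
    """
    return [ord(ch) - offset for ch in quality_string]


def mean_error_prob(quality_string: str, offset: int = 33) -> float:
    """Average the per-base error probabilities of one read."""
    scores = decode_qualities(quality_string, offset)
    return sum(phred_to_error_prob(q) for q in scores) / len(scores)


if __name__ == "__main__":
    example = "I5+"  # hypothetical quality string for a 3-base read
    print(decode_qualities(example))
    print(mean_error_prob(example))
```

Averaging error probabilities rather than raw Phred scores is a deliberate choice in this sketch: because the Phred scale is logarithmic, the mean of the scores would understate the influence of low-quality bases.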

About the Authors

R. te Boekhorst
University of Hertfordshire
United Kingdom
Hatfield, UK


F. M. Naumenko
Novosibirsk State University
Russian Federation
Novosibirsk, Russia


N. G. Orlova
Novosibirsk State University; Novosibirsk State University of Architecture and Civil Engineering (Sibstrin)
Russian Federation
Novosibirsk, Russia


E. R. Galieva
Novosibirsk State University; Institute of Cytology and Genetics SB RAS
Russian Federation
Novosibirsk, Russia


A. M. Spitsina
Novosibirsk State University
Russian Federation
Novosibirsk, Russia


I. V. Chadaeva
Novosibirsk State University
Russian Federation
Novosibirsk, Russia


Y. L. Orlov
Novosibirsk State University; Institute of Cytology and Genetics SB RAS
Russian Federation
Novosibirsk, Russia


I. I. Abnizova
Wellcome Trust Sanger Institute
United Kingdom
Cambridge, UK



Review


This work is licensed under a Creative Commons Attribution 4.0 License.


ISSN 2500-3259 (Online)