Vavilov Journal of Genetics and Breeding

The design of experiments for the transcriptome studies by high-throughput sequencing methods

https://doi.org/10.18699/VJ16.148

Abstract

This review addresses common questions in the design of high-throughput sequencing experiments that use RNA-Seq or Ribo-Seq methods. It also summarizes the ENCODE guidelines (2011) and recently published advances in the design of studies of mammalian, other animal and plant transcriptomes. An optimal sequencing depth exists for the identification of almost all actively transcribed genes. This limit depends on the transcriptome size of the biological object studied; sequencing beyond it provides no substantial additional information about transcriptome complexity. For mammals, the optimal sequencing depth for identifying actively transcribed genes is ~2 × 10^9 bp per biological sample. For other species, the optimal depth per biological sample is determined in the same way as for mammals, but the transcriptome size and the mean RNA content of the studied object, relative to mammalian transcriptomes, should be taken into account. The discovery of differentially expressed genes and the identification of splicing sites in mRNA can be enhanced by increasing the number of biological samples analyzed per experimental group. The minimum number of biological replicates per experimental group is 2; the optimal number is 5–8, similar to experiments that quantify the expression of single genes by qRT-PCR. For transcriptome studies, sequencing technologies with an accuracy of ≥ 0.999 per bp are recommended. For RNA-Seq, technologies that produce reads of 75 bp or longer are also recommended, to minimize the cost of effective sequence identification. The relative cost of sequencing control samples can be reduced by increasing the number of experimental groups in an experiment or by combining several independent experiments that have similar control groups. These notes can be used at the design stage of experimental transcriptome studies.
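The arithmetic behind two of these recommendations can be illustrated with a minimal Python sketch. It is not the authors' method. The function names, the linear scaling of depth with relative transcriptome size, and the assumption of one shared control group of the same size as each experimental group are all hypothetical simplifications introduced here for illustration.

```python
import math

# ~2 x 10^9 bp per biological sample for mammals (figure taken from the abstract).
MAMMAL_DEPTH_BP = 2e9


def reads_per_sample(read_length_bp, relative_transcriptome_size=1.0,
                     target_bp=MAMMAL_DEPTH_BP):
    """Reads needed per sample to reach the target depth.

    ASSUMPTION: depth scales linearly with transcriptome size relative to a
    mammalian transcriptome (1.0 = mammal). The abstract only says that size
    and mean RNA content should be taken into account.
    """
    if read_length_bp <= 0 or relative_transcriptome_size <= 0:
        raise ValueError("read length and relative size must be positive")
    return math.ceil(target_bp * relative_transcriptome_size / read_length_bp)


def control_cost_fraction(n_experimental_groups, replicates_per_group=5):
    """Share of sequenced samples that are controls.

    ASSUMPTION: a single control group is shared by all experimental groups,
    every group has the same number of replicates, and cost is proportional
    to the number of samples.
    """
    if n_experimental_groups < 1 or replicates_per_group < 1:
        raise ValueError("group count and replicate count must be at least 1")
    total_samples = (n_experimental_groups + 1) * replicates_per_group
    return replicates_per_group / total_samples


if __name__ == "__main__":
    # Reads per mammalian sample at the recommended minimum read length.
    print(reads_per_sample(75))
    # Control share with one versus three experimental groups.
    print(control_cost_fraction(1), control_cost_fraction(3))
```

Under these assumptions, the control share falls from one half with a single experimental group to one quarter with three, which is the effect the abstract describes when it recommends adding experimental groups or pooling experiments with similar controls.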

About the Authors

P. N. Menshanov
Institute of Cytology and Genetics SB RAS, Novosibirsk, Russia; Novosibirsk State University, Novosibirsk, Russia
Russian Federation


N. N. Dygalo
Institute of Cytology and Genetics SB RAS, Novosibirsk, Russia; Novosibirsk State University, Novosibirsk, Russia
Russian Federation


References

1. Ansorge W.J. Next-generation DNA sequencing techniques. N. Biotechnol. 2009;25(4):195-203. DOI 10.1016/j.nbt.2008.12.009

2. Arner E., Beckhouse A., Briggs J., Ovchinnikov D., Wolvetang E., Wells C. and FANTOM Consortium. Transcribed enhancers lead waves of coordinated transcription in transitioning mammalian cells. Science. 2015;347(6225):1010-1014. DOI 10.1126/science.1259418

3. Aspden J.L., Eyre-Walker Y.C., Phillips R.J., Amin U., Mumtaz M.A., Brocard M., Couso J.P. Extensive translation of small open reading frames revealed by Poly-Ribo-Seq. Elife. 2014;3:e03528. DOI 10.7554/eLife.03528

4. Bennett N.C., Farah C.S. Next-generation sequencing in clinical oncology: next steps towards clinical validation. Cancers (Basel). 2014;6(4):2296-2312. DOI 10.3390/cancers6042296

5. Ching T., Huang S., Garmire L.X. Power analysis and sample size estimation for RNA-Seq differential expression. RNA. 2014;20(11):1684-1696. DOI 10.1261/rna.046011.114

6. Coate J.E., Doyle J.J. Quantifying whole transcriptome size, a prerequisite for understanding transcriptome evolution across species: an example from a plant allopolyploid. Genome Biol. Evol. 2010;2:534-546. DOI 10.1093/gbe/evq038

7. Coate J.E., Doyle J.J. Variation in transcriptome size: are we getting the message? Chromosoma. 2015;124(1):27-43. DOI 10.1007/s00412-014-0496-3

8. Corney D.C. RNA-seq using next generation sequencing. Mater. Methods. 2013;3:203. DOI 10.13070/mm.en.3.203

9. Florea L.D., Salzberg S.L. Genome-guided transcriptome assembly in the age of next-generation sequencing. IEEE/ACM Trans. Comput. Biol. Bioinform. 2013;10(5):1234-1240.

10. Flusser G., Ginzburg V., Meyuhas O. Glucocorticoids induce transcription of ribosomal protein genes in rat liver. Mol. Cell. Endocrinol. 1989;64(2):213-222. DOI 10.1016/0303-7207(89)90148-2

11. Fox E.J., Reid-Bayliss K.S., Emond M.J., Loeb L.A. Accuracy of next generation sequencing platforms. Next Gener. Seq. Appl. 2014;1:1000106. DOI 10.4172/jngsa.1000106

12. Galau G.A., Klein W.H., Britten R.J., Davidson E.H. Significance of rare mRNA sequences in liver. Arch. Biochem. Biophys. 1977;179(2):584-599. DOI 10.1016/0003-9861(77)90147-3

13. Gargis A.S., Kalman L., Bick D.P., da Silva C., Dimmock D.P., Funke B.H., Gowrisankar S., Hegde M.R., Kulkarni S., Mason C.E., Nagarajan R., Voelkerding K.V., Worthey E.A., Aziz N., Barnes J., Bennett S.F., Bisht H., Church D.M., Dimitrova Z., Gargis S.R., Hafez N., Hambuch T., Hyland F.C., Luna R.A., MacCannell D., Mann T., McCluskey M.R., McDaniel T.K., Ganova-Raeva L.M., Rehm H.L., Reid J., Campo D.S., Resnick R.B., Ridge P.G., Salit M.L., Skums P., Wong L.J., Zehnbauer B.A., Zook J.M., Lubin I.M. Good laboratory practice for clinical next-generation sequencing informatics pipelines. Nat. Biotechnol. 2015;33(7):689-693. DOI 10.1038/nbt.3237

14. Ghosh S., Chan C.K. Analysis of RNA-Seq Data using TopHat and Cufflinks. Method. Mol. Biol. 2016;1374:339-361. DOI 10.1007/978-1-4939-3167-5_18

15. Hart T., Komori H.K., LaMere S., Podshivalova K., Salomon D.R. Finding the active genes in deep RNA-seq gene expression studies. BMC Genomics. 2013;14:778. DOI 10.1186/1471-2164-14-778

16. Hart S.N., Therneau T.M., Zhang Y., Poland G.A., Kocher J.P. Calculating sample size estimates for RNA sequencing data. J. Comput. Biol. 2013;20(12):970-978. DOI 10.1089/cmb.2012.0283

17. Hawrylycz M.J., Lein E.S., Guillozet-Bongaarts A.L., Shen E.H., Ng L., Miller J.A., van de Lagemaat L.N., Smith K.A., Ebbert A., Riley Z.L., Abajian C., Beckmann C.F., Bernard A., Bertagnolli D., Boe A.F., Cartagena P.M., Chakravarty M.M., Chapin M., Chong J., Dalley R.A., Daly B.D., Dang C., Datta S., Dee N., Dolbeare T.A., Faber V., Feng D., Fowler D.R., Goldy J., Gregor B.W., Haradon Z., Haynor D.R., Hohmann J.G., Horvath S., Howard R.E., Jeromin A., Jochim J.M., Kinnunen M., Lau C., Lazarz E.T., Lee C., Lemon T.A., Li L., Li Y., Morris J.A., Overly C.C., Parker P.D., Parry S.E., Reding M., Royall J.J., Schulkin J., Sequeira P.A., Slaughterbeck C.R., Smith S.C., Sodt A.J., Sunkin S.M., Swanson B.E., Vawter M.P., Williams D., Wohnoutka P., Zielke H.R., Geschwind D.H., Hof P.R., Smith S.M., Koch C., Grant S.G., Jones A.R. An anatomically comprehensive atlas of the adult human brain transcriptome. Nature. 2012;489(7416):391-399. DOI 10.1038/nature11405

18. Hebenstreit D., Fang M., Gu M., Charoensawan V., van Oudenaarden A., Teichmann S.A. RNA sequencing reveals two major classes of gene expression levels in metazoan cells. Mol. Syst. Biol. 2011;7:497. DOI 10.1038/msb.2011.28

19. Hughes M.E., Grant G.R., Paquin C., Qian J., Nitabach M.N. Deep sequencing the circadian and diurnal transcriptome of Drosophila brain. Genome Res. 2012;22(7):1266-1281. DOI 10.1101/gr.128876.111

20. Ingolia N.T. Ribosome profiling: new views of translation, from single codons to genome scale. Nat. Rev. Genet. 2014;15(3):205-213. DOI 10.1038/nrg3645

21. Kagale S., Koh C., Nixon J., Bollina V., Clarke W.E., Tuteja R., Spillane C., Robinson S.J., Links M.G., Clarke C., Higgins E.E., Huebert T., Sharpe A.G., Parkin I.A. The emerging biofuel crop Camelina sativa retains a highly undifferentiated hexaploid genome structure. Nat. Commun. 2014;5:3706. DOI 10.1038/ncomms4706

22. Karlen Y., McNair A., Perseguers S., Mazza C., Mermod N. Statistical significance of quantitative PCR. BMC Bioinformatics. 2007;8:131. DOI 10.1186/1471-2105-8-131

23. Kellis M., Wold B., Snyder M.P., Bernstein B.E., Kundaje A., Marinov G.K., Ward L.D., Birney E., Crawford G.E., Dekker J., Dunham I., Elnitski L.L., Farnham P.J., Feingold E.A., Gerstein M., Giddings M.C., Gilbert D.M., Gingeras T.R., Green E.D., Guigo R., Hubbard T., Kent J., Lieb J.D., Myers R.M., Pazin M.J., Ren B., Stamatoyannopoulos J.A., Weng Z., White K.P., Hardison R.C. Defining functional DNA elements in the human genome. Proc. Natl. Acad. Sci. USA. 2014;111(17):6131-6138. DOI 10.1073/pnas.1318948111

24. Kozhevnikova O.S., Korbolina E.E., Ershov N.I., Kolosova N.G. Rat retinal transcriptome: effects of aging and AMD-like retinopathy. Cell Cycle. 2013;12(11):1745-1761. DOI 10.4161/cc.24825

25. Krasileva K.V., Buffalo V., Bailey P., Pearce S., Ayling S., Tabbita F., Soria M., Wang S., IWGS Consortium, Akhunov E., Uauy C., Dubcovsky J. Separating homeologs by phasing in the tetraploid wheat transcriptome. Genome Biol. 2013;14(6):R66. DOI 10.1186/gb-2013-14-6-r66

26. Liu Y., Zhou J., White K.P. RNA-seq differential expression studies: more sequence or more replication? Bioinformatics. 2014;30(3):301-304. DOI 10.1093/bioinformatics/btt688

27. Mardis E.R. The impact of next-generation sequencing technology on genetics. Trends Genet. 2008;24(3):133-141. DOI 10.1016/j.tig.2007.12.007

28. Marguerat S., Bähler J. Coordinating genome expression with cell size. Trends Genet. 2012;28(11):560-565. DOI 10.1016/j.tig.2012.07.003

29. Marinov G.K., Williams B.A., McCue K., Schroth G.P., Gertz J., Myers R.M., Wold B.J. From single-cell to cell-pool transcriptomes: stochasticity in gene expression and RNA splicing. Genome Res. 2014;24(3):496-510. DOI 10.1101/gr.161034.113

30. Martin J.A., Wang Z. Next-generation transcriptome assembly. Nat. Rev. Genet. 2011;12(10):671-682. DOI 10.1038/nrg3068

31. McManus C.J., Coolon J.D., Duff M.O., Eipper-Mains J., Graveley B.R., Wittkopp P.J. Regulatory divergence in Drosophila revealed by mRNA-seq. Genome Res. 2010;20(6):816-825. DOI 10.1101/gr.102491.109

32. Menshanov P.N., Dygalo N.N. Methodological aspects of read mapping and assembly of transcriptomes derived from the brain tissue samples of Rattus norvegicus. Rus. J. Genet: Appl. Res. 2015;5(4):401-406. DOI 10.1134/S2079059715040097

33. Mortazavi A., Williams B.A., McCue K., Schaeffer L., Wold B. Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat. Methods. 2008;5(7):621-628. DOI 10.1038/nmeth.1226

34. Moskalev A., Zhikrivetskaya S., Krasnov G., Shaposhnikov M., Proshkina E., Borisoglebsky D., Danilov A., Peregudova D., Sharapova I., Dobrovolskaya E., Solovev I., Zemskaya N., Shilova L., Snezhkina A., Kudryavtseva A. A comparison of the transcriptome of Drosophila melanogaster in response to entomopathogenic fungus, ionizing radiation, starvation and cold shock. BMC Genomics. 2015;16(Suppl. 13):S8. DOI 10.1186/1471-2164-16-S13-S8

35. Mutz K.O., Heilkenbrinker A., Lönne M., Walter J.G., Stahl F. Transcriptome analysis using next-generation sequencing. Curr. Opin. Biotechnol. 2013;24(1):22-30. DOI 10.1016/j.copbio.2012.09.004

36. Nfonsam L.E., Cano C., Mudge J., Schilkey F.D., Curtiss J. Analysis of the transcriptomes downstream of Eyeless and the Hedgehog, Decapentaplegic and Notch signaling pathways in Drosophila melanogaster. PLoS One. 2012;7(8):e44583. DOI 10.1371/journal.pone.0044583

37. Nonis A., De Nardi B., Nonis A. Choosing between RT-qPCR and RNA-seq: a back-of-the-envelope estimate towards the definition of the break-even-point. Anal. Bioanal. Chem. 2014;406(15):3533-3536. DOI 10.1007/s00216-014-7687-x

38. O’Rourke J.A., Iniguez L.P., Fu F., Bucciarelli B., Miller S.S., Jackson S.A., McClean P.E., Li J., Dai X., Zhao P.X., Hernandez G., Vance C.P. An RNA-Seq based gene expression atlas of the common bean. BMC Genomics. 2014;15:866. DOI 10.1186/1471-2164-15-866

39. Pembroke W.G., Babbs A., Davies K., Ponting C.P., Oliver P.L. Temporal transcriptomics suggest that twin-peaking genes reset the clock. Elife. 2015;4:e10518. DOI 10.7554/eLife.10518

40. Reef R., Ball M.C., Feller I.C., Lovelock C.E. Relationships among RNA:DNA ratio, growth and elemental stoichiometry in mangrove trees. Funct. Ecol. 2010;24(5):1064-1072. DOI 10.1111/j.1365-2435.2010.01722.x

41. Schmidt E.E., Schibler U. Cell size regulation, a mechanism that controls cellular RNA accumulation: consequences on regulation of the ubiquitous transcription factors Oct1 and NF-Y and the liver-enriched transcription factor DBP. J. Cell Biol. 1995;128(4):467-483.

42. Shishkina G.T., Kalinina T.S., Bulygina V.V., Lanshakov D.A., Babluk E.V., Dygalo N.N. Anti-apoptotic protein Bcl-xL expression in the midbrain raphe region is sensitive to stress and glucocorticoids. PLoS One. 2015;10(12):e0143978. DOI 10.1371/journal.pone.0143978

43. Sims D., Sudbery I., Ilott N.E., Heger A., Ponting C.P. Sequencing depth and coverage: key considerations in genomic analyses. Nat. Rev. Genet. 2014;15(2):121-132. DOI 10.1038/nrg3642

44. Spies D., Ciaudo C. Dynamics in transcriptomics: advancements in RNA-seq time course and downstream analysis. Comput. Struct. Biotechnol. J. 2015;13:469-477. DOI 10.1016/j.csbj.2015.08.004

45. Tarazona S., García-Alcalde F., Dopazo J., Ferrer A., Conesa A. Differential expression in RNA-seq: a matter of depth. Genome Res. 2011;21(12):2213-2223. DOI 10.1101/gr.124321.111

46. The ENCODE Consortium. Standards, Guidelines and Best Practices for RNA-Seq V1.0. 1.0. 6-1-2011.

47. Thompson W.L., Abeles F.B., Beall F.A., Dinterman R.E., Wannemacher R.W. Jr. Influence of the adrenal glucocorticoids on the stimulation of synthesis of hepatic ribonucleic acid and plasma acute-phase globulins by leucocytic endogenous mediator. Biochem. J. 1976;156(1):25-32.

48. van Bakel H., Nislow C., Blencowe B.J., Hughes T.R. Response to “The reality of pervasive transcription”. PLoS Biol. 2011;9(7):e1001102. DOI 10.1371/journal.pbio.1001102

49. Veeneman B.A., Shukla S., Dhanasekaran S.M., Chinnaiyan A.M., Nesvizhskii A.I. Two-pass alignment improves novel splice junction quantification. Bioinformatics. 2015;32:43-49. DOI 10.1093/bioinformatics/btv642

50. Wang Z., Gerstein M., Snyder M. RNA-Seq: a revolutionary tool for transcriptomics. Nat. Rev. Genet. 2009;10(1):57-63. DOI 10.1038/nrg2484

51. Wetterstrand K. DNA sequencing costs: data from the NHGRI large-scale genome sequencing program. (2015). Available at http://www.genome.gov/sequencingcosts/

52. Xie C., Yuan J., Li H., Li M., Zhao G., Bu D., Zhu W., Wu W., Chen R., Zhao Y. NONCODEv4: exploring the world of long non-coding RNA genes. Nucleic Acids Res. 2014;42(Database issue):D98-D103. DOI 10.1093/nar/gkt1222

53. Zimmerman E.F., Andrew F., Kalter H. Glucocorticoid inhibition of RNA synthesis responsible for cleft palate in mice: a model. Proc. Natl. Acad. Sci. USA. 1970;67(2):779-785.


Review


This work is licensed under a Creative Commons Attribution 4.0 License.


ISSN 2500-3259 (Online)