Vavilov Journal of Genetics and Breeding

Development and validation of the PipeSeq program for RNA-seq data analysis in the Chlamydomonas reinhardtii as a model

https://doi.org/10.18699/vjgb-26-34

Abstract

RNA sequencing (RNA-seq) is a highly sensitive method for transcriptome analysis that allows simultaneous assessment of the expression of thousands of genes and identification of expression patterns under various conditions. The existing variety of RNA-seq data formats, normalization methods, and approaches to statistical processing of results complicates the comparison of data from different studies and reduces the reproducibility of the analysis. This study presents PipeSeq, an automated pipeline that combines the standard steps of RNA-seq data processing: data retrieval (SRA Toolkit), read alignment to the reference genome (HISAT2), transcript assembly (StringTie), transcript counting (featureCounts), and statistical analysis of differential gene expression under various experimental conditions (DESeq2). PipeSeq has a simple visual interface, supports multithreading, and generates ready-to-analyze gene expression heat maps, tables, and graphs. The functionality of the pipeline is demonstrated on three sets of raw RNA-seq data from cells of the green alga Chlamydomonas reinhardtii, available in the NCBI SRA database. The data from these experiments were used to analyze the differential expression of C. reinhardtii genes encoding GATA family transcription factors under different light cultivation conditions. The data obtained in silico were verified by real-time reverse transcription quantitative polymerase chain reaction (RT-qPCR) for 12 GATA genes, which allowed us to hypothesize their functions and to evaluate the correlation between the bulk (RNA-seq) and targeted (RT-qPCR) approaches. Our results showed that RNA-seq and RT-qPCR reveal similar directions of gene expression changes but differ in effect size and sensitivity, which emphasizes the need to use the two approaches in combination.
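The comparison between the two approaches can be illustrated with a minimal sketch. It assumes that RT-qPCR fold changes are derived with the widely used 2^-ΔΔCt method at roughly 100% amplification efficiency; the abstract does not state which method PipeSeq applies, and all function names and numeric values below are hypothetical.

```python
import math


def log2_fold_change_ddct(ct_target_treated, ct_ref_treated,
                          ct_target_control, ct_ref_control):
    """log2 fold change from Ct values via 2^-ddCt (assumes ~100% efficiency)."""
    d_treated = ct_target_treated - ct_ref_treated   # dCt in the treated sample
    d_control = ct_target_control - ct_ref_control   # dCt in the control sample
    return -(d_treated - d_control)                  # log2(2^-ddCt) = -ddCt


def pearson(xs, ys):
    """Pearson correlation between two equally long lists of log2 fold changes."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)


def same_direction_fraction(xs, ys):
    """Share of genes whose change has the same sign in both methods."""
    agree = sum(1 for x, y in zip(xs, ys) if x * y > 0)
    return agree / len(xs)


if __name__ == "__main__":
    # Hypothetical log2 fold changes for four genes (illustration only).
    rnaseq = [1.8, -0.9, 2.5, -1.2]
    qpcr = [3.1, -0.4, 4.0, -2.6]
    print("Pearson r:", round(pearson(rnaseq, qpcr), 3))
    print("Same direction:", same_direction_fraction(rnaseq, qpcr))
```

Sign agreement and correlation answer different questions: the first checks the direction of change, the second is sensitive to the differences in effect size noted above.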
Thus, the PipeSeq program is a tool for conducting a full cycle of bioinformatic analysis of RNA-seq data, additionally providing the opportunity to process RT-qPCR data and perform a comparative statistical analysis of the results obtained.
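The processing chain named in the abstract can be sketched as a list of command lines. This is not PipeSeq's actual code: the flags are typical invocations of each tool, paired-end reads are assumed, the `samtools sort` step is an added assumption (StringTie expects a coordinate-sorted BAM), and the accession, paths, and function name are hypothetical placeholders. The DESeq2 step is omitted because it runs in R.

```python
import shlex
from pathlib import PurePosixPath


def build_commands(accession, index_prefix, annotation_gtf,
                   threads=4, outdir="results"):
    """Return argv lists for one paired-end SRA run; nothing is executed here."""
    out = PurePosixPath(outdir)
    r1 = str(out / f"{accession}_1.fastq")
    r2 = str(out / f"{accession}_2.fastq")
    sam = str(out / f"{accession}.sam")
    bam = str(out / f"{accession}.sorted.bam")
    assembled = str(out / f"{accession}.stringtie.gtf")
    counts = str(out / f"{accession}.counts.txt")
    t = str(threads)
    return [
        # Download the run and split it into mate files (SRA Toolkit).
        ["fasterq-dump", accession, "--split-files", "-e", t, "-O", str(out)],
        # Spliced alignment; --dta tailors the output for transcript assemblers.
        ["hisat2", "--dta", "-p", t, "-x", index_prefix,
         "-1", r1, "-2", r2, "-S", sam],
        # Assumed extra step: coordinate-sort the alignments.
        ["samtools", "sort", "-@", t, "-o", bam, sam],
        # Reference-guided transcript assembly.
        ["stringtie", bam, "-G", annotation_gtf, "-p", t, "-o", assembled],
        # Count per annotated feature; -p marks paired-end input
        # (newer Subread releases may also need --countReadPairs).
        ["featureCounts", "-T", t, "-p", "-a", annotation_gtf,
         "-o", counts, bam],
    ]


if __name__ == "__main__":
    # "SRR0000000" is a placeholder, not an accession from the study.
    for argv in build_commands("SRR0000000", "index/genome", "genes.gtf"):
        print(shlex.join(argv))
```

Building the commands as argument lists keeps the sketch runnable without the tools installed; each list could be passed to `subprocess.run(argv, check=True)` to execute a step.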

About the Authors

A. M. Nerezenko, Saint-Petersburg University, St. Petersburg, Russian Federation

P. A. Virolainen, Saint-Petersburg University, St. Petersburg, Russian Federation

S. A. Tupitsyna, Saint-Petersburg University, St. Petersburg, Russian Federation

E. M. Chekunova, Saint-Petersburg University, St. Petersburg, Russian Federation

This work is licensed under a Creative Commons Attribution 4.0 License.


ISSN 2500-3259 (Online)