Development and validation of the PipeSeq program for RNA-seq data analysis using Chlamydomonas reinhardtii as a model
https://doi.org/10.18699/vjgb-26-34
Abstract
RNA sequencing (RNA-seq) is a highly sensitive method of transcriptome analysis that makes it possible to assess the expression of thousands of genes simultaneously and to identify expression patterns under various conditions. The variety of RNA-seq data formats, normalization methods, and approaches to the statistical processing of results complicates the comparison of data from different studies and reduces the reproducibility of the analysis. This study presents PipeSeq, an automated pipeline that combines the standard steps of RNA-seq data processing: data download (SRA Toolkit), read alignment to the reference genome (HISAT2), transcript assembly (StringTie), read counting (featureCounts), and statistical analysis of differential gene expression between experimental conditions (DESeq2). PipeSeq has a simple graphical interface, supports multithreading, and generates gene expression heat maps, tables, and plots that are ready for analysis. The functionality of the pipeline is demonstrated on three raw RNA-seq datasets from cells of the green alga Chlamydomonas reinhardtii available in the NCBI SRA database. These data were used to analyze the differential expression of C. reinhardtii genes encoding GATA family transcription factors under different light cultivation conditions. The in silico results were verified by reverse transcription quantitative real-time PCR (RT-qPCR) for 12 GATA genes, which allowed us to propose hypotheses about their functions and to evaluate the correlation between the bulk (RNA-seq) and targeted (RT-qPCR) approaches. Our results showed that RNA-seq and RT-qPCR agree on the direction of gene expression changes but differ in effect size and sensitivity, which emphasizes the need to use the two approaches in combination. 
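The sequence of processing steps listed above can be sketched as a chain of command lines. This is a minimal illustration, not PipeSeq's own code: it assumes single-end reads, a HISAT2 index prebuilt with hisat2-build, and a GTF annotation whose exon features carry gene_id attributes (the featureCounts defaults). It also adds a samtools sorting step between alignment and assembly. The function names, the placeholder accession, and the file layout are hypothetical. DESeq2 (or PyDESeq2) would then be run on the merged featureCounts tables.

```python
"""Dry-run sketch of an RNA-seq command chain (an illustration, not PipeSeq's code).

Assumptions: single-end reads, a prebuilt HISAT2 index (hisat2-build), and a GTF
annotation whose exon features carry gene_id attributes (featureCounts defaults).
"""
import shlex
import subprocess
from pathlib import Path


def build_commands(accession, index_prefix, gtf, outdir="out", threads=4):
    """Return the command lines for one single-end SRA run, in execution order."""
    out = Path(outdir)
    t = str(threads)
    fastq = out / f"{accession}.fastq"  # fasterq-dump's default name for single-end runs
    sam = out / f"{accession}.sam"
    bam = out / f"{accession}.sorted.bam"
    assembled = out / f"{accession}.stringtie.gtf"
    counts = out / f"{accession}.counts.txt"
    return [
        # 1. Download reads from NCBI SRA (SRA Toolkit)
        ["fasterq-dump", accession, "-O", str(out), "-e", t],
        # 2. Spliced alignment (HISAT2); -U = unpaired reads,
        #    --dta = alignments tailored for transcript assemblers such as StringTie
        ["hisat2", "-p", t, "--dta", "-x", index_prefix,
         "-U", str(fastq), "-S", str(sam)],
        # 3. Coordinate-sort the alignments (samtools)
        ["samtools", "sort", "-@", t, "-o", str(bam), str(sam)],
        # 4. Reference-guided transcript assembly (StringTie)
        ["stringtie", str(bam), "-G", gtf, "-o", str(assembled), "-p", t],
        # 5. Gene-level read counts (featureCounts); this table is the DESeq2 input
        ["featureCounts", "-T", t, "-a", gtf, "-o", str(counts), str(bam)],
    ]


def run(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failing step


if __name__ == "__main__":
    # Placeholder accession and paths, for illustration only
    run(build_commands("SRR0000000", "index/genome", "genes.gtf"))
```

Keeping each step as an argument list rather than a shell string avoids quoting problems, and the dry-run default lets the chain be checked before anything is downloaded. For paired-end runs, hisat2 takes -1/-2 instead of -U, and featureCounts needs its paired-end options (-p, plus --countReadPairs in recent Subread releases).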
In summary, PipeSeq is a tool for the full cycle of bioinformatic analysis of RNA-seq data that additionally makes it possible to process RT-qPCR data and to perform a comparative statistical analysis of the results obtained by both methods.
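The RT-qPCR processing and the RNA-seq/RT-qPCR comparison mentioned above can be illustrated with a short sketch. It assumes the Livak 2^(−ΔΔCT) method listed in the references, which presumes roughly 100% amplification efficiency and a single reference gene, and it uses mean Ct values per condition. Agreement with RNA-seq log2 fold changes is measured here by Spearman's rank correlation and by the share of genes changing in the same direction; these measures are an illustrative choice, and the function names are hypothetical, not PipeSeq's API.

```python
"""Sketch: RT-qPCR log2 fold changes by the 2^(-ddCt) method and their agreement
with RNA-seq log2 fold changes. Assumptions (not taken from PipeSeq): ~100% primer
efficiency, one reference gene, mean Ct per condition, and Spearman correlation
plus sign agreement as the comparison measures.
"""
from math import sqrt


def ddct_log2fc(ct_target_trt, ct_ref_trt, ct_target_ctl, ct_ref_ctl):
    """log2 fold change (treated vs control) = -ddCt, so fold change = 2**(-ddCt)."""
    dct_trt = ct_target_trt - ct_ref_trt  # normalize to the reference gene
    dct_ctl = ct_target_ctl - ct_ref_ctl
    return -(dct_trt - dct_ctl)


def ranks(values):
    """1-based ranks; tied values receive the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient (undefined for constant input)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman(x, y):
    """Spearman's rho = Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def _sign(v):
    return (v > 0) - (v < 0)


def sign_agreement(x, y):
    """Fraction of genes whose log2FC has the same sign (up, down, or zero) in both."""
    return sum(_sign(a) == _sign(b) for a, b in zip(x, y)) / len(x)


if __name__ == "__main__":
    # Toy values for illustration only, not measurements from the study
    print(ddct_log2fc(22.0, 18.0, 25.0, 18.5))  # 2.5, i.e. about 5.7-fold up
    rnaseq = [1.2, -0.8, 3.0, 0.4]
    qpcr = [2.5, -1.1, 4.0, 0.1]
    print(spearman(rnaseq, qpcr), sign_agreement(rnaseq, qpcr))
```

A rank correlation captures whether the two methods order the genes consistently, while sign agreement captures the direction of change; neither measures effect size, which is where the abstract reports the two methods diverge.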
About the Authors
A. M. Nerezenko
Russian Federation
St. Petersburg
P. A. Virolainen
Russian Federation
St. Petersburg
S. A. Tupitsyna
Russian Federation
St. Petersburg
E. M. Chekunova
Russian Federation
St. Petersburg
References
1. Afgan E., Baker D., Batut B., van den Beek M., Bouvier D., Čech M., Chilton J., … Soranzo N., Goecks J., Taylor J., Nekrutenko A., Blankenberg D. The Galaxy platform for accessible, reproducible and collaborative biomedical analyses: 2018 update. Nucleic Acids Res. 2018;46(W1):W537-W544. doi 10.1093/nar/gky379
2. Bray N.L., Pimentel H., Melsted P., Pachter L. Near-optimal probabilistic RNA-seq quantification. Nat Biotechnol. 2016;34(5):525-527. doi 10.1038/nbt.3519
3. Coenye T. Do results obtained with RNA-sequencing require independent verification? Biofilm. 2021;3:100043. doi 10.1016/j.bioflm.2021.100043
4. Conesa A., Madrigal P., Tarazona S., Gomez-Cabrero D., Cervera A., McPherson A., Szcześniak M.W., Gaffney D.J., Elo L.L., Zhang X., Mortazavi A. A survey of best practices for RNA-seq data analysis. Genome Biol. 2016;17:13. doi 10.1186/s13059-016-0881-8
5. Derveaux S., Vandesompele J., Hellemans J. How to do successful gene expression analysis using real-time PCR. Methods. 2010;50(4):227-230. doi 10.1016/j.ymeth.2009.11.001
6. Di Tommaso P., Chatzou M., Floden E.W., Barja P.P., Palumbo E., Notredame C. Nextflow enables reproducible computational workflows. Nat Biotechnol. 2017;35(4):316-319. doi 10.1038/nbt.3820
7. Elahimanesh M., Najafi M. Differentially expressed genes of RNA-seq data are suggested on the intersections of normalization techniques. Biochem Biophys Rep. 2024;37:101618. doi 10.1016/j.bbrep.2023.101618
8. Ewels P., Magnusson M., Lundin S., Käller M. MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics. 2016;32(19):3047-3048. doi 10.1093/bioinformatics/btw354
9. Goodstein D.M., Shu S., Howson R., Neupane R., Hayes R.D., Fazo J., Mitros T., Dirks W., Hellsten U., Putnam N., Rokhsar D.S. Phytozome: a comparative platform for green plant genomics. Nucleic Acids Res. 2012;40(D1):D1178-D1186. doi 10.1093/nar/gkr944
10. Harris E.H. The Chlamydomonas Sourcebook. A Comprehensive Guide to Biology and Laboratory Use. San Diego: Academic Press, 1989
11. He F., Liu Q., Zheng L., Cui Y., Shen Z., Zheng L. RNA-Seq analysis of rice roots reveals the involvement of post-transcriptional regulation in response to cadmium stress. Front Plant Sci. 2015;6:1136. doi 10.3389/fpls.2015.01136
12. Hunter J.D. Matplotlib: a 2D graphics environment. Comput Sci Eng. 2007;9(3):90-95. doi 10.1109/MCSE.2007.55
13. Kim D., Langmead B., Salzberg S.L. HISAT: a fast spliced aligner with low memory requirements. Nat Methods. 2015;12(4):357-360. doi 10.1038/nmeth.3317
14. Kim D., Paggi J.M., Park C., Bennett C., Salzberg S.L. Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype. Nat Biotechnol. 2019;37(8):907-915. doi 10.1038/s41587-019-0201-4
15. Kvitko K.V., Borschevskaya T.N., Chunaev A.S., Tugarinov V.V. Peterhof genetic collection of green algae strains (Chlorella, Scenedesmus, Chlamydomonas). In: Cultivation of Collection Strains of Algae. Leningrad, 1983 (in Russian)
16. Li H., Handsaker B., Wysoker A., Fennell T., Ruan J., Homer N., Marth G., Abecasis G., Durbin R., 1000 Genome Project Data Processing Subgroup. The sequence alignment/map format and SAMtools. Bioinformatics. 2009;25(16):2078-2079. doi 10.1093/bioinformatics/btp352
17. Li X., Wang C.Y. From bulk, single-cell to spatial RNA sequencing. Int J Oral Sci. 2021;13:36. doi 10.1038/s41368-021-00146-0
18. Liao Y., Smyth G.K., Shi W. featureCounts: an efficient general purpose program for assigning sequence reads to genomic features. Bioinformatics. 2014;30(7):923-930. doi 10.1093/bioinformatics/btt656
19. Liu C., Wu G., Huang X., Liu S., Cong B. Validation of housekeeping genes for gene expression studies in an ice alga Chlamydomonas during freezing acclimation. Extremophiles. 2012;16:419-425. doi 10.1007/s00792-012-0441-4
20. Livak K.J., Schmittgen T.D. Analysis of relative gene expression data using real-time quantitative PCR and the 2^(−ΔΔCT) method. Methods. 2001;25(4):402-408. doi 10.1006/meth.2001.1262
21. Love M.I., Huber W., Anders S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 2014;15(12):550. doi 10.1186/s13059-014-0550-8
22. Luo X.M., Lin W.H., Zhu S., Zhu J.Y., Sun Y., Fan X.Y., Cheng M., … Liu L., Zhang M., Xie Q., Chong K., Wang Z.Y. Integration of light- and brassinosteroid-signaling pathways by a GATA transcription factor in Arabidopsis. Dev Cell. 2010;19(6):872-883. doi 10.1016/j.devcel.2010.10.023
23. Manfield I.W., Devlin P.F., Jen C.H., Westhead D.R., Gilmartin P.M. Conservation, convergence, and divergence of light-responsive, circadian-regulated, and tissue-specific expression patterns during evolution of the Arabidopsis GATA gene family. Plant Physiol. 2007;143(2):941-958. doi 10.1104/pp.106.090761
24. Marioni J.C., Mason C.E., Mane S.M., Stephens M., Gilad Y. RNA-seq: an assessment of technical reproducibility and comparison with gene expression arrays. Genome Res. 2008;18(9):1509-1517. doi 10.1101/gr.079558.108
25. Martin M. Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet.journal. 2011;17(1):10-12. doi 10.14806/ej.17.1.200
26. McKinney W. pandas: a foundational Python library for data analysis and statistics. Python for High Performance and Scientific Computing. 2011;14(9):1-9.
27. Merchant S.S., Prochnik S.E., Vallon O., Harris E.H., Karpowicz S.J., Witman G.B., Terry A., … Werner G., Zhou K., Grigoriev I.V., Rokhsar D.S., Grossman A.R. The Chlamydomonas genome reveals the evolution of key animal and plant functions. Science. 2007;318(5848):245-250. doi 10.1126/science.1143609
28. Mölder F., Jablonski K.P., Letcher B., Hall M.B., Tomkins-Tinch C.H., Sochat V., Forster J., … Wilm A., Holtgrewe M., Rahmann S., Nahnsen S., Köster J. Sustainable data analysis with Snakemake. F1000Res. 2021;10:33. doi 10.12688/f1000research.29032.2
29. Mortazavi A., Williams B.A., McCue K., Schaeffer L., Wold B.J. Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat Methods. 2008;5(7):621-628. doi 10.1038/nmeth.1226
30. Müller N., Wenzel S., Zou Y., Künzel S., Sasso S., Weiß D., Prager K., Grossman A., Kottke T., Mittag M. A plant cryptochrome controls key features of the Chlamydomonas circadian clock and its life cycle. Plant Physiol. 2017;174(1):185-201. doi 10.1104/pp.17.00349
31. Muzellec B., Teleńczuk M., Cabeli V., Andreux M. PyDESeq2: a python package for bulk RNA-seq differential expression analysis. Bioinformatics. 2023;39(9):btad547. doi 10.1093/bioinformatics/btad547
32. Naito T., Kiba T., Koizumi N., Yamashino T., Mizuno T. Characterization of a unique GATA family gene that responds to both light and cytokinin in Arabidopsis thaliana. Biosci Biotechnol Biochem. 2007;71(6):1557-1560. doi 10.1271/bbb.60692
33. Pertea M., Pertea G.M., Antonescu C.M., Chang T.C., Mendell J.T., Salzberg S.L. StringTie enables improved reconstruction of a transcriptome from RNA-seq reads. Nat Biotechnol. 2015;33(3):290-295. doi 10.1038/nbt.3122
34. Pertea M., Kim D., Pertea G.M., Leek J.T., Salzberg S.L. Transcript-level expression analysis of RNA-seq experiments with HISAT, StringTie and Ballgown. Nat Protoc. 2016;11(9):1650-1667. doi 10.1038/nprot.2016.095
35. Ren W., Kong L., Jiang S., Ma L., Wang H., Li X., Liu Y., Ma W., Yan X. Genome-wide identification, evolution, and characterization of GATA gene family and GATA gene expression analysis post-MeJA treatment in Platycodon grandiflorum. J Plant Growth Regul. 2025;44:155-167. doi 10.1007/s00344-024-11468-8
36. Reyes J.C., Muro-Pastor M.I., Florencio F.J. The GATA family of transcription factors in Arabidopsis and rice. Plant Physiol. 2004;134(4):1718-1732. doi 10.1104/pp.103.037788
37. Riechmann J.L., Heard J., Martin G., Reuber L., Jiang C.Z., Keddie J., Adam L., … Broun P., Zhang J.Z., Ghandehari D., Sherman B.K., Yu G.L. Arabidopsis transcription factors: genome-wide comparative analysis among eukaryotes. Science. 2000;290(5499):2105-2110. doi 10.1126/science.290.5499.2105
38. Salomé P.A., Merchant S.S. A series of fortunate events: introducing Chlamydomonas as a reference organism. Plant Cell. 2019;31(8):1682-1707. doi 10.1105/tpc.18.00952
39. Sanchez-Tarre V., Kiparissides A. The effects of illumination and trophic strategy on gene expression in Chlamydomonas reinhardtii. Algal Res. 2021;54:102186. doi 10.1016/j.algal.2021.102186
40. Schmittgen T., Livak K. Analyzing real-time PCR data by the comparative CT method. Nat Protoc. 2008;3:1101-1108. doi 10.1038/nprot.2008.73
41. Schröder P., Hsu B.Y., Gutsche N., Winkler J.B., Hedtke B., Grimm B., Schwechheimer C. B-GATA factors are required to repress high-light stress responses in Marchantia polymorpha and Arabidopsis thaliana. Plant Cell Environ. 2023;46(8):2376-2390. doi 10.1111/pce.14629
42. Schwechheimer C., Schröder P.M., Blaby-Haas C.E. Plant GATA factors: their biology, phylogeny, and phylogenomics. Annu Rev Plant Biol. 2022;73(1):123-148. doi 10.1146/annurev-arplant-072221-092913
43. Shi Y., He M. Differential gene expression identified by RNA-Seq and qPCR in two sizes of pearl oyster (Pinctada fucata). Gene. 2014;538(2):313-322. doi 10.1016/j.gene.2014.01.031
44. Virolainen P.A., Chekunova E.M. GATA family transcription factors in alga Chlamydomonas reinhardtii. Curr Genet. 2024;70(1):1. doi 10.1007/s00294-024-01280-y
45. Voigt J., Münzner P. The Chlamydomonas cell cycle is regulated by a light/dark-responsive cell-cycle switch. Planta. 1987;172:463-472. doi 10.1007/BF00393861
46. Wang Z., Gerstein M., Snyder M. RNA-Seq: a revolutionary tool for transcriptomics. Nat Rev Genet. 2009;10:57-63. doi 10.1038/nrg2484
47. Wheeler D.L., Barrett T., Benson D.A., Bryant S.H., Canese K., Church D.M., DiCuccio M., … Suzek T.O., Tatusov R., Tatusova T.A., Wagner L., Yaschenko E. Database resources of the National Center for Biotechnology Information. Nucleic Acids Res. 2005;33:D39-D45. doi 10.1093/nar/gki062
48. Zhao S., Ye Z., Stanton R. Misuse of RPKM or TPM normalization when comparing across samples and sequencing protocols. RNA. 2020;26(8):903-909. doi 10.1261/rna.074922.120
49. Zhao Y., Li M.C., Konaté M.M., Chen L., Das B., Karlovich C., Williams P.M., Evrard Y.A., Doroshow J.H., McShane L.M. TPM, FPKM, or normalized counts? A comparative study of quantification measures for the analysis of RNA-seq data from the NCI patient-derived models repository. J Transl Med. 2021;19(1):269. doi 10.1186/s12967-021-02936-w