
Vavilov Journal of Genetics and Breeding

Vol 30, No 2 (2026)

MOLECULAR AND CELL BIOLOGY

163-180 72
Abstract

To assess the possibility of integrating extracellular double-stranded DNA fragments into the recipient genome of hematopoietic stem cells, a complex substrate was constructed consisting of the entire M13F-AluI-M13R fragment and its two restriction derivatives, produced by hydrolysis with the restriction endonucleases EcoRI and HindIII: M13F-AluI-EcoRI and M13R-AluI-HindIII. The substrate contained a pBlueScript+ plasmid polylinker sequence, absent in the human genome, which framed the human AluI fragment cloned at the EcoRV site. Human bone marrow cells were treated with the DNA of the constructed complex substrate, and preparations of metaphase plates were obtained, taking into account the repair time of pangenomic single-strand breaks. FISH revealed specific fluorescent signals. Simultaneously, DNA isolated from colonies obtained from bone marrow cells treated with the complex substrate was sequenced. Two rounds of sequencing were carried out: whole-genome sequencing and selective sequencing after targeted hybridization on metal beads. The results obtained indicate that homologous exchange between extrachromosomal and chromosomal DNA is possible. Integration into the genome via the single-strand annealing mechanism, involving microhomologies, is also possible. Intermediates were discovered that suggest an unusual mode of integration in which one end of the fragment is integrated into the genome at a nick while the other end hangs freely in the interchromosomal space. A direct assessment of the possibility of integrating TAMRA-labeled fragments of fragmented human DNA and E. coli DNA into the genome of recipient cells was carried out using a human bone marrow cell model. The results obtained indicate that specific signals of homologous DNA are distributed throughout the chromosome body, whereas signals from nonhomologous E. coli DNA are predominantly concentrated in the centromeric regions of chromosomes. 
The ratio of the number of obtained reads with integration elements and FISH signals suggested the existence of a strong interaction between extracellular fragments and chromosomal DNA. Experiments have been conducted showing that linear plasmid DNA, after internalization into hematopoietic stem cells, forms a monomer ring. Internalized into the intracellular space, extracellular plasmid DNA is isolated together with chromosomal DNA after stringent purification and fractionation procedures. This fact suggests the existence of a strong ring associate of plasmid DNA and chromosome DNA formed without the participation of a protein framework in the form of a looped chromosomal strand.

181-193 59
Abstract

Naïve human pluripotent stem cells (PSCs) are a promising new tool in biomedical research. They provide access to the early embryonic development programmes and offer breakthrough solutions in regenerative medicine. However, the current inability to obtain long-term cultures of genetically and epigenetically stable naïve human PSC lines poses a challenge to their effective application in biomedicine. The recently proposed HENSM culture medium is claimed to enable the obtaining and long-term maintenance of naïve PSC lines. In this study, the potential of the HENSM medium for obtaining stable naïve human PSC lines was investigated. We successfully reset the primed induced pluripotent stem cell (iPSC) line ICGi022-A (K7-4Lf), derived from a healthy donor, to a naïve state using the HENSM medium. Naïve iPSCs grow in the form of dome-shaped colonies, both with and without a feeder layer of cells. The resulting cells retained expression of the key pluripotency factors and activated the naïve PSC transcriptional programme, including expression of endogenous retroviral elements, early epiblast marker genes and genes associated with totipotency. The naïve iPSC line was capable of differentiating into derivatives of the three primary germ layers, as well as producing trophoblast derivatives. Culturing naïve iPSCs in low-adhesion conditions resulted in the spontaneous formation of three-dimensional structures (blastoids) resembling early human blastocysts. The X chromosome, which was in an eroded inactive state in the original cell line, was reactivated in the naïve cells, but returned to its normal inactive state when the naïve cells were re-primed. Notably, naïve iPSCs demonstrated limited ability to directly differentiate into endothelial cells. However, their competence to give rise to mature endothelial derivatives was restored upon returning to the primed state, achieving comparable efficiency to the original primed iPSCs. 
Thus, the resulting naïve iPSC line has significant potential for studying the early stages of embryogenesis and for other biomedical applications, including disease modelling. However, the naïve ICGi022-A line proved to be karyotypically unstable during long-term cultivation using HENSM medium. As there is a risk of karyotypic aberrations during the maintenance of naïve PSCs, further improvement of the culture conditions is necessary to obtain reliable, karyotypically stable lines of naïve pluripotent cells.

PLANT GENETICS AND BREEDING

194-204 71
Abstract

The genetic structure of Caragana jubata (Pall.) Poir. was studied using five ISSR markers in 44 individuals sampled from eight coenopopulations (CPs) in the mountains of Central Asia – in the Western Pamirs and Inner Tien Shan (Kyrgyzstan) – and of Southern Siberia – in Altai (Republic of Altai), Western Sayan (Republic of Tyva) and Eastern Sayan (Republic of Buryatia). The species was studied across geographical distances of more than 2,500 km and an absolute altitude range of 1,570–3,680 m. Caragana jubata is listed in the Red Data Books of eight subjects of the Russian Federation. The species population is declining, partly due to anthropogenic impact. The aim of the current work is to identify genetic diversity and heterogeneity in C. jubata coenopopulations depending on their geographic and altitudinal confinement in the mountains of Central Asia and Southern Siberia. It was shown that in undisturbed locations, the studied CPs of this species were characterized by a high number of individuals and occupied an area of more than 100 m². Almost every sample from the studied C. jubata CPs (except for CP7) contained genotypes possessing unique DNA fragments. The highest proportion of such genotypes (75 %) was found in the sample from CP3 (Inner Tien Shan, Teskey-Ala-Too Ridge, absolute altitude 2,550 m). No unique fragments were found in the genotypes of individuals from the CP7 sample (Western Sayan, Republic of Tyva); anthropogenic impact on plants at this location is a possible reason. The predominance of intrapopulation over interpopulation genetic variability revealed in samples from the eight studied C. jubata CPs may indicate the stability of this species within the studied parts of its range.

205-211 71
Abstract

Variability of the genomes of cellular organelles (chloroplasts and mitochondria) is an important component of the overall variability of the plant genome. A large amount of data has already been obtained on the comparative characteristics of the organization of organelle DNA sequences for different groups of plants. This paper presents new original data on the variability of mitochondrial and chloroplast genomes in soybean (Glycine max (L.) Merr.), a crop of great economic importance widely cultivated in Central Europe, including the Republic of Belarus. Initially, we supposed that peculiarities of soybean organelle DNA sequence or organization make certain soybean cultivars the best maternal forms and others, alternatively, the best paternal forms. In the course of the study, new complete nucleotide sequences of the chloroplast and mitochondrial genomes of 46 soybean samples were obtained by next-generation sequencing (NGS) on the Illumina platform. A comprehensive bioinformatic comparative study of intraspecific organelle genome variability in 46 soybean varieties of diverse geographical origin was conducted. Polymorphic loci of the genomes were discovered. Data on DNA variability were verified by Sanger sequencing. The spectrum of organelle DNA variability of cultivated soybean was represented by three chloroplast DNA haplotypes (C1–C3) and five mitochondrial DNA haplotypes (M1–M5). A comparatively low level of intraspecific variability of organelle genomes in G. max was revealed. The soybean chloroplast genome had a lower level of sequence variability than the mitochondrial genome. A set of DNA markers for polymorphic loci of organelle genomes was developed, allowing the differentiation of varieties of the studied group into plasmatypes. Additionally, 90 soybean samples from the collection were studied using PCR followed by Sanger sequencing. The low level of intraspecific variability of organelle genomes in G. max was confirmed on the extended group of samples. The majority of cultivars were represented by three plasmatypes – C1/M1, C2/M2 and C1/M3. Forty-six complete chloroplast DNA sequences have been deposited in NCBI GenBank. The hypothesis that organelle DNA influences the combining ability of different varieties has not yet been confirmed. A more detailed study of the mechanisms of nuclear-cytoplasmic interaction is required, as well as a search for nuclear markers that affect the expression of organelle genes.

212-221 46
Abstract

Stress phytohormones – salicylic and jasmonic acids – participate in the plant defense response. The increase in the content of these compounds during plant infection by pathogens leads to the activation of signaling pathways, ultimately resulting in changes in gene expression and protein synthesis, including a group of pathogenesis-related (PR) proteins. So-called phytohormone-dependent marker genes are used to assess the activation of these signaling pathways. In this study, PR-1 genes, which are markers for salicylic acid signaling and are expressed in pea roots, were identified and characterized, and the effects of salicylic acid and methyl jasmonate on their expression were analyzed. It was shown that in pea roots, PR-1, encoded by the Psat1g156240 gene, is among the most highly expressed genes in the control. Salicylic acid did not change the expression of this gene; however, it was induced by methyl jasmonate after 24 h. Analysis of the expression of other genes encoding PR-1 proteins showed that salicylate had no effect on their expression after 24 and 72 h. Analysis of the expression of genes encoding chitinases and chitinase-like proteins showed that the former do not exhibit specificity in response to salicylate and methyl jasmonate, except for Psat1g131280, the expression of which increased after 24 and 72 h of treatment with methyl jasmonate. The genes Psat1g147600, Psat1g147560 and Psat1g149120, encoding chitinase-like proteins, were barely expressed in pea roots in the control and were specifically induced by salicylic acid. β-1,3-glucanase genes were not induced in roots by the studied phytohormones. The results revealed specific genes, including those encoding chitinase-like proteins, whose expression is salicylate-inducible. These genes can be used to assess the activation of the salicylate-dependent signaling pathway in pea roots.

222-232 60
Abstract

Mutation serves as a pivotal source of diversity in plant breeding. This study focused on identifying stable rice mutant lines. Fourteen rice mutant lines, along with four conventional cultivars, were evaluated in a randomized complete block design with three replicates across three Iranian locations (Rasht, ChaparSar, and Fars province) during two growing seasons (2015, 2016). All statistical analyses were performed using the ‘metan’ (multi-environment trial analysis) R package. Single-environment ANOVA indicated significant genotypic effects for all traits. Likelihood ratio tests (LRTs) confirmed significant environment and genotype-by-environment interaction (GEI) effects for all traits. The first three principal components (PCs) captured 68.13, 14.46, and 9.76 % of the GEI variation, respectively. Heatmap visualization of yield performance and WAASB (weighted average of absolute scores based on best linear unbiased prediction, BLUP) highlighted genotypes G3, G9, G6, G12, and G5 as both high-yielding and stable. Multi-trait stability index (MTSI) analysis, designed to reveal genotypic strengths and weaknesses, selected only genotypes G7, G5, and G1. The top five genotypes based on the harmonic mean of the relative performance of genotypic values (HMRPGV) were G5, G12, G7, G2, and G1. The superior performance of certain mutants demonstrates that mutation has effectively generated significant genetic diversity. Notably, genotypes G12, G5, and G9 exhibited a clear advantage over the other genotypes and warrant consideration for selection or cultivar release; however, only G5 was selected based on all traits in the MTSI index and could therefore undergo selection or cultivar introduction processes.
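The HMRPGV ranking mentioned above can be illustrated with a minimal sketch. It assumes the commonly cited definition, in which each genotypic value is divided by the environment mean and the harmonic mean of these ratios is taken across environments. The function name `hmrpgv` and the toy values are hypothetical; the study itself used the 'metan' R package on BLUP-based genotypic values, which this sketch does not reproduce.

```python
from statistics import mean


def hmrpgv(genotypic_values):
    """Harmonic mean of relative performance of genotypic values.

    genotypic_values maps a genotype name to a list of predicted
    genotypic values, one per environment (same order for all genotypes).
    """
    genotypes = list(genotypic_values)
    n_env = len(genotypic_values[genotypes[0]])
    # Mean of all genotypes within each environment.
    env_means = [
        mean(genotypic_values[g][j] for g in genotypes) for j in range(n_env)
    ]
    scores = {}
    for g in genotypes:
        # Performance relative to the environment mean.
        relative = [genotypic_values[g][j] / env_means[j] for j in range(n_env)]
        # The harmonic mean penalises genotypes with uneven performance.
        scores[g] = n_env / sum(1.0 / r for r in relative)
    return scores


# Toy values (hypothetical, not taken from the study): three genotypes, two environments.
toy = {"G1": [4.0, 6.0], "G2": [6.0, 4.0], "G3": [5.0, 5.0]}
scores = hmrpgv(toy)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking, scores)
```

In this toy case all three genotypes have the same arithmetic mean, yet the uniformly performing G3 ranks first, which is the behaviour a stability-aware index is meant to capture.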

GENETIC ENGINEERING

233-240 49
Abstract

Lens culinaris Medik. (lentil) is an agronomically important leguminous species, but genome modification is rarely used for obtaining new varieties, probably due to the low efficiency of transformation protocols. Development of universal, genotype-independent protocols for obtaining transgenic plants usually relies, among other factors, on the possibility of obtaining a substantial number of transgenic cells in vitro. This study aimed to adapt a previously developed Agrobacterium-mediated transformation protocol, used for a related legume, for the production of transgenic callus tissue in L. culinaris. We used two different markers of transgenic tissue, beta-glucuronidase and green fluorescent protein, to find an optimal type of explant for obtaining transgenic tissue in lentil. We also evaluated the impact of hygromycin, a common selective agent, on the amount of transgenic tissue in developing transformed explants of L. culinaris. According to our results, the transformation protocol commonly used for Medicago truncatula Gaertn. leaf explants can also be applied for obtaining transgenic calli from L. culinaris shoot apices. Explants from shoot apices demonstrated a higher initial transformation rate than explants from roots, stems and leaves. Moreover, explants of different types cultivated on medium without hygromycin developed significantly fewer calli expressing reporter genes than those grown on hygromycin-containing medium, confirming that hygromycin may be used as an effective selection agent for lentil. During our analysis, we noticed GUS-like staining in calli that did not contain plasmids for GUS gene expression. This can be explained by so-called intrinsic GUS-like activity, which was described in previous research. These data can be used for further development of effective and universal L. culinaris transformation and genome editing protocols.

241-249 54
Abstract

In higher plants, the L-galactose pathway is the main pathway for the biosynthesis of vitamin C (ascorbate, Asc), the final step of which is connected with the functioning of the mitochondrial electron transport chain (ETC). In addition to the main cytochrome pathway, the plant ETC includes an alternative pathway (AP) via alternative terminal oxidase (AOX). The engagement of AOX promotes Asc synthesis, and it is hypothesized that AOX suppression under conditions of Asc deficiency may reduce plant viability. The aim of this work was to examine the consequences of simultaneously knocking out two genes in Arabidopsis thaliana: AOX1a, the most stress-inducible AOX gene, and VTC2, encoding a key enzyme of the L-galactose pathway of Asc synthesis. Two lines of A. thaliana with T-DNA insertions in the target genes were crossed to generate hybrid lines. Seed characteristics of the first (F1) and second (F2) generations were analyzed. F1 seeds were larger than those of the parent lines, possibly due to heterosis. In the F2 generation, self-pollination of F1 plants resulted in seeds with significant size variation, including a group of small seeds with degenerative morphological abnormalities. Most of the small seeds failed to germinate or died at the seedling stage. PCR genotyping of these seeds revealed the absence of native AOX1a and VTC2, indicating a lethal mutation caused by simultaneous knockout of both genes. One likely genetic cause is the interaction of mutations in non-allelic genes. At the physiological level, irreversible respiratory damage may occur, possibly including the impact of a cryptic mutation in the vtc2 line. Further studies are necessary to confirm these hypotheses. In general, the results obtained indicate the vital co-functioning of the AP and the L-galactose pathway of Asc biosynthesis and may be useful for the development of genetic engineering techniques for the control of vitamin C synthesis in plants.

MEDICAL GENETICS

250-258 48
Abstract

Genetic correlation is a key characteristic of the global genetic similarity of human traits. Its primary underlying mechanism is pleiotropy, which operates at various biological levels. Gene-level pleiotropy is of particular interest, as genes are the fundamental functional units of the genome. Using publicly available results from genome-wide association studies for 324 diseases, we selected a set of 45 diseases in which every pair exhibited a significant genetic correlation. These diseases belonged to 10 nosological categories. The search for genes with pleiotropic effects was carried out using three approaches: (1) gene-based association analysis, (2) selection of single nucleotide polymorphisms (SNP) within gene coding regions significantly associated with at least two diseases, and (3) a cross-trait meta-analysis of SNP association signals followed by the identification of independent loci and gene prioritization within those loci. A comprehensive bioinformatic analysis was performed on all genes identified through these methods. We identified 167 pleiotropic genes implicated in 39 diseases. The most pleiotropic genes in our study were LPA, TCF7L2, SLC22A3, FES, CDKN2B, and APOE, which were associated with 7 to 9 diseases each. Bioinformatic analysis revealed that the pleiotropic genes identified for these 39 diseases are also involved in the genetic architecture of 501 other diseases and traits. This indicates a high degree of pleiotropy, facilitated by the involvement of these genes in diverse biological processes – including homeostasis, cell-cell signaling, regulation of cell proliferation, transport, and catalytic activity – and various molecular functions, such as signaling receptor binding. Thus, we demonstrated that 87% of diseases within a fully connected correlation network share associated genes with at least one other disease. 
This finding strongly suggests that genetic correlations between human diseases are largely driven by the pleiotropic effects of shared genes.
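The counting logic behind these figures can be sketched as follows. This is an illustration only: the gene and disease labels are hypothetical placeholders, the function names are not from the study, and the final line simply checks that 39 of 45 diseases appears consistent with the reported 87 %.

```python
from collections import defaultdict


def pleiotropic_genes(associations, min_diseases=2):
    """Return genes associated with at least min_diseases distinct diseases."""
    by_gene = defaultdict(set)
    for gene, disease in associations:
        by_gene[gene].add(disease)
    return {g: ds for g, ds in by_gene.items() if len(ds) >= min_diseases}


def share_fraction(associations, all_diseases):
    """Fraction of diseases that share a gene with at least one other disease."""
    shared = set()
    for diseases in pleiotropic_genes(associations).values():
        shared |= diseases
    return len(shared & set(all_diseases)) / len(all_diseases)


# Hypothetical gene-disease pairs, used only to exercise the functions.
toy_assoc = [
    ("GENE_A", "D1"), ("GENE_A", "D2"),
    ("GENE_B", "D2"), ("GENE_B", "D3"),
    ("GENE_C", "D4"),
]
toy_diseases = ["D1", "D2", "D3", "D4", "D5"]

print(sorted(pleiotropic_genes(toy_assoc)))
print(share_fraction(toy_assoc, toy_diseases))
# Arithmetic check against the numbers quoted in the abstract.
print(round(100 * 39 / 45))
```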

259-266 51
Abstract

Metformin is a first-line therapy for type 2 diabetes, yet individual response varies significantly, with over 30 % of patients failing to achieve optimal glycemic control. The specific regulatory mechanisms of this phenomenon remain poorly understood, and the genetic variants involved remain largely undiscovered. Multiple lines of evidence indicate that regulatory SNPs (rSNPs) play the leading role in determining the variance in phenotypic outcome, as they directly modify gene expression. Therefore, the genome-wide search for such functional variants and the deciphering of associated phenotypes remain a fundamental challenge. Previously, based on the results of bioinformatics analysis of the allele-specific expression and binding landscape in human peripheral blood mononuclear cells, we established an original panel of 14 796 rSNPs within the promoters of 5132 genes. Aiming to pinpoint functional variants most likely linked to the hepatic response to metformin and its impact on liver gluconeogenesis, we analyzed the relevant open-access data as well as rSNPs from our panel and the corresponding genes. A total of 1196 genes reported to be regulated by metformin in human hepatocytes and 115 genes involved in gluconeogenesis and/or its regulation according to Gene Ontology annotations were intersected. Free R software and STRING v.11 tools were used for functional annotation. A number of genes harboring rSNPs within promoter regions were found to be particularly implicated in the mechanisms of metformin’s action. Functional enrichment analyses revealed enrichment in critical pathways including FoxO, TNF-α and TGF-β signaling, also implicated in diabetes complications. Among these, six genes (ARPP19, ATF4, NR3C1, PFKFB3, TCF7L2, and WDR5) were strongly associated with regulation of gluconeogenesis, and may be modulated by metformin in the liver. 
We conclude that metformin therapy response may be influenced by the newly identified functional SNPs, including rSNPs within the promoters of genes for gluconeogenic enzymes and transcription regulators.

267-273 46
Abstract

Serotonin transporter gene polymorphism is important in the regulation of the serotoninergic system, which affects mood and the regulation of emotions and behavior. In this study, 128-channel electroencephalogram recordings were performed, and buccal epithelium samples were obtained from 53 volunteers (32 females). La, Lg, and S alleles were identified by polymerase chain reaction. The aim of the study was to investigate the connectivity of the default mode network, measured using resting-state electrophysiologic data, depending on the serotonin transporter gene polymorphism. Localization of the sources of bioelectrical activity of the cerebral cortex was performed by the beamformer method. Comparisons of LaLa genotype carriers and S or Lg allele carriers were performed using T-contrast of connectivity indices calculated between the nodes of the default mode network and the rest of the brain. It was found that carriers of the S allele were characterized by increased connectivity of the default mode network with the visual association cortex and with structures forming the posterior node of the default mode network, as well as increased connectivity of the posterior node of the default mode network with the right parahippocampal gyrus; this pattern of connectivity may predispose to the onset and/or maintenance of intrusive thoughts. In contrast, carriers of the LaLa genotype had higher connectivity of the anterior node of the default mode network with the right ventromedial prefrontal cortex, with the medial frontal gyrus, and with the posterior cingulate cortex, which is a structure of the posterior node of the default mode network, compared to carriers of the S or Lg allele. Carriers of the LaLa genotype also had higher connectivity of the posterior node of the default mode network with the cluster involving the right dorsolateral prefrontal cortex compared to carriers of the S or Lg allele. 
It could be hypothesized that increased connectivity of the default mode network with brain structures involved in cognitive regulation processes (i.e., the dorsolateral and ventromedial prefrontal cortex) may contribute to the regulation of the processes of the default mode network associated with autobiographical memory.

POPULATION GENETICS

274-283 54
Abstract

The black bog ant Formica picea complex is widespread from the Atlantic to the Pacific coasts of Eurasia. This complex was earlier believed to consist of one or two species (F. picea and F. candida). However, molecular analysis suggested that it includes three cryptic species. One is F. picea from Europe, another, F. candida, is currently known exclusively from Kyrgyzstan, while the third one, temporarily designated here as Formica sp., inhabits the easternmost part of Eurasia from China to Kamchatka. It is unknown how F. picea and Formica sp. are distributed in Siberia and whether their ranges intersect. Here we studied a sample of this complex from Siberia using mtDNA and found that their ranges overlap. The distribution of Formica sp. extends from the south of West Siberia, including Altai, to China, and the Russian Far East. No phylogeographic structure was detected, suggesting their recent dispersal from a single source. F. picea was found as far as East Siberia, but was relatively rare. While the European and West Siberian populations were genetically closely related, the specimens from Zabaykalsky Krai differed, suggesting a putative East Siberian refugium. We also determined that ecologically F. picea inhabits peat bogs in lowland areas and grassy communities above the tree line in the European mountains; in Altai, it is found in mountain steppes, while in Transbaikalia, in waterlogged areas along riverbanks. Formica sp. thrives in dry steppes and low riverbanks, but avoids bogs. Thus, F. picea and Formica sp. differ genetically, and have different distribution ranges, as well as habitat preferences. This supports the opinion that Formica sp. should be recognized as a distinct species.

BIOINFORMATICS AND SYSTEMS BIOLOGY

284-292 33
Abstract

The question of the reproducibility of evolutionary processes is primarily of fundamental importance; however, with the development of methods for modeling evolutionary processes on multilevel computer models, an answer to this question is necessary to clarify the status of the predictions obtained. Obtaining ensembles of evolutionary outcomes experimentally on real biological systems for subsequent statistical processing seems impracticable. At the same time, the results obtained on multilevel computer models are difficult to interpret due to their complexity and the dependence of modeling results on a variety of parameters. This work is aimed at identifying common properties of evolving systems using a simple heuristic model based on transparent general principles and ideas about the key properties of biological systems that are important for the evolutionary process. Agents undergoing evolutionary changes are recurrent neural networks with a well-defined structure, a given function, and a specific rule for modifying the structure in the direction of maximum fitness. A separate instance of a neural network formed during the evolutionary process is called a neural network model object (NNMO). Computational experiments have been carried out to generate ensembles of NNMO structures performing a given function, and the patterns of NNMO distribution in the structural space have been analyzed. This analysis confirms the presence of functional symmetry in the structure of NNMOs performing the same function. An assessment of the stability and reproducibility of individual evolutionary trajectories has been carried out. It is shown that under certain constraints leading to a reduction of the complexity of the NNMO structure (analogous to a narrow environmental specialization), the final NNMO structures may be close, but not identical. This suggests that the evolution of the structure is reproduced inexactly even when the resulting structures are functionally equivalent. 
Nevertheless, it can be argued that, in the general case, evolutionary change is possible when the potential complexity of the structure is redundant relative to the functional complexity, and that this redundancy automatically entails a multiplicity of evolutionary outcomes, because the same function can be implemented by different but functionally invariant structures.

293-298 37
Abstract

The imperative to re-analyze existing public sequencing data is central to modern biology, driven by new hypotheses and advanced analytical methods. However, this effort is critically hampered by the profound heterogeneity of repository data, particularly the non-standardized, free-text descriptions of biological experiments. This lack of structural and semantic homogeneity prevents systematic search, integration, and comparative analysis, effectively locking away the full potential of accumulated datasets. Advances in Natural Language Processing (NLP) offer a pivotal pathway to overcome this bottleneck by transforming unstructured text into computable, homogeneous information. The integrated Entrez database system, maintained by the National Center for Biotechnology Information (NCBI), provides sophisticated programmatic access via an API to primary sequencing data and its associated metadata, including detailed experimental descriptions. This interface enables researchers to identify and retrieve relevant data through keyword searches, including those based on gene names, and to apply modern NLP techniques to transform textual metadata into structured information. The output is formatted data ready for integration into local databases, accompanied by a systematic list of links for downloading primary files. The Alembic software package offers a comprehensive and automated solution for the entire workflow. Designed as a locally deployable client-server system, Alembic incorporates state-of-the-art transformer-based AI algorithms for analyzing the biomedical text that accompanies sequencing data. Its core utilizes the openly available AIONER platform, which is built upon the PubMedBERT model trained on the PubMed repository, to ensure efficient and accurate recognition of biomedical named entities (e.g., genes, diseases). This provides users with structured and meaningful keyword search results. 
By delivering a curated list of datasets, Alembic streamlines the path from search to analysis. Researchers can efficiently identify high-value targets and obtain a complete package of metadata and primary data to construct a tailored local repository. This positions Alembic as a universal solution that overcomes the fragmented approach of existing tools, offering an integrated workflow for diverse public sequencing data.
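The keyword-search step against the Entrez API can be sketched as below. This is not Alembic's implementation, which the abstract does not describe at this level. It only shows how a query to the public E-utilities `esearch` endpoint can be built and how the identifier list can be read from a JSON reply. The search term, the choice of the `sra` database, and the sample reply are assumptions made for illustration.

```python
import json
from urllib.parse import urlencode

# Public NCBI E-utilities search endpoint.
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(term, db="sra", retmax=20):
    """Build an esearch URL that asks for a JSON reply."""
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    return EUTILS_ESEARCH + "?" + urlencode(params)


def extract_ids(response_text):
    """Pull the list of record identifiers out of an esearch JSON reply."""
    payload = json.loads(response_text)
    return payload["esearchresult"]["idlist"]


# Hypothetical query: a gene-family keyword restricted to one organism.
url = build_esearch_url("GATA AND Chlamydomonas reinhardtii[Organism]")
print(url)

# Hypothetical reply, trimmed to the fields used here (no network call is made).
sample_reply = '{"esearchresult": {"count": "2", "idlist": ["111", "222"]}}'
print(extract_ids(sample_reply))
```

The returned identifiers would then be passed to further E-utilities calls to fetch the free-text metadata that a named-entity recognition step operates on.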

HIGH-THROUGHPUT SEQUENCING

299-310 57
Abstract

RNA sequencing (RNA-seq) is a highly sensitive method for transcriptome analysis that allows simultaneous assessment of expression of thousands of genes and identification of expression patterns under various conditions. The existing variety of RNA-seq data formats, normalization methods, and approaches to statistical processing of results complicates comparison of data from different studies and reduces reproducibility of the analysis. This study presents an automated pipeline PipeSeq that combines standard steps of RNA-seq data processing: loading (SRA Toolkit), read alignment to the reference genome (HISAT2), transcript assembly (StringTie), transcript counting (FeatureCounts) and statistical analysis of differential gene expression under various experimental conditions (DESeq2). PipeSeq has a simple visual interface, supports multithreading, and generates ready-to-analyze gene expression heat maps, tables and graphs. The functionality of the pipeline is demonstrated on three sets of raw RNA-seq data from the green alga Chlamydomonas reinhardtii cells available in the NCBI SRA database. The data from these experiments were used to analyze the differential expression of C. reinhardtii genes encoding the GATA family transcription factors under different light cultivation conditions. The data obtained by in silico methods were verified by real-time reverse transcription polymerase chain reaction (RT-qPCR) for 12 GATA genes, which allowed us to hypothesize their functions and evaluate the correlation between the bulk (RNA-seq) and targeted (RT-qPCR) approaches. Our results showed that RNA-seq and RT-qPCR methods reveal similar directions of gene expression changes, but demonstrate differences in the effect size and sensitivity, which emphasizes the need for a combined use of the two approaches. 
Thus, the PipeSeq program is a tool for conducting a full cycle of bioinformatic analysis of RNA-seq data, additionally providing the opportunity to process RT-qPCR data and perform a comparative statistical analysis of the results obtained.
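The processing steps named in this abstract can be sketched as an ordered list of command lines. This is a minimal illustration, not PipeSeq's actual code: the function name `build_commands`, the file naming, the paired-end layout, and the intermediate `samtools sort` step are all assumptions introduced here. The DESeq2 step runs in R and is left out.

```python
from pathlib import Path
from typing import List


def build_commands(run_id: str, index: str, gtf: str,
                   threads: int = 4, workdir: str = "work") -> List[List[str]]:
    """Return argv lists for one paired-end SRA run, in execution order.

    Hypothetical helper: names and layout are illustrative assumptions,
    not taken from PipeSeq itself.
    """
    out = Path(workdir)
    reads_1 = str(out / f"{run_id}_1.fastq")
    reads_2 = str(out / f"{run_id}_2.fastq")
    sam = str(out / f"{run_id}.sam")
    bam = str(out / f"{run_id}.sorted.bam")
    t = str(threads)
    return [
        # 1. Loading: download reads with the SRA Toolkit, one file per mate.
        ["fasterq-dump", run_id, "--split-files", "-e", t, "-O", str(out)],
        # 2. Alignment: HISAT2; --dta tailors output for transcript assemblers.
        ["hisat2", "--dta", "-p", t, "-x", index,
         "-1", reads_1, "-2", reads_2, "-S", sam],
        # 3. Sorting (assumed step, not named in the abstract): StringTie
        #    expects coordinate-sorted alignments.
        ["samtools", "sort", "-@", t, "-o", bam, sam],
        # 4. Transcript assembly: StringTie guided by the reference annotation.
        ["stringtie", bam, "-G", gtf, "-p", t,
         "-o", str(out / f"{run_id}.gtf")],
        # 5. Counting: featureCounts; -p marks the input as paired-end.
        ["featureCounts", "-p", "-T", t, "-a", gtf,
         "-o", str(out / f"{run_id}.counts.txt"), bam],
    ]
```

Each list could be passed to `subprocess.run(cmd, check=True)` in turn. Building the commands separately from running them makes it possible to inspect or log the plan before anything executes.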

311-320 65
Abstract

The metagenomic approach based on high-throughput sequencing is becoming increasingly prevalent for the detection of viral infections in plants. This method makes it possible to study the species composition of viruses associated with a plant, including novel species, to describe their population genetic structure, and to develop genetic test systems for routine diagnostics. A metagenomic approach to phytosanitary monitoring can help to determine the cause of unknown plant diseases, which is particularly important for preventing the spread of pathogens such as viruses. Furthermore, as it is impossible to eliminate plant viruses in field conditions, comprehensive diagnostics using high-throughput sequencing is becoming an effective tool for complying with quarantine regulations on the import of foreign material, as well as for producing high-quality local planting material. High-throughput sequencing is becoming more affordable every year, with both the instrumentation and analytical capacity improving. This review summarizes key approaches to analyzing the plant virome using high-throughput sequencing. The analysis process, from sample collection to bioinformatic data processing, validation, and interpretation, is described in detail. The features of sequencing platforms and the factors affecting sequencing quality, including contamination, are discussed. Three complementary approaches to processing bioinformatic data are described: mapping reads to reference viral sequences; assembling and annotating contigs; and taxonomic classification of reads without assembly. The importance of carefully interpreting the results is emphasized, considering both the bioinformatic analysis and the validation by molecular genetic methods. This review will be useful both for researchers and specialists who have no experience with high-throughput sequencing and for those who have used this method for other applications.

DIGITAL PHENOTYPING

321-329 49
Abstract

Contemporary agrobiotechnology research increasingly relies on automated methods for capturing and interpreting morphophysiological and spectral plant characteristics – a field known as digital phenotyping. This approach aims to identify stable differences between genotypes cultivated under non-identical environmental conditions. We previously introduced StatFaRmer, an open-source tool that we further develop here for comprehensive analysis of temporal phenotypic datasets, with a primary focus on crops such as soybean (Glycine max). The tool implements automated data preprocessing procedures, including synchronization of timestamps across samples and removal of noise artifacts and outliers. These features are particularly relevant for multi-month experiments involving assessments of growth parameters, fluctuations in photosynthetic apparatus area, or other biometric indicators. Support for standardized data formats (XLSX, CSV) ensures compatibility with common phenotyping systems, simplifying cross-platform integration. Thus, the tool can integrate with widely used HTPP platforms (e.g., Traitmill, HyperAIxpert, Plant Accelerator), enabling data from diverse sources to be analyzed within a single pipeline. For soybean experiments, StatFaRmer provides customizable analysis of variance (ANOVA) with visualization of diagnostic parameters (normality of distribution, homogeneity of variances) and evaluation of effect significance between user-defined groups. An example application compares growth parameters across 20 soybean cultivars under controlled stress: the tool automatically aggregated data with uneven measurement frequencies (from 1 hour to 3 days), identified anomalies in hypocotyl elongation dynamics, and computed statistical significance between groups (p < 0.01). The tool has been tested on large-scale datasets (over 2,000 measurements per experiment).
StatFaRmer is implemented as a Shiny-based web application, with step-by-step deployment guides for Windows and Linux. All processing stages – from raw data to final plots – are documented to ensure transparency and compliance with research reproducibility standards. Thus, StatFaRmer offers a specialized solution for statistical hypothesis testing in soybean digital phenotyping, reducing data preparation time and minimizing risks of error when handling non-stationary time series.
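The aggregation of measurements taken at uneven intervals, mentioned in this abstract, can be illustrated with a short sketch. This is not StatFaRmer's code (the tool is a Shiny application); the function `aggregate_to_bins` and its fixed-width averaging are assumptions chosen only to show one simple way to bring irregular readings onto a common time grid.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean
from typing import Dict, Iterable, List, Tuple


def aggregate_to_bins(readings: Iterable[Tuple[datetime, float]],
                      start: datetime,
                      bin_width: timedelta) -> Dict[datetime, float]:
    """Average readings that fall into the same fixed-width time bin.

    Hypothetical helper: each reading is assigned to the bin that contains
    its timestamp, and bins with no readings are simply absent.
    """
    grouped: Dict[int, List[float]] = defaultdict(list)
    for timestamp, value in readings:
        # Integer division of two timedeltas gives the bin index.
        bin_index = (timestamp - start) // bin_width
        grouped[bin_index].append(value)
    # Key each bin by its start time, in chronological order.
    return {start + i * bin_width: mean(values)
            for i, values in sorted(grouped.items())}
```

For example, hourly readings and a single reading three days later can both be reduced to daily means by calling `aggregate_to_bins(readings, start, timedelta(days=1))`. Leaving empty bins out, rather than interpolating them, keeps the sketch from inventing values for periods with no measurements.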

330-338 60
Abstract

It has been repeatedly shown that spike productivity is the main component of wheat yield. The main spike parameters related to productivity are size, the number of grains and spikelets per spike, and the presence or absence of awns. In modern genetic research, morphometric analysis of hundreds and thousands of spikes is required to determine the loci that control spike productivity traits. On the other hand, thousands of accessions in modern collections of wheat genetic resources need detailed description. These considerations motivate the development of digital technologies for describing spike traits in wheat, which can be achieved through image analysis methods. These methods allow for automated acquisition of trait values that can serve as the basis for digital plant collections. Here we propose an extended set of spike characteristics obtained both manually and through digital image analysis and present plant characterization. These data form the basis of the updated version of the SpikeDroidDB database (http://spikedroid.biores.cytogen.ru/). The digital description of the spike consists of two blocks. The block of uploaded data includes a description of the plant and contains five tables: collection; variety sample (year of cultivation (vegetation), sowing identifier, taxonomic information, etc.); planting site; and characteristics of the spike determined manually (length, width of frontal and lateral views, type and color of the spike, etc.). The block of extracted features includes spike characteristics obtained by digital phenotyping and contains six tables: characteristics of the spike outline in the image; characteristics of the quadrangle model; values of the color components of the spike; dominant colors of the spike; and texture characteristics of the spike in the image.
The most illustrative and significant features of the spike have been identified, allowing for the formation of the spike digital certificate, which includes size, shape, and color features derived from the digital images. The features forming the digital certificate have been compared between two wheat species, T. aethiopicum and T. carthlicum. It is shown that the features of the digital certificate allow for a clear representation of the spike model and the identification of distinct parameters: colors of the spike and awns and roundness of the frontal view of the spike. The database interface has been supplemented with the ability to upload data on plant and spike characteristics, as well as their images, in the batch mode.



This work is licensed under a Creative Commons Attribution 4.0 License.


ISSN 2500-3259 (Online)