Vavilov Journal of Genetics and Breeding

Original Russian text: https://vavilovj-icg.ru/2022-year/27-8/

 

Vol 26, No 8 (2022)
https://doi.org/10.18699/VJGB-22-86

FROM THE EDITOR

SYSTEMS COMPUTATIONAL BIOLOGY

 
721-732 950
Abstract

A vascular system in plants is a product of aromorphosis that enabled them to colonize land because it delivers water, mineral and organic compounds to plant organs and provides effective communication between organs and mechanical support. Vascular system development is a common object of fundamental research in plant developmental biology. In the model plant Arabidopsis thaliana, early stages of vascular tissue formation in the root are a striking example of the self-organization of a bisymmetric (having two planes of symmetry) pattern of hormone distribution, which determines vascular cell fates. In the root, vascular tissue development comprises four stages: (1) specification of progenitor cells for the provascular meristem in early embryonic stages, (2) the growth and patterning of the embryo provascular meristem, (3) postembryonic maintenance of the cell identity in the vascular tissue initials within the root apical meristem, and (4) differentiation of their descendants. Although the anatomical details of A. thaliana root vasculature development have long been known and described in detail, our knowledge of the underlying molecular and genetic mechanisms remains limited. In recent years, several important advances have been made, shedding light on the regulation of the earliest events in provascular cell specification. In this review, we summarize the latest data on the molecular and genetic mechanisms of vascular tissue patterning in the A. thaliana root. The first part of the review describes the root vasculature ontogeny, and the second reconstructs the sequence of regulatory events that underlie this histogenesis and determine the development of the progenitors of the vascular initials in the embryo and the organization of vascular initials in the seedling root.

 
733-742 945
Abstract

Hepatitis C virus (HCV) is a risk factor that leads to hepatocellular carcinoma (HCC) development. Epigenetic changes are known to play an important role in the molecular genetic mechanisms of virus-induced oncogenesis. Aberrant DNA methylation is a mediator of epigenetic changes that are closely associated with HCC pathogenesis and is considered a biomarker for its early diagnosis. The ANDSystem software package was used to reconstruct and evaluate the statistical significance of the pathways HCV could potentially use to regulate 32 hypermethylated genes in HCC, including both oncosuppressor and protumorigenic ones identified by genome-wide analysis of DNA methylation. The reconstructed pathways included those affecting protein-protein interactions (PPI), gene expression, protein activity, stability, and transport regulation, with the expression regulation pathways being statistically significant. It has been shown that 8 out of 10 HCV proteins were involved in these pathways, the HCV NS3 protein being implicated in the largest number of regulatory pathways. NS3 was associated with the regulation of 5 tumor-suppressor genes, which may be evidence of its central role in HCC pathogenesis. Analysis of the reconstructed pathways has demonstrated that, following the transcription factor inhibition caused by binding to viral proteins, the expression of a number of oncosuppressors (WT1, MGMT, SOCS1, P53) was suppressed, while the expression of others (RASF1, RUNX3, WIF1, DAPK1) was activated. Thus, the performed gene-network reconstruction has shown that HCV proteins can influence not only the methylation status of oncosuppressor genes, but also their transcriptional regulation. The results obtained can be used in the search for pharmacological targets to develop new drugs against HCV-induced HCC.

 
743-757 1439
Abstract

L-Valine is one of the nine amino acids that cannot be synthesized de novo by higher organisms and must come from food. This amino acid not only serves as a building block for proteins, but also regulates protein and energy metabolism and participates in neurotransmission. L-Valine is used in the food and pharmaceutical industries, medicine and cosmetics, but primarily as an animal feed additive. Adding L-valine to feed, alone or mixed with other essential amino acids, allows for feeds with lower crude protein content, increases the quality and quantity of pig meat and broiler chicken meat, and improves the reproductive functions of farm animals. Although the market for L-valine is constantly growing, this amino acid is not yet produced in Russia. Under current conditions, the creation of producer strains and the organization of L-valine production are especially relevant for the country. One of the microorganisms most commonly used for the creation of amino acid producers, along with Escherichia coli, is the soil bacterium Corynebacterium glutamicum. This review is devoted to the analysis of the main strategies for the development of L-valine producers based on C. glutamicum. Various aspects of L-valine biosynthesis in C. glutamicum are reviewed: process biochemistry, stoichiometry and regulation, enzymes and their corresponding genes, export and import systems, and the relationship of L-valine biosynthesis with central cell metabolism. Key genetic elements for the creation of C. glutamicum-based producer strains are identified. The use of metabolic engineering to enhance L-valine biosynthesis reactions and to reduce the formation of byproducts is described. The prospects for improving strains in terms of their productivity and technological characteristics are shown. The information presented in the review can be used in the development of producers of other amino acids with a branched side chain, namely L-leucine and L-isoleucine, as well as D-pantothenate.

 
758-764 579
Abstract

Periodic processes of gene network functioning are described with good precision by periodic trajectories (limit cycles) of multidimensional systems of kinetic-type differential equations. In the literature, such systems are often called dynamical; they are composed according to schemes of positive and negative feedback between components of these networks. The variables in these equations describe concentrations of these components as functions of time. In the preparation of numerical experiments with such mathematical models, it is useful to start with studies of the qualitative behavior of ensembles of trajectories of the corresponding dynamical systems, in particular, to estimate the highest likelihood domain of the initial data, to solve inverse problems of parameter identification, to list the equilibrium points and their characteristics, to localize cycles in the phase portraits, to construct a stratification of the phase portraits into subdomains with different qualities of trajectory behavior, etc. Such an a priori geometric analysis of the dynamical systems is quite analogous to the basic section “Investigation of functions and plotting of their graphs” in calculus, where methods for the qualitative study of the shapes of curves determined by equations are presented. In the present paper, we construct ensembles of trajectories in phase portraits of some dynamical systems. These ensembles are 2-dimensional surfaces invariant with respect to shifts along the trajectories. This is analogous to a classical construction in analytic mechanics, i.e. the level surfaces of motion integrals (energy, angular momentum, etc.). Such surfaces compose foliations in phase portraits of dynamical systems of Hamiltonian mechanics. In contrast with this classical mechanical case, the foliations considered in this paper have singularities: all their leaves have a non-empty intersection, and they contain limit cycles on their boundaries. Describing the phase portraits of these systems at the level of their stratifications and ensembles of trajectories allows one to construct more realistic gene network models on the basis of methods of statistical physics and the theory of stochastic differential equations.

 
765-772 731
Abstract

The article presents the results of a study aimed at finding covariates that account for the activity of implicit cognitive processes, both when subjects are at functional rest and when they are shown their own or someone else’s face, for use in a joint analysis of EEG experiment data. The proposed approach is based on the analysis of the dynamics of the subject’s facial muscles recorded on video. The pilot study involved 18 healthy volunteers. In the experiment, the subjects sat in front of a computer screen and performed the following task: they sequentially closed their eyes (three trials of 2 minutes each) and opened them (three trials of the same duration between periods of closed eyes) while the screen was either empty or showing a video recording of their own face or the face of an unfamiliar person of the same gender as the participant. EEG, ECG and a video of the face were recorded for all subjects. A separate subtask of the study was also addressed: validating a technique for assessing the dynamics of the subjects’ facial muscle activity using the recorded videos of the “eyes open” trials to obtain covariates that can be included in subsequent processing along with EEG correlates in neurocognitive experiments with a paradigm that does not involve the performance of active cognitive tasks (“resting-state conditions”). It was shown that the subject’s gender, stimulus type (screen empty or showing own/other face), and trial number are accompanied by differences in facial activity and can be used as study-specific covariates. It was concluded that the analysis of the dynamics of facial activity based on video recording of “eyes open” trials can be used as an additional method in neurocognitive research to study implicit cognitive processes associated with the perception of oneself and others in the functional rest paradigm.

 
773-779 595
Abstract

These days, the ability to predict how a system will develop is a guarantee of its successful functioning. The growing quality and volume of information, its increasingly complex presentation, and the need to detect hidden connections make classical statistical forecasting methods ineffective and, most often, impossible to use. Among the various forecasting methods, those based on artificial neural networks occupy a special place. The main objective of our work is to create a neural network that predicts the risk of depression in a person using data from a system for testing motor control performance. The stop-signal paradigm (SSP) is an experimental technique for assessing a person’s ability to activate deliberate movements or inhibit movements that have become inadequate to external conditions. In modern medicine, the SSP is most commonly used to diagnose movement disorders such as Parkinson’s disease or the effects of stroke. We hypothesized that the SSP could serve as a basis for detecting the risk of affective diseases, including depression. The neural network we are developing is intended to combine the following behavioral indicators: the number of missed responses, the number of correct responses, the average time, and the number of correct inhibitions of movement after stop-signal onset. Such a combination of indicators will increase the accuracy of predicting the presence of depression in a person. The artificial neural network implemented in the work allows the risk of depression to be diagnosed on the basis of the data obtained in the stop-signal task. An architecture was developed and a system was implemented for testing motor control indicators in humans; it was then tested in real experiments. A comparison of neural network technologies and methods of mathematical statistics was carried out. The efficiency of the neural network (in terms of accuracy) was demonstrated on data with an expert assessment for the presence of depression and data from the motor control testing system.
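The behavioral indicators listed in this abstract can be illustrated with a minimal Python sketch. The `Trial` record layout and the reading of "average time" as the mean reaction time over go-trials with a response are assumptions made here for illustration; the abstract does not describe the authors' data format or their network.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Trial:
    """One stop-signal task trial (hypothetical record layout)."""
    stop_signal: bool                  # True if a stop signal was shown
    responded: bool                    # True if the subject made a movement
    correct: bool                      # response matched the go stimulus
    reaction_time_ms: Optional[float]  # None when there was no response


def indicators(trials: List[Trial]) -> dict:
    """Compute the four indicators named in the abstract."""
    go = [t for t in trials if not t.stop_signal]
    stop = [t for t in trials if t.stop_signal]
    # Reaction times of go-trials in which a response was recorded.
    times = [t.reaction_time_ms for t in go
             if t.responded and t.reaction_time_ms is not None]
    return {
        "missed_responses": sum(1 for t in go if not t.responded),
        "correct_responses": sum(1 for t in go if t.responded and t.correct),
        # Assumption: "average time" is the mean go-trial reaction time.
        "mean_reaction_time_ms": sum(times) / len(times) if times else None,
        "correct_inhibitions": sum(1 for t in stop if not t.responded),
    }
```

A feature vector of this kind could then be passed to any classifier; the choice of model and its training are outside the scope of this sketch.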

EVOLUTIONARY COMPUTATIONAL BIOLOGY

 
780-786 635
Abstract

Development of computer models imitating the work of the nervous systems of living organisms, taking into account their morphology and electrophysiology, is one of the important and promising branches of computational neurobiology. It is often sought to model not only the nervous system, but also the body, muscles, sensory systems, and a virtual three-dimensional physical environment in which the behavior of an organism can be observed and which provides its sensory systems with adequate data streams that change in response to the movement of the organism. For a system of hundreds or thousands of neurons, one can still hope to determine the necessary parameters and obtain nervous system functioning more or less similar to that of a living organism, as, for example, in a recent work on the modeling of the Xenopus tadpole. However, of greatest interest, both practical and fundamental, are organisms that have vision, a more complex nervous system, and, accordingly, significantly more advanced cognitive abilities. Determining the structure and parameters of the nervous systems of such organisms is an extremely difficult task. Moreover, at the cellular level they change over time, including under the influence of the streams of sensory signals they perceive and the life experience gained, such as the consequences of their own actions under certain circumstances. Knowing the structure of the nervous system and the number of nerve cells forming it, at least approximately, one can try to optimize the initial parameters of the model through artificial evolution, during which virtual organisms will interact and survive, each under the control of its own version of the nervous system. In addition, in principle, the rules by which the brain changes during the life of the organism can also evolve. This work is devoted to the development of a neuroevolutionary simulator capable of simultaneously simulating virtual organisms that have a visual system and are able to interact with each other. The amount of computational resources required for the operation of models of the physical body of an organism, the nervous system and the virtual environment was estimated, and the performance of the simulator on a modern desktop computing system was determined depending on the number of simultaneously simulated organisms.

 
787-797 1267
Abstract

Phospholipases A2 (PLA2) are capable of hydrolyzing the sn-2 position of glycerophospholipids to release fatty acids and lysophospholipids. The PLA2 superfamily enzymes are widespread and present in most mammalian cells and tissues, regulating metabolism, remodeling the membrane and maintaining its homeostasis, producing lipid mediators and activating inflammatory reactions, so disruption of PLA2-regulated lipid metabolism often leads to various diseases. In this study, 29 PLA2 genes in the human genome were systematically collected and described based on literature and sequence analyses. Mapping the PLA2 genes in the human genome showed that they are located on 12 chromosomes, some of them forming clusters. Their RVI scores, which estimate gene tolerance to the mutations that accumulate in the human population, demonstrated that the G4-type PLA2 genes belonging to one of the two largest clusters (4 genes) were the most tolerant. In contrast, the genes encoding G6-type PLA2s (G6B, G6F, G6C, G6A), localized outside the clusters, had a reduced tolerance to mutations. Analysis of the associations between PLA2 genes and human diseases found in the literature showed that 24 such genes were associated with 119 diseases belonging to 18 groups; in total, 229 disease/PLA2 gene relationships were described, revealing that G4, G2 and G7-type PLA2 proteins were involved in the largest number of diseases compared with other PLA2 types. Three groups of diseases turned out to be associated with the greatest number of PLA2 types: neoplasms, circulatory and endocrine system diseases. Phylogenetic analysis showed that a common origin can be established only for secretory PLA2s (G1, G2, G3, G5, G10 and G12). The remaining PLA2 types (G4, G6, G7, G8, G15 and G16) could be considered evolutionarily independent. Our study has found that the PLA2 genes most tolerant to mutations in humans (G4, G2, and G7 types) are associated with the largest number of disease groups.

 
798-805 566
Abstract

It is generally accepted that during the domestication of food plants, selection was focused on their productivity, the ease of their technological processing into food, and resistance to pathogens and environmental stressors. Besides, the palatability of plant foods and their health benefits could also be subjected to selection by humans in the past. Nonetheless, it is unclear whether in antiquity, aside from positive selection for beneficial properties of plants, humans simultaneously selected against such detrimental properties as allergenicity. This topic is becoming increasingly relevant as the allergization of the population grows, being a major challenge for modern medicine. That is why intensive research by breeders is already underway for creating hypoallergenic forms of food plants. Accordingly, in this paper, albumin, globulin, and β-amylase of common wheat Triticum aestivum L. (1753) are analyzed, which have been identified earlier as targets for attacks by human class E immunoglobulins. At the genomic level, we wanted to find signs of past negative selection against the allergenicity of these three proteins (albumin, globulin, and β-amylase) during the domestication of ancestral forms of modern food plants. We focused the search on the TATA-binding protein (TBP)-binding site because it is located within a narrow region (between positions –70 and –20 relative to the corresponding transcription start sites), is the most conserved, necessary for primary transcription initiation, and is the best-studied regulatory genomic signal in eukaryotes. Our previous studies presented our publicly available Web service Plant_SNP_TATA_Z-tester, which makes it possible to estimate the equilibrium dissociation constant (KD) of TBP complexes with plant proximal promoters (as output data) using 90 bp of their DNA sequences (as input data). In this work, by means of this bioinformatics tool, 363 gene promoter DNA sequences representing 43 plant species were analyzed. It was found that compared with non-food plants, food plants are characterized by significantly weaker affinity of TBP for proximal promoters of their genes homologous to the genes of common wheat globulin, albumin, and β-amylase (food allergens) (p < 0.01, Fisher’s Z-test). This evidence suggests that in the past humans carried out selective breeding to reduce the expression of food plant genes encoding these allergenic proteins.

COMPUTATIONAL GENOMICS

 
806-809 757
Abstract

The development of next generation sequencing (NGS) methods has created the need for detailed analysis and control of each protocol step. NGS library preparation protocols may include steps with incorporation of various service sequences, such as sequencing adapters, primers, sample-, cell-, and molecule-specific barcodes. Despite a fairly high level of current knowledge, during the protocol development process researchers often have to deal with various kinds of unexpected experiment outcomes, which result either from lack of information, lack of knowledge, or defects in reagent manufacturing. Detection and analysis of service sequences, their distribution and linkage may provide important information for protocol optimization. Here we introduce FastContext, a tool designed to analyze NGS read structure, based on sequence features found in reads, and their relative position in the read. The algorithm is able to create human-readable read structures with user-specified patterns and to calculate the count and percentage of every read structure. Despite the simplicity of the algorithm, FastContext may be useful in read structure analysis and, as a result, can help better understand molecular processes that take place at different stages of NGS library preparation. The project is open-source software, distributed under GNU GPL v3, entirely written in the programming language Python, and based on well-maintained packages and commonly used data formats. Thus, it is cross-platform and may be patched or upgraded by the user if necessary. The FastContext package is available at the Python Package Index (https://pypi.org/project/FastContext), and the source code is available at GitHub (https://github.com/regnveig/FastContext).
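The general idea described in this abstract (locating user-specified patterns in reads, writing a readable structure string, and counting each structure) can be sketched in a few lines of Python. This is not the FastContext implementation or its interface; the pattern names, sequences, structure notation, and exact-match search are all assumptions chosen for illustration.

```python
from collections import Counter

# Hypothetical service sequences; names and sequences are illustrative only.
PATTERNS = {
    "adapter": "AGATCGGAAG",
    "barcode": "ACGTAC",
}


def read_structure(read, patterns):
    """Return a readable structure string such as '{barcode:0}--{adapter:10}'."""
    hits = []
    for name, seq in patterns.items():
        # Collect every exact occurrence of the pattern with its 0-based start.
        start = read.find(seq)
        while start != -1:
            hits.append((start, name))
            start = read.find(seq, start + 1)
    if not hits:
        return "{unrecognized}"
    hits.sort()  # order features by their position in the read
    return "--".join(f"{{{name}:{pos}}}" for pos, name in hits)


def summarize(reads, patterns):
    """Map each read structure to (count, percentage of all reads)."""
    counts = Counter(read_structure(r, patterns) for r in reads)
    total = sum(counts.values())
    return {s: (n, 100.0 * n / total) for s, n in counts.items()}
```

For example, `summarize(["ACGTACTTTTAGATCGGAAG", "TTTTTTTT"], PATTERNS)` reports one read with a barcode followed by an adapter and one unrecognized read, each at 50%. A practical tool would also need to handle mismatches and reverse complements, which this sketch ignores.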

 
810-818 1407
Abstract

Many plants and animals have symbiotic relationships with microorganisms, including bacteria. The interactions between bacteria and their hosts result in different outcomes for the host organism. The outcome can be neutral, harmful, or beneficial for the participants. Remarkably, these relationships are not static, as they change throughout an organism’s lifetime and on an evolutionary scale. One of the bacterial structures responsible for these relationships is the O-antigen. Depending on the characteristics of its components, the bacteria can avoid the host’s immune response or establish a mutualistic relationship with it. The O-antigen is a key component of the outer membrane of Gram-negative bacteria. This component facilitates interaction between the bacteria and the host immune system or phages. The variability of the physical structure is caused by the genomic variability of genes encoding O-antigen synthesis components. The genes and pathways of O-polysaccharide (OPS) synthesis have been intensively investigated mostly for Enterobacteriaceae species. Considering the high genetic and molecular diversity of this structure even between strains, these findings may not capture the full variety that may be present in non-model species. The current study presents a comparative analysis of genes associated with O-antigen synthesis in bacteria of the Oxalobacteraceae family. In contrast to existing studies based on PCR methods, we use a bioinformatics approach and compare O-antigens at the level of clusters rather than individual genes. We found that the O-antigen genes of these bacteria are represented by several clusters located at a distance from each other. The greatest similarity of the clusters is observed within individual bacterial genera, which is explained by the high variability of O-antigens. The study describes similarities of OPS genes inherent to the family as a whole and also considers individual unique cases of O-antigen genetic variability inherent to individual bacteria.

 
819-825 493
Abstract

MicroRNAs (miRNAs) are small non-coding RNAs that regulate gene expression at the post-transcriptional level in the cytoplasm and play an important role in a wide range of biological processes. Recent studies have found that miRNA sequences are present not only in the cytoplasm, but also in the mitochondria. These miRNAs (the so-called mitomiRs) may be sequences of nuclear or mitochondrial origin; some of them are involved in the regulation of mitochondrial gene functions, while the role of others is still unknown. The identification of nucleotide signals that are unique to mitomiRs may help to determine this role. We formed a dataset that combined the experimentally discovered mitomiRs in human, rat and mouse. To isolate signals that may be responsible for the mitomiRs’ functions or for their translocation from or into mitochondria, a context analysis was carried out for the sequences. For the three species, statistically overrepresented 8-letter motifs were identified in the mitomiRs/non-mitomiRs group and in the group of all miRNAs from the miRBase database (p-value < 0.01 with Bonferroni correction for multiple comparisons); for these motifs, patterns of localization in functionally important regions were found for different types of miRNAs. Also, for the mitomiRs/non-mitomiRs group, we found statistically significant features of the miRNA nucleotide context near the Dicer and Drosha cleavage sites (Pearson’s χ² test of independence for the first three positions of the miRNA, p-value < 0.05). The observed nucleotide frequencies may indicate a more homogeneous pri-miRNA cleavage by the Drosha complex during the formation of the 5’ end of mitomiRs. The obtained results can help to determine the role of the nucleotide signals in the origin, processing, and functions of the mitomiRs.
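The two statistical tools named in this abstract, Pearson's χ² test of independence and the Bonferroni correction, can be illustrated with a short Python sketch. The counts below are invented toy numbers, not data from the study, and the table layout (miRNA group by nucleotide at one position) is an assumption about how such a test might be set up.

```python
def chi2_statistic(table):
    """Pearson chi-squared statistic for an r x c table of observed counts.

    Assumes every row total and every column total is non-zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat


def degrees_of_freedom(table):
    """(rows - 1) * (columns - 1) for a test of independence."""
    return (len(table) - 1) * (len(table[0]) - 1)


def bonferroni_alpha(alpha, n_tests):
    """Per-test significance threshold under the Bonferroni correction."""
    return alpha / n_tests


# Toy counts, invented for illustration only (not data from the study):
# rows = mitomiRs, non-mitomiRs; columns = A, C, G, U at one miRNA position.
toy_table = [[30, 10, 10, 50], [25, 25, 25, 25]]
stat = chi2_statistic(toy_table)
df = degrees_of_freedom(toy_table)
# Standard chi-squared critical value for df = 3 at alpha = 0.05.
CRITICAL_DF3_ALPHA_005 = 7.815
print(stat, df, stat > CRITICAL_DF3_ALPHA_005)
```

Comparing the statistic with a tabulated critical value avoids the need for a p-value routine; with a statistics library available, an exact p-value would normally be reported instead.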

 
826-829 359
Abstract

Many scientific articles have become available in digital form, which allows querying article data and, specifically, automated gathering of metadata, including affiliation data. This in turn can be used for the quantitative characterization of a scientific field, such as identifying organizations and analyzing the co-authorship graph of those organizations to extract the underlying structure of science. In our work, we focus on the miRNA science field, building the organization co-authorship network to provide a higher-level analysis of scientific community evolution rather than analyzing author-level characteristics. To tackle the problem of variability in the writing of institution names, we proposed the k-mer/n-gram Boolean feature vector sorting algorithm, KOFER for short. This approach utilizes the fact that the contents of the affiliation are rather consistent for the same organization, and to account for writing errors and other organization name variations within the affiliation metadata field, it converts the organization mention within the affiliation to a k-mer (n-gram) Boolean presence vector. Those vectors for all affiliations in the dataset are then lexicographically sorted, forming groups of organization mentions. With that approach, we clustered the miRNA field affiliation dataset and extracted unique organization names, which allowed us to build the co-authorship graph on the organization level. Using this graph, we show that the growth of the miRNA field is governed by the small-world architecture of the scientific institution network and experiences power-law growth with exponent 2.64 ± 0.23 for organization number, in accordance with network diameter, proposing the growth model for emerging scientific fields. The rate of first miRNA publication for an organization interacting with an already publishing organization is estimated as 0.184 ± 0.002 year⁻¹.
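The steps named in this abstract (k-mer Boolean presence vectors, lexicographic sorting, grouping of mentions) can be sketched as follows. This is not the published KOFER code: the normalization, the value of k, and in particular the rule for cutting the sorted list into groups (a Jaccard similarity threshold between neighbours) are assumptions, since the abstract does not specify them.

```python
def kmers(text, k=3):
    """Set of k-mers of the text after lowercasing and dropping non-alphanumerics."""
    s = "".join(ch for ch in text.lower() if ch.isalnum())
    return {s[i:i + k] for i in range(len(s) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets (1.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def group_mentions(mentions, k=3, threshold=0.5):
    """Group organization mentions via sorted k-mer presence vectors."""
    kmer_sets = [kmers(m, k) for m in mentions]
    # Shared vocabulary defines the positions of the Boolean vectors.
    vocab = sorted(set().union(*kmer_sets))
    vectors = [tuple(km in ks for km in vocab) for ks in kmer_sets]
    # Lexicographic sort of the Boolean vectors (tuples compare element-wise).
    order = sorted(range(len(mentions)), key=lambda i: vectors[i])
    groups, prev = [], None
    for i in order:
        # Assumed grouping rule: neighbours in sorted order join the same
        # group when their k-mer sets are similar enough.
        if prev is not None and jaccard(kmer_sets[prev], kmer_sets[i]) >= threshold:
            groups[-1].append(mentions[i])
        else:
            groups.append([mentions[i]])
        prev = i
    return groups
```

For instance, `group_mentions(["Univ. of Oxford", "univ of oxford", "MIT"])` places the two Oxford spellings in one group and `"MIT"` in another. Mentions shorter than k characters produce empty k-mer sets and would need separate handling in practice.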



Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 License.


ISSN 2500-3259 (Online)