
Vavilov Journal of Genetics and Breeding


The original Russian text (Russian-language version of the journal) is available at https://vavilovj-icg.ru

Vol 29, No 7 (2025)
https://doi.org/10.18699/vjgb-25-120

FROM THE EDITOR

MICROBIAL GENETICS AND BIOTECHNOLOGY

913-924
Abstract

   DNA oxidation is one of the main types of damage to the genetic material of living organisms. Of the many dozens of oxidative lesions, the most abundant is 8-oxoguanine (8-oxoG), a premutagenic base that leads to G→T transversions during replication. Double-stranded DNA can conduct holes through the π system of stacked nucleobases. Such electron vacancies are ultimately localized at the 5’-terminal nucleotides of polyguanine runs (G-runs), making these positions characteristic sites of 8-oxoG formation. While these properties of G-runs have been studied in vitro at the level of chemical reactivity, the extent to which they influence mutation spectra in vivo remains unclear. Here, we analyzed the nucleotide context of G-runs in a representative set of 62 high-quality prokaryotic genomes and in the human telomere-to-telomere genome. G-runs were, on average, shorter than polyadenine runs (A-runs), and the probability of a G-run being elongated by one nucleotide was lower than that of an A-run. The representation of T at the position 5’-flanking G-runs was increased, especially in organisms with aerobic metabolism, which is consistent with a model of preferential G→T substitutions at the 5’-position with 8-oxoG as a precursor. Conversely, at the position 5’-flanking A-runs, the frequencies of G and C were increased and the frequency of T was decreased. A biphasic pattern of G-run expansion was observed in the human genome: for runs longer than 8–9 nucleotides, the probability of elongation by one nucleotide increases significantly. An increased representation of C at the position 5’-flanking long G-runs was found, together with an elevated frequency of 5’-G→A substitutions in telomeric repeats. This may indicate mutagenic processes whose mechanism has not yet been characterized but may be associated with DNA polymerase errors during replication of DNA containing products of further 8-oxoG oxidation.
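The run-length and flanking-nucleotide statistics described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: it scans a single strand only (a full analysis would also treat C-runs on that strand as G-runs on the complementary strand), and the function names and toy sequence are hypothetical. `elongation_probability` estimates the chance that a run of at least n nucleotides extends to at least n + 1.

```python
import re
from collections import Counter


def homopolymer_runs(seq: str, base: str):
    """Yield (start, length) for every maximal run of `base` in `seq`."""
    for m in re.finditer(f"{re.escape(base)}+", seq):
        yield m.start(), m.end() - m.start()


def flank_counts(seq: str, base: str, min_len: int = 2) -> Counter:
    """Count the nucleotide immediately 5' of each run of length >= min_len.

    Runs are maximal, so the 5' neighbour is never `base` itself.
    Only the given strand is scanned (a simplifying assumption).
    """
    counts = Counter()
    for start, length in homopolymer_runs(seq, base):
        if length >= min_len and start > 0:
            counts[seq[start - 1]] += 1
    return counts


def elongation_probability(seq: str, base: str, n: int) -> float:
    """Estimate P(run length >= n + 1 | run length >= n)."""
    lengths = [length for _, length in homopolymer_runs(seq, base)]
    at_least_n = sum(1 for length in lengths if length >= n)
    if at_least_n == 0:
        return float("nan")
    return sum(1 for length in lengths if length >= n + 1) / at_least_n


if __name__ == "__main__":
    toy = "ATGGGCAGGTTGGGG"  # made-up sequence, not genomic data
    print(flank_counts(toy, "G"))               # Counter({'T': 2, 'A': 1})
    print(elongation_probability(toy, "G", 2))  # 0.6666666666666666
```

On real genomes the same counts would be pooled over all runs and compared with the background nucleotide composition before any enrichment of T at the 5’-flank could be claimed.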

925-939
Abstract

   De novo motif search is the main approach for determining the nucleotide binding specificity of transcription factors (TFs), the key regulators of gene transcription, from massive genome-wide sequencing data on their binding regions in vivo, such as ChIP-seq. The number of known TF binding site (TFBS) motifs has increased several-fold in recent years. Because the DNA-binding domains of TFs are structurally similar, many structurally related TFs have similar, and sometimes almost indistinguishable, binding site motifs. The TFClass classification of TFs defines the top levels of the hierarchy (superclasses and classes of TFs) by the structure of the DNA-binding domains, and the lower levels (families and subfamilies of TFs) by alignments of the domains' amino acid sequences. However, this classification does not take the similarity of TFBS motifs into account, whereas identifying the relevant TFs from massive TFBS sequencing data, such as ChIP-seq, requires working with TFBS motifs rather than with the TFs themselves. Therefore, in this study we extracted the TFBS motifs for human and the fruit fly Drosophila melanogaster from the HOCOMOCO and JASPAR databases and examined the pairwise similarity of the binding site motifs of related TFs according to their TFClass classification. We have shown that the common tree of the TF hierarchy based on DNA-binding domain structure can be split into separate branches representing non-overlapping sets of TFs. Within each branch, most TF pairs have significantly similar binding site motifs. Each branch can include one or more sister elementary units of the hierarchy together with all of their lower levels: one or more TFs of the same subfamily, a whole subfamily, one or several subfamilies of the same family, an entire family, and so on, up to an entire class.
Analysis of the seven largest human and the two largest Drosophila TF classes showed that the motif similarity of TFs differs noticeably between units of the same hierarchy level (classes, families). Supplementing the hierarchical classification of TFs with branches that combine significantly similar TFBS motifs can improve the identification of the TFs involved from the enriched motifs detected by de novo motif search in ChIP-seq data.
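One common way to score the pairwise similarity of two binding site motifs is the average Pearson correlation of aligned position weight matrix columns, maximized over ungapped offsets and both DNA strands. The sketch below assumes that measure, a minimum overlap of 4 columns, and matrices whose columns are ordered A, C, G, T; the abstract does not state which similarity score or significance threshold the authors used, and `toy_pwm` builds hypothetical matrices for illustration only.

```python
import numpy as np


def revcomp(pwm: np.ndarray) -> np.ndarray:
    """Reverse complement of an (L x 4) matrix whose columns are A, C, G, T."""
    return pwm[::-1, ::-1]


def column_pcc(a: np.ndarray, b: np.ndarray) -> float:
    """Pearson correlation between two 4-element motif columns."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0


def motif_similarity(m1: np.ndarray, m2: np.ndarray, min_overlap: int = 4) -> float:
    """Best mean column PCC over all ungapped offsets and both strands of m2."""
    best = -1.0
    for other in (m2, revcomp(m2)):
        # column j of `other` is aligned with column j + offset of m1
        for offset in range(-(len(other) - min_overlap), len(m1) - min_overlap + 1):
            i0, j0 = max(0, offset), max(0, -offset)
            n = min(len(m1) - i0, len(other) - j0)
            if n < min_overlap:
                continue
            score = np.mean([column_pcc(m1[i0 + k], other[j0 + k]) for k in range(n)])
            best = max(best, float(score))
    return best


def toy_pwm(consensus: str) -> np.ndarray:
    """Hypothetical matrix: 0.7 for the consensus base, 0.1 for the others."""
    idx = {"A": 0, "C": 1, "G": 2, "T": 3}
    m = np.full((len(consensus), 4), 0.1)
    for i, ch in enumerate(consensus):
        m[i, idx[ch]] = 0.7
    return m
```

Under this score a motif compared with itself or with its own reverse complement scores 1.0; deciding whether a score is significant would additionally require a null distribution (for example, from shuffled motifs), which is not shown.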

940-951
Abstract

   The development of high-throughput sequencing has expanded the possibilities for studying the regulation of gene expression, including the reconstruction of gene regulatory networks and transcription factor regulatory networks (TFRNs). Identifying the molecular mechanisms by which these networks regulate biological processes remains a challenge. Solving this problem for plants will significantly advance the understanding of the mechanisms shaping agronomically important traits. Previously, we developed the PlantReg program to reconstruct the transcriptional regulation of biological processes in the model species Arabidopsis thaliana L. The links that this program establishes between TFRNs and the genes regulating biological processes specify the type of regulation (activation/suppression). However, the program does not determine whether activation/suppression of a target gene is due to cooperative or competitive interaction of transcription factors (TFs). We assumed that information on the relative arrangement of TF binding sites (BSs) in the target gene promoter, together with data on the activity type of TF effector domains, would help to identify the cooperative/competitive action of TFs. We improved the program and created PlantReg 1.1, which enables precise localization of TF BSs within extended TF binding regions identified from genome-wide DAP-seq profiles (https://plamorph.sysbio.ru/fannotf/). To demonstrate its capabilities, we used the program to investigate the regulation of target genes in previously reconstructed TFRNs for the auxin response and the early response to salt stress in A. thaliana. The study focused on genes encoding proteins involved in chlorophyll and lignin biosynthesis, ribosome biogenesis, and abscisic acid (ABA) signaling. We found that the frequency of competitive regulation under the influence of auxin or salt stress can be quite high (approximately 30 %).
We demonstrated that competition between bZIP family TFs for common BSs is a significant mechanism of transcriptional repression in response to auxin, and that auxin and salt stress can engage common competitive regulatory mechanisms to modulate the expression of some genes in the ABA signaling pathway.
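The idea of inferring the mode of TF interaction from the relative arrangement of binding sites can be sketched as an interval-overlap check. This is a hypothetical heuristic, not PlantReg 1.1's algorithm: overlapping sites are labeled competitive (the two TFs cannot occupy them at once), non-overlapping sites of TFs with the same effector-domain type are labeled cooperative, and all other pairs are left as independent. Coordinates are assumed to be 0-based, half-open promoter positions.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Site:
    tf: str
    start: int  # 0-based, inclusive
    end: int    # exclusive


def overlaps(a: Site, b: Site) -> bool:
    """True if two half-open intervals share at least one position."""
    return a.start < b.end and b.start < a.end


def classify_pairs(sites, effect):
    """Label each TF pair with a hypothetical heuristic.

    overlapping sites -> 'competitive';
    separate sites, same effector type -> 'cooperative';
    otherwise -> 'independent'.
    If a TF pair has several site pairs, the strongest label wins.
    """
    rank = {"independent": 0, "cooperative": 1, "competitive": 2}
    labels = {}
    for a, b in combinations(sites, 2):
        if a.tf == b.tf:
            continue
        key = tuple(sorted((a.tf, b.tf)))
        if overlaps(a, b):
            label = "competitive"
        elif effect[a.tf] == effect[b.tf]:
            label = "cooperative"
        else:
            label = "independent"
        if rank[label] > rank.get(labels.get(key), -1):
            labels[key] = label
    return labels
```

For example, with placeholder sites TF1 at 10-20 (activator), TF2 at 15-25 (repressor), and TF3 at 40-50 (activator), the TF1/TF2 pair is labeled competitive, TF1/TF3 cooperative, and TF2/TF3 independent. A real analysis would also need binding affinities and site orientation, which this sketch ignores.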

952-962
Abstract

   Since the work of Nobel Prize winner Thomas Morgan in 1909, the fruit fly Drosophila melanogaster has been one of the most popular model animals in genetics. Research using this fly has been honored with the Nobel Prize many times: in 1946 (Muller, X-ray mutagenesis), in 1995 (Lewis, Nüsslein-Volhard, Wieschaus, genetic control of embryogenesis), in 2004 (Axel and Buck, the olfactory system), in 2011 (Steinman, dendritic cells in adaptive immunity; Beutler and Hoffmann, activation of innate immunity), and in 2017 (Hall, Rosbash and Young, the molecular mechanism of the circadian rhythm). The prominent role of Drosophila in genetics is due to its key features: a short life cycle, rapid generational turnover, ease of maintenance, high fertility, small size, transparent embryos, a simple larval structure, the ability to observe chromosomal rearrangements visually in polytene chromosomes, and amenability to molecular genetic manipulation. Furthermore, the high conservation of several signaling pathways and gene networks between Drosophila and mammals, including humans, together with the development of high-throughput genomic sequencing, has motivated the use of D. melanogaster as a model organism in biomedical fields: pharmacology, toxicology, cardiology, oncology, immunology, gerontology, and radiobiology. These studies add to the understanding of the genetic and epigenetic basis of the pathogenesis of human diseases. This paper describes our curated knowledge base, FlyDEGdb (https://www.sysbio.ru/FlyDEGdb), which stores information on differentially expressed genes (DEGs) in Drosophila. This information was extracted from 50 scientific articles containing experimental data on changes in the expression of 20,058 genes (80 %) out of the 25,079 Drosophila genes stored in the NCBI Gene database.
The changes were induced by 52 stress factors, including heat and cold exposure, dehydration, heavy metals, radiation, starvation, household chemicals, drugs, fertilizers, insecticides, pesticides, herbicides, and other toxicants. The FlyDEGdb knowledge base is illustrated with the Drosophila gene dysf (dysfusion), which has been identified as a DEG under cold shock and in toxicity tests of the herbicide paraquat, the solvent toluene, the drug menadione, and the food additive E923. FlyDEGdb stores information on changes in the expression of the dysf gene and its homologues: (a) the Clk, cyc, and per genes in Drosophila, and (b) the NPAS4, CLOCK, BMAL1, PER1, and PER2 genes in humans. These data are supplemented with information on the biological processes in which these genes are involved: oocyte maturation (oogenesis), regulation of the stress response and circadian rhythm, carcinogenesis, aging, etc. Therefore, FlyDEGdb, which contains information on the widely used model organism Drosophila, can be helpful for researchers working in human and animal molecular biology and genetics, physiology, translational medicine, pharmacology, dietetics, agricultural chemistry, radiobiology, toxicology, and bioinformatics.

SYSTEMS COMPUTATIONAL BIOLOGY

963-977
Abstract

   Hepatocellular carcinoma (HCC) is the most common primary liver cancer and is characterized by rapid progression, a high mortality rate, and therapy resistance. A key direction in studying the molecular mechanisms of HCC development is the analysis of disturbed apoptosis in hepatocytes. Throughout life, apoptosis ensures the elimination of old and defective cells, while attenuation of this process is one of the leading factors in carcinogenesis. In this study, we reconstructed and analyzed the gene network regulating human hepatocyte apoptosis based on single-cell transcriptome sequencing (scRNA-seq) data and the ANDSystem knowledge base, which employs artificial intelligence and computational systems biology methods. Comparative analysis of gene expression revealed weakened transcription of genes involved in the regulation of inflammatory processes and apoptosis in tumor hepatocytes compared to hepatocytes of normal liver tissue. The reconstructed network included 116 differentially expressed genes annotated in Gene Ontology as involved in the apoptotic process (GO:0006915), along with their 116 corresponding protein products. It also included 16 additional proteins that, while lacking a GO apoptosis annotation, were differentially expressed in HCC and interacted with genes and proteins participating in apoptosis. Computational analysis of the gene network identified several key protein products encoded by the genes NFKB1, MMP9, BCL2, A4, CDKN1A, CDK1, ERBB2, G3P, MCL1, and FOXO1. These proteins exhibited both a high degree of connectivity with other network objects and differential expression in HCC. Of particular interest are the proteins CDKN1A, ERBB2, IL8, and EGR1, which are not annotated in Gene Ontology as apoptosis participants but have a statistically significant number of interactions with genes involved in apoptosis. This indicates their role in regulating programmed cell death.
The results can guide the design of new experiments on the role of apoptosis in carcinogenesis and aid the search for novel therapeutic targets and approaches to HCC therapy based on modulating apoptosis in malignant hepatocytes. Furthermore, the proposed approach to reconstructing and analyzing the apoptosis-regulating gene network in hepatocellular carcinoma can be applied to other tumor types, providing a systemic understanding of disturbances in key regulatory processes in oncogenesis and of potential therapeutic targets.

978-989
Abstract

   The rapid advancement of omics technologies (genomics, transcriptomics, proteomics, metabolomics) and other high-throughput methods for the experimental study of molecular genetic systems and processes has generated an unprecedentedly vast amount of heterogeneous and complex biological data. Effective use of this information resource requires systematic approaches to its analysis. One such approach is the creation of domain-specific knowledge/data repositories that integrate information from multiple sources. This not only enables the storage and structuring of heterogeneous data distributed across various resources but also facilitates new insights into biological systems and processes. A systematic approach is also critical to solving a fundamental problem of biology: elucidating the principles of morphogenesis. Morphogenesis is regulated through evolutionarily conserved signaling pathways (Hedgehog, Wnt, Notch, etc.). The Hedgehog (HH) pathway plays a key role in this process, as it begins functioning earlier in ontogenesis than the others and governs every stage of an organism’s life cycle: from the structuring of embryonic primordia and histo- and organogenesis to the maintenance of tissue homeostasis and regeneration in adults. Our work presents HH_Signal_pathway_db, a knowledge base that integrates curated data on the molecular components and functional roles of the human HH signaling pathway. The first release of the database (available upon request at bukharina@bionet.nsc.ru) contains information on 56 genes, their protein products, the regulatory interaction network, and established associations with pathological conditions in humans. HH_Signal_pathway_db provides researchers with a tool for gaining new knowledge about the role of the Hedgehog pathway in health and disease and its potential applications in developmental biology and translational medicine.

990-999
Abstract

   Macrophages are immune cells that perform various, often opposing, functions in the body depending on signals from their microenvironment. This is possible due to macrophage plasticity, which allows them to radically alter their phenotypic characteristics and gene expression profiles, as well as to return to their original, non-activated state. Depending on the inducers acting on the cell, macrophages are activated into various functional states. There are five main phenotypes of activated macrophages: M1, M2a, M2b, M2c, and M2d. Although the amount of genome-wide transcriptomic and proteomic data showing differences between the major macrophage phenotypes and non-activated macrophages (M0) is growing rapidly, questions about the mechanisms regulating gene and protein expression profiles in macrophages of different phenotypes remain open. We compiled lists of proteins associated with the M1, M2a, M2b, M2c, and M2d phenotypes (phenotype-associated proteins) and analyzed data on potential mediators of macrophage polarization. Furthermore, using the computational system ANDSystem, we searched for and analyzed relationships between potential regulatory proteins and the genes encoding proteins associated with the M2 group phenotypes, and estimated the statistical significance of these relationships. The results indicate that the differences between the M2a, M2b, M2c, and M2d macrophage phenotypes may be attributed to the regulatory effects of the proteins JUN, IL8, NFAC2, CCND1, and YAP1. The expression levels of these proteins vary among the M2 group phenotypes, which in turn leads to different expression levels of phenotype-associated genes.

1000-1008
Abstract

   Accumulated evidence links dysregulated cytokine signaling to the pathogenesis of autism spectrum disorder (ASD), implicating genes, proteins, and their intermolecular networks. This paper systematizes these findings using bioinformatics analysis and machine learning methods.

   The primary tool employed in the study was the AND-System cognitive platform, developed at the Institute of Cytology and Genetics, which utilizes artificial intelligence techniques for automated knowledge extraction from biomedical databases and scientific publications.

   Using AND-System, we reconstructed a gene network of cytokine-mediated regulation of ASD-associated genes and proteins. The analysis identified 110 cytokines that regulate the activity, degradation, and transport of 58 proteins involved in ASD pathogenesis, as well as the expression of 91 ASD-associated genes. Gene Ontology (GO) enrichment analysis revealed statistically significant associations of these genes with biological processes related to the development and function of the central nervous system. Furthermore, topological network analysis and an assessment of functional significance based on association with ASD-related GO biological processes allowed us to identify 21 cytokines exerting the strongest influence on the regulatory network. Among these, eight cytokines (IL-4, TGF-β1, BMP4, VEGFA, BMP2, IL-10, IFN-γ, TNF-α) had the highest priority, ranking at the top across all metrics employed. Notably, eight of the 21 prioritized cytokines (TNF-α, IL-6, IL-4, VEGFA, IL-2, IL-1β, IFN-γ, IL-17) are known targets of drugs currently used as immunosuppressants and antitumor agents. The pivotal role of these cytokines in ASD pathogenesis provides a rationale for potentially repurposing such inhibitory drugs to treat autism spectrum disorders.
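GO over-representation analysis of the kind mentioned above is commonly based on a one-sided hypergeometric test. A minimal standard-library sketch follows; the abstract does not say which test or multiple-testing correction the authors applied, and the function and argument names are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k: int, n: int, K: int, N: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: annotated background genes, K: background genes in the GO term,
    n: genes in the study set, k: study-set genes in the GO term.
    """
    upper = min(n, K)
    # math.comb returns 0 when the lower index exceeds the upper one
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)
```

For example, if all 5 genes of a hypothetical study set fall into a term that covers 5 of 10 background genes, `hypergeom_upper_tail(5, 5, 5, 10)` gives 1/252 (about 0.004). In practice, the p-values over all tested terms would then be corrected, for example by the Benjamini-Hochberg procedure.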

1009-1019
Abstract

   Reconstruction and analysis of gene networks regulating biological processes are among the modern methodological approaches to studying the complex biological systems that ensure the vital activity of organisms. Thermoregulation is an important evolutionary acquisition of warm-blooded animals. Multiple physiological systems (nervous, cardiovascular, endocrine, respiratory, muscular, etc.) are involved in this process, maintaining a stable body temperature despite changes in ambient temperature.

   This study aims to perform a computer reconstruction of the human thermoregulation gene network and present the results in the Termo_Reg_Human 1.0 knowledge base.

   The gene network was reconstructed using ANDSystem, a software and information system designed for the automated extraction of knowledge and facts from scientific publications and biomedical databases using machine learning and artificial intelligence methods. The Termo_Reg_Human 1.0 knowledge base (https://www.sysbio.ru/ThermoReg_Human/) contains information about the human thermoregulation gene network, including descriptions of 469 genes, 473 proteins, and 265 microRNAs important for its functioning, the interactions between these objects, and the evolutionary characteristics of the genes. Using the AND-Visio software tool (a module of AND-System), each gene, protein, and microRNA involved in human thermoregulation was prioritized according to its functional significance, i. e., the number of its interactions with other objects in the reconstructed gene network. The key objects with the largest number of functional interactions in the human thermoregulation gene network were the UCP1, VEGFA, PPARG, and DDIT3 genes; the STAT3, JUN, VEGFA, TLR4, and TNFA proteins; and the microRNAs hsa-mir-335 and hsa-mir-26b. We found that the set of 469 human genes in the network was enriched in genes whose ancestral forms originated at an early evolutionary stage (unicellular organisms, the root of the phylostratigraphic tree) and at the stage of Vertebrata divergence.
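Prioritization by the number of interactions amounts to ranking network nodes by degree. The sketch below assumes an undirected network given as an edge list, counts distinct partners, and ignores self-loops and interaction types; it is not AND-Visio's implementation, and the node names are placeholders.

```python
from collections import defaultdict


def rank_by_degree(edges):
    """Rank nodes by number of distinct interaction partners (ties by name)."""
    partners = defaultdict(set)
    for u, v in edges:
        if u != v:  # skip self-loops
            partners[u].add(v)
            partners[v].add(u)
    return sorted(
        ((node, len(p)) for node, p in partners.items()),
        key=lambda item: (-item[1], item[0]),
    )
```

For the toy edge list `[("g1", "g2"), ("g2", "g3"), ("g2", "g4"), ("g3", "g4"), ("g2", "g1")]`, the function returns `[("g2", 3), ("g3", 2), ("g4", 2), ("g1", 1)]`; the duplicated g1/g2 edge is counted once because partners are stored in sets.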

1020-1030
Abstract

   Rheumatoid arthritis (RA) is a systemic autoimmune disease characterized primarily by joint involvement with progressive destruction of cartilage and bone tissue. To date, RA remains an incurable disease that leads to a significant deterioration in quality of life and patient disability. Despite a wide arsenal of disease-modifying antirheumatic drugs, approximately 40 % of patients show an insufficient response to standard treatment, highlighting the urgent need to identify new pharmacological targets.

   The aim of this study was to search for novel biological processes that could serve as promising targets for the targeted therapy of RA.

   To achieve this goal, we employed an approach based on the automated extraction of knowledge from scientific publications and biomedical databases using the ANDSystem software. This approach involved the reconstruction and subsequent analysis of two types of associative gene networks: a) gene networks describing genes and proteins associated with the development of RA, and b) gene networks describing genes and proteins involved in the functional responses to drugs used to treat the disease. The analysis of the reconstructed networks identified 11 biological processes that play a significant role in the pathogenesis of RA but are not yet direct targets of existing disease-modifying antirheumatic drugs. The most promising of these, described by Gene Ontology terms, include: a) the Toll-like receptor signaling pathway; b) neutrophil activation; c) regulation of osteoblast differentiation; d) regulation of osteoclast differentiation; e) the prostaglandin biosynthetic process; and f) the canonical Wnt signaling pathway. The identified biological processes and their key regulators represent promising targets for the development of new drugs capable of improving the efficacy of RA therapy, particularly in patients resistant to existing treatments. The developed approach can also be applied to the search for new therapeutic targets in other diseases.

1031-1040
Abstract

   Mathematical models are a powerful theoretical tool for studying complex biological systems. They make it possible to track non-obvious interactions and to conduct in silico experiments addressing practical problems. Iron plays a key role in oxygen transport in mammals. However, a high concentration of this microelement can damage cellular structures through the production of reactive oxygen species and can also lead to ferroptosis (programmed cell death associated with iron-dependent lipid peroxidation). The immune system contributes greatly to the regulation of iron metabolism: hypoferritinemia (decreased ferritin concentration in the blood) during infection is a result of the innate immune response. Many aspects of iron metabolism regulation remain insufficiently studied and require a deeper understanding of the structural-functional organization and dynamics of all components of this complex process under both normal and pathological conditions.

   Consequently, mathematical modeling becomes an important tool to identify key regulatory interactions and predict the behavior of the iron metabolism regulatory system in the human body under various conditions.

   This article presents a review of iron metabolism models applicable to humans presented in chronological order of their development to illustrate the evolution and priorities in modeling iron metabolism. We focused on the formulation of numerical problems in the analyzed models, their structure and reproducibility, thereby highlighting their advantages and drawbacks. Advanced models can numerically simulate various experimental scenarios: blood transfusion, signaling pathway disruption, mutation in the ferroportin gene, and chronic inflammation. However, existing mathematical models of iron metabolism are difficult to scale and do not account for the functioning of other organs and systems, which severely limits their applicability. Therefore, to enhance the utility of computational models in solving practical problems related to iron metabolism in the human body, it is necessary to develop a scalable and verifiable mathematical model of iron metabolism that considers interactions with other functional human systems (e. g., the immune system) and state-of-the-art standards for representing mathematical models of biological systems.

1041-1050
Abstract

   Identifying the connections between the various functional components of the immune system is a crucial task in modern immunology. It is key to implementing the systems biology approach to understanding the mechanisms of dynamic changes and outcomes of infectious and oncological diseases. Data characterizing an individual’s immune status typically have a high-dimensional state space and a small sample size. To study the network topology of the immune system, we used previously published original data from Toptygina et al. (2023), which included measurements of the immune status of 19 healthy individuals (children, 9 boys and 10 girls, aged 1 to 2 years): immune cells (42 subpopulations) measured by flow cytometry, cytokine levels (13 types) measured by multiplex analysis, and antibody levels (4 types) determined by enzyme immunoassay. To correctly identify statistically significant correlations between the measured variables and construct the respective network graph, an approach that accounts for the small size of the dataset is necessary. In this study, we implemented and analyzed an approach based on the regularized debiased sparse partial correlation (DSPC) algorithm to estimate sparse partial correlations and identify the network structure of relationships in the immune system of healthy children from immune status data, which include indicators for immune cell subpopulations, cytokine levels, and antibodies. Heatmaps of the partial correlations were constructed for different levels of statistical significance. The DSPC networks were visualized as graphs, and their topological characteristics were analyzed. We found that, with a limited sample of measurements, the choice of the statistical significance threshold critically affects the structure of the partial correlation matrix.
The final verification of the immunologically correct structure of the correlation-based network requires both an increase in the sample size and consideration of a priori mechanistic views and models of the functioning of the immune system components. The results of this analysis can be used to select the therapy targets and design combination therapies.
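Partial correlations, the quantity DSPC estimates, can be obtained from the precision (inverse covariance) matrix P as rho_ij = -P_ij / sqrt(P_ii P_jj). The minimal NumPy sketch below shows only this final step and assumes an invertible covariance matrix. With 19 samples and 59 measured variables (42 + 13 + 4), the sample covariance is singular, which is why DSPC, as we understand it, first estimates a sparse precision matrix with the graphical lasso and then debiases it to obtain p-values; those steps are not reproduced here.

```python
import numpy as np


def partial_correlations(cov: np.ndarray) -> np.ndarray:
    """Partial correlations from an invertible covariance matrix.

    rho_ij = -P_ij / sqrt(P_ii * P_jj), where P is the precision
    (inverse covariance) matrix. No regularization or debiasing here.
    """
    prec = np.linalg.inv(cov)
    d = np.sqrt(np.diag(prec))
    pcor = -prec / np.outer(d, d)
    np.fill_diagonal(pcor, 1.0)
    return pcor
```

A zero entry in P corresponds to a zero partial correlation, that is, conditional independence of the two variables given all others under a Gaussian model; this is what makes a sparse precision estimate translate directly into a sparse network graph.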

1051-1061
Abstract

   Vision plays a key role in the lives of various organisms, enabling spatial orientation, foraging, predator avoidance, and social interaction. In species with relatively simple visual systems, such as insects, effective behavioral strategies are achieved through high neural specialization, adaptation to specific environmental conditions, and the use of additional sensory systems such as olfaction or hearing. Animals with more complex visual and nervous systems, such as mammals, have greater cognitive abilities and flexibility, but at the cost of higher energy and computational demands on the brain. Modeling the features of such systems in a virtual environment could allow researchers to explore the fundamental principles of sensorimotor integration and the limits of cognitive complexity, and to test hypotheses about the interaction between perception, memory, and decision-making mechanisms.

   In this work, we implement and investigate a model of virtual organisms with a visual system operating in a three-dimensional physical environment using the Unity ML-Agents software, one of the highest-performance simulation platforms currently available.

   We propose a hierarchical control architecture that separates locomotion and navigation tasks between two modules: (1) visual perception and decision-making, and (2) coordinated control of limb movement for locomotion in the physical environment. A series of numerical experiments was conducted to examine the influence of visual system parameters (e. g., the resolution of the “first-person” view), environmental configuration, and agent architectural features on the efficiency and outcomes of reinforcement learning (using the PPO algorithm). The results demonstrate the existence of an optimal range of resolutions that provides a trade-off between computational complexity and task success, while excessive dimensionality of sensory inputs or of the action space slows learning. We profiled system performance and identified key bottlenecks in large-scale simulations. The discussion considers biological parallels, highlighting cases of high behavioral efficiency in insects with relatively low-resolution visual systems, and the potential of neuroevolutionary approaches for adapting agent architectures. The proposed approach and the results obtained are of potential interest to researchers working on biologically inspired artificial agents, evolutionary modeling, and the study of cognitive processes in artificial systems.
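The PPO algorithm mentioned above optimizes a clipped surrogate objective that limits how far each update moves the policy away from the policy that collected the data. The sketch below evaluates that objective for a batch of log-probabilities and advantage estimates in NumPy. It is not Unity ML-Agents code (ML-Agents performs this step internally during training), and eps = 0.2 is simply a commonly used clipping value.

```python
import numpy as np


def ppo_clip_objective(logp_new, logp_old, advantages, eps=0.2):
    """Clipped surrogate objective of PPO (to be maximized)."""
    # Probability ratio pi_new(a|s) / pi_old(a|s), computed in log space
    ratio = np.exp(np.asarray(logp_new) - np.asarray(logp_old))
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantages
    # The elementwise minimum gives a pessimistic bound on the improvement
    return float(np.mean(np.minimum(unclipped, clipped)))
```

With a ratio of 2 and a positive advantage of 1, the clipped term caps the contribution at 1.2; with a negative advantage of -1, the minimum keeps the unclipped -2, so large policy changes are never rewarded beyond the clip range but are always penalized.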

STRUCTURAL COMPUTATIONAL BIOLOGY

1062-1072
Abstract

   The ABH2 enzyme belongs to the AlkB-like family of Fe(II)/α-ketoglutarate-dependent dioxygenases. Various non-heme dioxygenases act on a wide range of substrates and have a complex catalytic mechanism involving α-ketoglutarate and an Fe(II) ion as a cofactor. Representatives of the AlkB family catalyze the direct oxidation of alkyl substituents in the nitrogenous bases of DNA and RNA, providing protection against the mutagenic effects of endogenous and exogenous alkylating agents, and also participate in regulating the methylation level of some RNAs. The DNA dioxygenase ABH2, localized predominantly in the cell nucleus, is specific for double-stranded DNA substrates and, unlike most other human AlkB-like enzymes, has fairly broad substrate specificity, oxidizing the alkyl groups of modified nitrogenous bases such as N1-methyladenosine, N3-methylcytidine, 1,N6-ethenoadenosine, and 3,N4-ethenocytidine.

   To analyze the mechanism underlying the enzyme’s substrate specificity and to clarify the functional role of key active-site amino acid residues, we performed molecular dynamics simulations of complexes of the wild-type ABH2 enzyme and its mutant forms containing amino acid substitutions V99A, F124A and S125A with two types of DNA substrates carrying the methylated bases N1-methyladenine and N3-methylcytosine, respectively.

   It was found that the V99A substitution leads to an increase in the mobility of protein loops L1 and L2 involved in binding the DNA substrate and changes the distribution of π-π contacts between the side chain of residue F102 and nitrogenous bases located near the damaged nucleotide. The F124A substitution leads to the loss of π-π stacking with the damaged base, which in turn destabilizes the architecture of the active site, disrupts the interaction with the iron ion and prevents optimal catalytic positioning of α-ketoglutarate in the active site. The S125A substitution leads to the loss of direct interaction of the L2 loop with the 5’-phosphate group of the damaged nucleotide, weakening the binding of the enzyme to the DNA substrate. Thus, the obtained data revealed the functional role of three amino acid residues of the active site and contributed to the understanding of the structural-functional relationships in the recognition of a damaged nucleotide and the formation of a catalytic complex by the human ABH2 enzyme.
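Loop mobility in molecular dynamics trajectories is often quantified as the root-mean-square fluctuation (RMSF) of atomic positions. The abstract does not state which metric was used, so the NumPy sketch below is only an assumption: it takes coordinates already superimposed on a reference structure, with shape (frames, atoms, 3), and returns one RMSF value per atom. Per-residue values for loops L1 and L2 would typically be read from their Cα atoms.

```python
import numpy as np


def rmsf(coords: np.ndarray) -> np.ndarray:
    """Per-atom RMSF about the mean position.

    coords: array of shape (n_frames, n_atoms, 3), assumed to be
    already aligned to a common reference (no fitting is done here).
    """
    mean = coords.mean(axis=0)                         # (n_atoms, 3)
    sq_dev = ((coords - mean) ** 2).sum(axis=2)        # (n_frames, n_atoms)
    return np.sqrt(sq_dev.mean(axis=0))                # (n_atoms,)
```

Comparing such profiles between wild-type and mutant trajectories (for example, V99A) would show where mobility increases; skipping the superposition step would mix overall rotation and translation into the fluctuations.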

1073-1083 135
Abstract

   We recently proposed a novel class of nucleic acid derivatives, phosphoramidate benzoazole oligonucleotides (PABAOs). In these compounds, one of the non-bridging oxygen atoms is replaced by a phosphoramidate N-benzoazole group, such as benzimidazole, dimethylbenzimidazole, benzoxazole, or benzothiazole. Studies of these derivatives have shown that their use in PCR enhances the specificity and selectivity of the analysis. This study uses molecular dynamics simulations to investigate how phosphoramidate N-benzimidazole modification of DNA primers affects their elongation by Taq DNA polymerase. We examined perfectly matched primer–template complexes with modifications at positions one through six from the 3’-end of the primer. Prior experimental work demonstrated that the degree of elongation suppression depends on the modification position: the closer to the 3’-end, the stronger the inhibition, with maximal suppression at the first position, especially in mismatched complexes. Furthermore, incomplete elongation products were observed experimentally for primers modified at the fourth position. Our molecular dynamics simulations and subsequent analysis revealed the molecular mechanisms underlying the interaction of modified primers with the enzyme. These include steric hindrance that impedes polymerase progression along the modified strand and local distortions in the DNA structure, which explain the experimentally observed trends. We established that both the stereoisomers of the phosphoramidate group and the conformers of the phosphoramidate N-benzimidazole moiety differentially affect the structure of the enzyme–substrate complex and the efficiency of Taq DNA polymerase interaction with the modified DNA complex. Modification of the first and second internucleoside phosphates from the 3’-end of the primer causes the most significant perturbation to the structure of the protein–nucleic acid complex. 
When the modification is located at the fourth phosphate group, the N-benzimidazole moiety occupies a specific pocket of the enzyme. These findings provide a foundation for the rational design of specific DNA primers bearing modified N-benzimidazole moieties with tailored properties for use in PCR diagnostics.

1084-1096 166
Abstract

   In recent years, artificial intelligence methods based on the analysis of heterogeneous graphs of biomedical networks have become widely used for predicting molecular interactions. In particular, graph neural networks (GNNs) effectively identify missing edges in gene networks, such as protein–protein interaction, gene–disease, drug–target, and other networks, thereby enabling the prediction of new biological relationships. Gene networks are often reconstructed with cognitive systems for automatic text mining of scientific publications and databases. One such AI-driven platform, ANDSystem, is designed to automatically extract knowledge about molecular interactions and, on this basis, to reconstruct associative gene networks. The ANDSystem knowledge base contains information on more than 100 million interactions among diverse molecular genetic entities (genes, proteins, metabolites, drugs, etc.). The interactions span a wide range of types: regulatory relationships, physical interactions (protein–protein, protein–ligand), catalytic and chemical reactions, and associations among genes, phenotypes, diseases, and more. In the present study, we applied attention-based graph neural networks trained on the ANDSystem knowledge graph to predict new edges between proteins and ligands and to identify potential ligands for the SARS-CoV-2 ORF3a protein. The accessory protein ORF3a plays an important role in viral pathogenesis through its ion-channel activity, induction of apoptosis, and ability to modulate endolysosomal processes and the host innate immune response. Despite this broad functional spectrum, ORF3a has been explored far less as a pharmacological target than other viral proteins. Using a graph neural network, we predicted five small molecules of different origins (metabolites and a drug) that potentially interact with ORF3a: N-acetyl-D-glucosamine, 4-(benzoylamino)benzoic acid, austocystin D, bictegravir, and L-threonine. 
Molecular docking and MM/GBSA affinity estimation suggest that these compounds may form complexes with ORF3a. Localization analysis showed that the binding sites of bictegravir and 4-(benzoylamino)benzoic acid lie in a solvent-exposed cytosolic surface pocket of the protein; L-threonine binds within the intersubunit cleft of the dimer; and austocystin D and N-acetyl-D-glucosamine are positioned at the boundary between the cytosolic surface and the transmembrane region. The accessibility of these binding sites may be reduced by the lipid bilayer. The binding energetics for bictegravir were more favorable than for 4-(benzoylamino)benzoic acid (docking score −7.37 kcal/mol; MM/GBSA ΔG −14.71 ± 3.12 kcal/mol), making bictegravir a promising candidate for repurposing as an ORF3a inhibitor.
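Attention-based GNNs weight each neighbor's contribution to a node embedding by a learned attention coefficient, and candidate edges can then be scored from the resulting embeddings. The sketch below assumes a single-head layer in the style of graph attention networks (GAT) with a dot-product edge decoder; it is an illustration, not the authors' model, and the node names, features, and weights are hypothetical (a trained model would learn `W` and `a`, and usually applies a nonlinearity after aggregation, which is omitted here).

```python
import math


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x


def gat_layer(features, neighbors, W, a):
    """One single-head graph-attention layer.

    features: node -> input feature vector
    neighbors: node -> list of neighbor nodes (self-loops included by the caller)
    W: weight matrix; a: attention vector of length 2 * len(W)
    """
    h = {v: matvec(W, x) for v, x in features.items()}
    out = {}
    for i, nbrs in neighbors.items():
        # Unnormalized attention e_ij = LeakyReLU(a . [Wh_i || Wh_j])
        scores = [leaky_relu(sum(ak * z for ak, z in zip(a, h[i] + h[j]))) for j in nbrs]
        # Softmax over the neighborhood, shifted by the max for numerical stability
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        alphas = [e / total for e in exps]
        # Attention-weighted sum of transformed neighbor features
        out[i] = [sum(al * h[j][k] for al, j in zip(alphas, nbrs)) for k in range(len(W))]
    return out


def link_score(z, u, v):
    """Probability-like score for a candidate edge u-v: sigmoid of the embedding dot product."""
    return 1.0 / (1.0 + math.exp(-sum(x * y for x, y in zip(z[u], z[v]))))


# Hypothetical toy graph: one protein node and two candidate ligand nodes
features = {"protein": [1.0, 0.0], "ligand_1": [0.0, 1.0], "ligand_2": [1.0, 1.0]}
neighbors = {
    "protein": ["protein", "ligand_1"],
    "ligand_1": ["ligand_1", "protein"],
    "ligand_2": ["ligand_2"],
}
W = [[1.0, 0.0], [0.0, 1.0]]
a = [0.5, 0.5, 0.5, 0.5]
z = gat_layer(features, neighbors, W, a)
print(round(link_score(z, "protein", "ligand_1"), 3))
```

Ranking all unobserved protein–ligand pairs by such a score is one way candidate ligands could be shortlisted before docking and MM/GBSA checks.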

1097-1108 173
Abstract

   Oncological diseases remain one of the leading causes of death worldwide, making the development of anticancer drugs a critical focus in medicinal chemistry. A promising strategy to enhance therapeutic efficacy and reduce chemotherapy-induced toxicity involves the combined inhibition of DNA repair enzymes and topoisomerases. Of particular interest are minor-groove DNA ligands, which potently inhibit DNA-dependent enzymes while having low toxicity and mutagenicity. A number of research groups, including ours, are developing inhibitors of DNA repair enzymes that act on several targets simultaneously: tyrosyl-DNA phosphodiesterase 1/2 (TDP1/TDP2), poly(ADP-ribose) polymerase 1 (PARP1)/TDP1, and topoisomerase 1 (TOP1)/TDP1. Such bifunctional inhibitors are designed to overcome tumor cell resistance to known chemotherapy drugs and to increase their effectiveness. In this study, we evaluated the inhibitory activity of 22 minor-groove DNA ligands, bis- and trisbenzimidazoles, against four key repair enzymes: TDP1, TDP2, PARP1, and PARP2. Four series of dimeric compounds and their monomeric units were studied. The inhibitory activity of dimeric bisbenzimidazoles was shown to depend on both the structure of the compound and the enzyme. Our findings reveal distinct structure–activity relationships, with monomeric and dimeric ligands exhibiting potent TDP1 inhibition at micromolar to submicromolar IC50 values (half-maximal inhibitory concentration). Notably, dimeric compounds from the DB2Py(n) and DB3P(n) series inhibited TDP1 more strongly than their monomers. In contrast, all tested compounds showed negligible activity against the other three repair enzymes; thus, the compounds are selective for TDP1. 
It should be noted that in the experiments with TDP1 and TDP2, any effect of the tested compounds acting as minor-groove DNA ligands was excluded, so their direct effect on the enzyme was investigated. Molecular docking suggests that the active compounds may interact directly with the active site of TDP1. According to the modeling, the inhibitors are located in the binding region of the DNA 3’-end in the TDP1 active site and could form stable bonds with the catalytically important TDP1 residues His263 and His493. These interactions probably account for the high inhibitory activity observed in biochemical experiments.
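The IC50 values mentioned above are read off a dose–response curve as the inhibitor concentration that leaves 50 % of the enzyme activity. The sketch below shows one quick way to estimate it, by linear interpolation on a log10 concentration scale between the two tested concentrations that bracket 50 %; the abstract does not state how IC50 was determined, and the concentrations and activities in the example are hypothetical. Fitting a four-parameter logistic (Hill) curve is the more rigorous alternative.

```python
import math


def ic50_interpolate(concs_uM, activity_pct):
    """Estimate IC50 by log-linear interpolation between the two measured
    concentrations that bracket 50 % residual activity.

    concs_uM: increasing inhibitor concentrations in µM (all > 0)
    activity_pct: residual enzyme activity at each concentration (% of control)
    Returns None if 50 % is not crossed within the tested range.
    """
    points = list(zip(concs_uM, activity_pct))
    for (c1, a1), (c2, a2) in zip(points, points[1:]):
        if a1 >= 50.0 >= a2 and a1 != a2:
            # Fraction of the way from a1 down to 50 %, applied on a log10 concentration axis
            t = (a1 - 50.0) / (a1 - a2)
            log_ic50 = math.log10(c1) + t * (math.log10(c2) - math.log10(c1))
            return 10 ** log_ic50
    return None


# Hypothetical titration: 1, 10 and 100 µM leave 80 %, 60 % and 20 % activity
print(round(ic50_interpolate([1.0, 10.0, 100.0], [80.0, 60.0, 20.0]), 2))  # 17.78
```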

ECOLOGICAL AND POPULATION GENETICS

1109-1121 110
Abstract

   One of the main goals of modern evolutionary biology is to understand the mechanisms that lead to the initial differentiation (primary divergence) of populations into genetically distinct groups.

   This divergence requires reproductive isolation, which prevents or hinders contact and the exchange of genetic material between populations. This study explores the potential for isolation based not on obvious geographical barriers, distance between populations, or ecological specialization, but on hereditary mechanisms such as genetic drift, gene flow, and selection against heterozygous individuals. To this end, we propose and investigate a discrete-time dynamic model that describes the dynamics of allele frequencies and population sizes in a system of limited populations coupled by migration. We consider a panmictic population with Mendelian inheritance, one-locus selection, and density-dependent factors limiting population growth. Individuals mate freely and move randomly around a one-dimensional ring-shaped habitat. The model was verified using data from an experiment on the box population system of Drosophila melanogaster performed by Yu.P. Altukhov et al. Under fairly simple assumptions, the model explains some mechanisms for the emergence and preservation of significant genetic differences between subpopulations (primary genetic divergence), accompanied by heterogeneity in allele frequencies and abundances within a homogeneous habitat. In this scenario, several large groups of genetically homogeneous subpopulations form and develop independently. Hybridization occurs at the contact sites, and polymorphism there is maintained by migration from neighboring genetically homogeneous sites. Only disruptive selection directed against heterozygous individuals was found to maintain such a spatial distribution sustainably. Under directional selection, divergence may occur only transiently, as part of the evolutionary transition toward the best-adapted genotype. 
Because of the reduced fitness of heterozygous (hybrid) individuals and the low growth rates at these sites (hybrid zones), gene flow between adjacent sites with opposite genotypes (phenotypes) is significantly impeded. As a result, hybrid zones can become effective geographical barriers that prevent gene flow between coupled subpopulations.
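A minimal sketch of the kind of model described above, useful for seeing how selection against heterozygotes can hold two blocks of demes at opposite allele frequencies. The specific equations are assumptions chosen for illustration, not the authors' model: viability selection on Hardy–Weinberg genotype proportions, Beverton–Holt density regulation scaled by mean fitness, and migration of a fraction `m` of each deme split equally between its two ring neighbors. All parameter values are hypothetical.

```python
def step(p, n, w, r, k, m):
    """One generation of a hypothetical ring-of-demes model.

    p[i]: frequency of allele A in deme i; n[i]: number of individuals in deme i
    w = (wAA, wAa, waa): genotype fitnesses (wAa below both homozygotes means
    selection against heterozygotes); r: maximal growth factor; k: carrying
    capacity; m: fraction emigrating per generation, half to each ring neighbor.
    """
    wAA, wAa, waa = w
    size = len(p)
    p_sel, n_grow = [], []
    for pi, ni in zip(p, n):
        qi = 1.0 - pi
        # Random mating gives Hardy-Weinberg proportions; then viability selection
        w_bar = pi * pi * wAA + 2 * pi * qi * wAa + qi * qi * waa
        p_sel.append((pi * pi * wAA + pi * qi * wAa) / w_bar)
        # Beverton-Holt growth; low mean fitness (hybrid zones) slows growth
        n_grow.append(r * w_bar * ni / (1.0 + (r - 1.0) * ni / k))
    p_new, n_new = [], []
    for i in range(size):
        left, right = (i - 1) % size, (i + 1) % size
        stay = (1.0 - m) * n_grow[i]
        from_left = 0.5 * m * n_grow[left]
        from_right = 0.5 * m * n_grow[right]
        total = stay + from_left + from_right
        n_new.append(total)
        # Allele frequency after mixing: abundance-weighted average of sources
        p_new.append((stay * p_sel[i] + from_left * p_sel[left] + from_right * p_sel[right]) / total)
    return p_new, n_new


# Hypothetical run: 20 demes, two blocks starting near opposite fixation
p = [0.95] * 10 + [0.05] * 10
n = [100.0] * 20
for _ in range(300):
    p, n = step(p, n, (1.0, 0.7, 1.0), r=2.0, k=100.0, m=0.05)
print([round(x, 2) for x in p])
```

With heterozygote disadvantage the two blocks stay near opposite fixation, separated by two narrow hybrid zones where mean fitness, and hence growth, is reduced; replacing the fitnesses with a directional scheme such as (1.0, 0.9, 0.8) lets allele A spread around the whole ring instead.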

EVOLUTIONARY BIOINFORMATICS

1122-1128 99
Abstract

   The nature of the last universal common ancestor (LUCA) of all living organisms remains a controversial issue in biology. There is evidence both for a thermophilic and for a mesophilic LUCA. The increasing complexity of the cellular apparatus during the evolution from early life forms to modern organisms could have manifested itself in long-term evolutionary changes in the nucleotide composition of genetic sequences. This work aims to identify such trends in tRNA sequences. We present the results of an evolutionary analysis of single-nucleotide substitutions in tRNAs of 123 species from the three domains, Bacteria, Archaea, and Eukaryota. A universal directional trend in tRNA sequence evolution has been discovered, in which substitutions of guanine (G) to adenine (A) and of cytosine (C) to uracil (U) occur more frequently than the reverse. The most striking asymmetry in the number of substitutions is observed in the following cases: a) purine-to-purine, where G→A outnumbers A→G; b) pyrimidine-to-pyrimidine, where C→U outnumbers U→C; and c) purine-to-pyrimidine and vice versa, where G→U outnumbers U→G. As a result, tRNAs could lose “strong” complementary pairs with three hydrogen bonds formed by guanine and cytosine and fix “weak” complementary pairs with two hydrogen bonds formed by adenine and uracil. Sixteen of the 20 tRNA families show the detected shift in sequence composition, which corresponds to a significance level of p = 0.006 according to the one-sided binomial test. The identified pattern indicates a high GC content in the common ancestor of modern tRNAs, supporting the hypothesis that LUCA lived in a hotter environment than do most contemporary organisms.
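The reported p = 0.006 can be reproduced with a one-sided binomial test, assuming the null hypothesis that each tRNA family is equally likely (probability 0.5) to shift in either direction; that null is our reading of the test, not a detail stated in the abstract.

```python
from math import comb


def binomial_sf(k: int, n: int, p: float = 0.5) -> float:
    """One-sided binomial test: P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# 16 of 20 tRNA families shift toward A/U; null probability 0.5 per family (assumed)
p_value = binomial_sf(16, 20)
print(f"{p_value:.4f}")  # 0.0059
```

The exact value is (4845 + 1140 + 190 + 20 + 1) / 2^20 = 6196 / 1048576 ≈ 0.0059, which rounds to the 0.006 given above.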

MEDICAL BIOINFORMATICS

1129-1136 131
Abstract

   Major depressive disorder (MDD) is one of the most widespread mental illnesses, which necessitates the search for factors of increased predisposition to this disorder. Single nucleotide polymorphisms in genes of the brain’s neurotransmitter systems are often considered molecular genetic markers of MDD. Individual single nucleotide variability in neurotransmitter genes is used to assess the risk of MDD before its symptoms appear at the behavioral level. However, analysis of genomic variation does not yet predict depression risk reliably enough and is complemented by behavioral and neurophysiological information about patients. Neurophysiological markers of MDD provide the most reliable estimates of the severity of pathological symptoms, but they reflect a person’s state at the time of examination rather than a predisposition to this pathological state, and do not allow the risk of its future onset to be assessed. Major depressive disorder is often accompanied by impairments in a person’s ability to control motor responses, including the ability to voluntarily suppress inappropriate behavior. The “stop-signal paradigm” (SSP) is an experimental method for assessing the functional balance between the inhibitory and activating systems of the brain during goal-directed movements. Combined with EEG recording, this method allows consideration not only of participants’ behavioral characteristics, such as response speed or accuracy, but also of the brain’s neurophysiological features associated with behavior control.

   The objective of this study was to evaluate the relationship between EEG responses in the stop-signal paradigm and individual single nucleotide variability in candidate genes associated with MDD.

   Principal component analysis (PCA) was used to reduce the dimensionality of the original genetic and neurophysiological experimental data and then to detect associations between components of EEG responses recorded during the control of voluntary motor responses and single nucleotide variants in genes whose variability is associated with MDD risk. Using PCA, variability in these genes was shown to be associated with the amplitude of brain responses under the experimental conditions. The results can be used to develop systems for the early diagnosis of depression, identify individual patterns of brain impairment, select methods for correcting the disorder, and monitor the effectiveness of therapy.
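The dimensionality-reduction step can be illustrated by extracting the first principal component with power iteration on the sample covariance matrix. This is a minimal sketch of PCA in general, not the authors' pipeline: the input rows stand for subjects and the columns for hypothetical variables such as EEG amplitudes at different electrodes, and only the leading component is computed.

```python
import math


def first_principal_component(data, iters=200):
    """First principal component of `data` (rows = subjects, columns = variables)
    via power iteration on the sample covariance matrix.

    Returns (loadings, scores): the unit loading vector and each subject's
    projection onto it.
    """
    n, d = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in data]
    cov = [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(d)] for a in range(d)]
    v = [1.0] * d  # starting vector; must not be orthogonal to the leading eigenvector
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Component scores: projection of each centered observation onto the loadings
    scores = [sum(r[j] * v[j] for j in range(d)) for r in centered]
    return v, scores


# Hypothetical data: two perfectly correlated response amplitudes for four subjects
loadings, scores = first_principal_component([[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]])
print([round(x, 4) for x in loadings])  # [0.4472, 0.8944]
```

The per-subject component scores would then serve as compact phenotypes to test for association with genotypes, in place of the many raw EEG measures.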

1137-1144 134
Abstract

   Organismal aging is accompanied by the accumulation of senescent cells – damaged, non-functional cells that exhibit cell cycle arrest, resistance to apoptosis, metabolic dysfunction, and production of a wide range of pro-inflammatory substances. The age-related accumulation of these cells is associated with impaired tissue function, contributes to chronic inflammation (inflammaging), and promotes the development of various age-associated diseases. Conversely, the elimination of senescent cells restores tissue functions and positively affects overall metabolism. Under normal conditions, senescent cells are removed by the innate immune system; however, the efficiency of this process declines with age. The involvement of adaptive immunity and the role of T cells in the clearance of senescent cells remain poorly understood.

   The aim of this study was to identify alterations in local T cell immunity associated with the accumulation of senescent cells in human skin.

   The analysis was performed on publicly available single-cell RNA sequencing data from skin biopsies, and senescence status was assessed using the SenePy algorithm with Gaussian mixture models. Senescent cells were found to emerge heterogeneously across cell types within the tissue. The accumulation of these cells is associated with alterations in the CD4+ to CD8+ T cell ratio, as well as with an increased abundance of regulatory T cells. Functional analysis revealed that these quantitative age-related shifts were accompanied by more pronounced activation of regulatory T cells together with features of anergy and exhaustion in CD8+ T cells, whereas functional changes in CD4+ T cells were heterogeneous. These findings underscore the importance of adaptive immunity in maintaining tissue homeostasis and suggest potential age-related dysfunction of tissue-resident T cells. Understanding the mechanisms underlying the interaction between adaptive immunity and senescent cells is crucial for the development of senolytic vaccines and other immunological approaches aimed at enhancing endogenous elimination of senescent cells.
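To illustrate how a Gaussian mixture model can separate cells into senescent and non-senescent groups, the sketch below fits a two-component one-dimensional mixture by expectation-maximization to per-cell scores. It assumes each cell already has a single senescence score; the score values are hypothetical, and the code is not SenePy's implementation.

```python
import math


def gmm_1d_two_components(x, iters=100, var_floor=1e-6):
    """Fit a two-component 1-D Gaussian mixture by expectation-maximization.

    Returns (weights, means, variances, responsibilities), where
    responsibilities[i] is the posterior probability, from the final E-step,
    that x[i] belongs to the component with the higher mean.
    """
    n = len(x)
    mean_all = sum(x) / n
    var_all = sum((xi - mean_all) ** 2 for xi in x) / n
    # Crude initialization: components centered at the extremes, overall variance
    w, mu, var = [0.5, 0.5], [min(x), max(x)], [var_all, var_all]
    resp = []
    for _ in range(iters):
        # E-step: posterior probability of each component for each point
        resp = []
        for xi in x:
            dens = [
                w[k] / math.sqrt(2 * math.pi * var[k]) * math.exp(-((xi - mu[k]) ** 2) / (2 * var[k]))
                for k in range(2)
            ]
            s = sum(dens)
            resp.append([d / s for d in dens])
        # M-step: re-estimate weights, means and variances (variance floored)
        for k in range(2):
            nk = sum(r[k] for r in resp)
            w[k] = nk / n
            mu[k] = sum(r[k] * xi for r, xi in zip(resp, x)) / nk
            var[k] = max(var_floor, sum(r[k] * (xi - mu[k]) ** 2 for r, xi in zip(resp, x)) / nk)
    high = 0 if mu[0] > mu[1] else 1
    return w, mu, var, [r[high] for r in resp]


# Hypothetical per-cell senescence scores
scores = [0.10, 0.20, 0.15, 0.12, 0.18, 0.90, 0.85, 0.95, 0.88, 0.92]
_, means, _, p_high = gmm_1d_two_components(scores)
print([pr > 0.5 for pr in p_high])  # cells assigned to the high-score component
```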

1145-1154 114
Abstract

   In recent years, the rapid growth of sequencing data has exacerbated the problem of functional annotation of protein sequences, as traditional homology-based methods face limitations when working with distant homologs, making it difficult to accurately determine protein functions. This paper introduces the OrthoML2GO method for protein function prediction, which integrates homology searches using the USEARCH algorithm, orthogroup analysis based on OrthoDB version 12.0, and a machine learning algorithm (gradient boosting).

   A key feature of our approach is the use of orthogroup information to account for the evolutionary and functional similarity of proteins and the application of machine learning to refine the assigned GO terms for the target sequence.

   To select the optimal algorithm for protein annotation, the following approaches were applied sequentially: the k-nearest neighbors (KNN) method; a method based on the annotation of the orthogroup most represented among the k nearest homologs (OG); and a method that verifies the GO terms identified at the previous stage using machine learning algorithms. The accuracy of GO term prediction by OrthoML2GO was compared with that of the Blast2GO and PANNZER2 annotation programs on sequence samples from individual organisms (human, Arabidopsis) and on a combined sample representing different taxa. Our results demonstrate that the proposed method is comparable to, and by some evaluation metrics outperforms, these existing methods in the quality of protein function prediction, especially on large and heterogeneous samples of organisms. The greatest performance improvement is achieved by combining information about the closest homologs and orthogroups with verification of terms using machine learning methods. Our approach performs well in large-scale automatic protein annotation. Prospects for further development include optimizing machine learning model parameters for specific biological tasks and integrating additional sources of structural and functional information, which should further improve the method’s accuracy and versatility. In addition, new bioinformatics tools and the expansion of annotated protein databases will contribute to further improvement of the proposed approach.
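The first two annotation stages described above (KNN and OG) can be sketched as simple voting over the best homology hits. This is an illustrative sketch, not the OrthoML2GO code: the hit records, the `min_support` threshold, and the orthogroup-to-GO mapping are hypothetical, and the gradient-boosting verification step that would then filter the candidate terms is not shown.

```python
from collections import Counter

# Hypothetical homolog hit: (percent identity, orthogroup ID, GO terms of the hit)
Hit = tuple[float, str, frozenset[str]]


def knn_go_terms(hits: list[Hit], k: int = 5, min_support: float = 0.5) -> set[str]:
    """KNN stage: keep GO terms annotated in at least `min_support` of the k best hits."""
    top = sorted(hits, key=lambda h: h[0], reverse=True)[:k]
    counts = Counter(term for _, _, terms in top for term in terms)
    return {t for t, c in counts.items() if c / len(top) >= min_support}


def og_go_terms(hits: list[Hit], orthogroup_terms: dict[str, set[str]], k: int = 5) -> set[str]:
    """OG stage: take the annotation of the orthogroup most represented among the k best hits."""
    top = sorted(hits, key=lambda h: h[0], reverse=True)[:k]
    best_og, _ = Counter(og for _, og, _ in top).most_common(1)[0]
    return set(orthogroup_terms.get(best_og, set()))


# Illustrative hits (GO:0003677 DNA binding, GO:0006281 DNA repair, GO:0005634 nucleus)
hits = [
    (98.0, "OG1", frozenset({"GO:0003677", "GO:0006281"})),
    (90.0, "OG1", frozenset({"GO:0003677"})),
    (75.0, "OG2", frozenset({"GO:0005634"})),
    (60.0, "OG1", frozenset({"GO:0003677"})),
    (40.0, "OG3", frozenset()),
]
print(knn_go_terms(hits, k=4))
print(og_go_terms(hits, {"OG1": {"GO:0003677", "GO:0006281"}}, k=4))
```

In a pipeline like the one described, the union of both candidate sets would be passed to a trained classifier that accepts or rejects each term for the query protein.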



This work is licensed under a Creative Commons Attribution 4.0 License.


ISSN 2500-3259 (Online)