
Vavilov Journal of Genetics and Breeding


Original Russian text: https://vavilovj-icg.ru/2015-year/19-6/

Vol 19, No 6 (2015)

INSECT GENETICS

Ontologies

 
652-660
Abstract

Computer simulation is becoming a central scientific paradigm of systems biology and the basic tool for the theoretical study and understanding of the complex mechanisms of living systems. The growing number and complexity of these models create a need for their collaborative development, reuse, and verification, as well as for describing computational experiments and their results. Ontological modeling is used to develop formats for knowledge-oriented mathematical modeling of biological systems. In this sense, the ontology associated with the entire set of formats that support research in systems biology, in particular computer modeling of biological systems and processes, can be regarded as a first approximation to an ontology of systems biology. This review summarizes the features of the subject area (bioinformatics, systems biology, and biomedicine), the main motivations for developing ontologies, and the most important examples of ontological modeling and semantic analysis at different levels of the knowledge hierarchy: the molecular genetic, cellular, tissue, organ, and whole-organism levels. Bioinformatics and systems biology are an excellent testing ground for ontological modeling technologies and their efficient use. Several dozen verified basic reference ontologies now serve as a source of knowledge for integrating and developing more complex domain models aimed at specific problems in biomedicine and biotechnology. Further formalization and ontological accumulation of knowledge, together with formal methods of analysis, can take the entire research cycle in systems biology to a new technological level.
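Much of the semantic analysis mentioned above rests on reasoning over is_a hierarchies. The Python below is a minimal sketch and is not taken from any of the reviewed ontologies or formats. It stores a toy is_a graph as a dictionary mapping each term to its parents, and it answers subsumption queries ("is X a kind of Y?") by a breadth-first walk over ancestors. The term names and the functions `ancestors` and `subsumes` are hypothetical illustrations.

```python
from collections import deque

# Toy is_a hierarchy (child -> list of parents). Terms are illustrative only and
# are not copied from any released ontology.
IS_A = {
    "extrinsic apoptotic signaling": ["apoptotic process"],
    "apoptotic process": ["programmed cell death"],
    "programmed cell death": ["cell death"],
    "cell death": ["cellular process"],
    "cellular process": [],
}


def ancestors(term, is_a):
    """Return every term reachable from `term` by following is_a links upward."""
    seen = set()
    queue = deque(is_a.get(term, []))
    while queue:
        parent = queue.popleft()
        if parent not in seen:
            seen.add(parent)
            queue.extend(is_a.get(parent, []))
    return seen


def subsumes(general, specific, is_a):
    """True if `specific` is the same term as, or a descendant of, `general`."""
    return general == specific or general in ancestors(specific, is_a)


print(subsumes("cell death", "extrinsic apoptotic signaling", IS_A))  # True
print(subsumes("apoptotic process", "cell death", IS_A))  # False
```

Real ontologies are directed acyclic graphs with several relation types (for example part_of). A practical reasoner would therefore track relation types rather than a single parent list.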

Genomics and Polymorphism Analysis

 
661-667
Abstract

Various methods for identifying significant contextual signals are widely used to search for transcription factor binding sites and to reveal the structural and functional organization of regulatory regions. These methods require neither pre-alignment of the analyzed sequences nor experimental information about the exact location of transcription factor binding sites. Methods that search for contextual signals by identifying degenerate oligonucleotide motifs written in the 15-letter IUPAC code have become widespread. A major problem with degenerate motifs is their enormous diversity, which forces researchers to apply heuristics that do not guarantee that the most significant signal will be found. The development of high-performance computing on graphics cards has made it possible to use exact exhaustive methods to identify significant motifs. We have developed a new system, running on widely available graphics cards, that identifies significant degenerate oligonucleotide motifs of a given length in regulatory regions and finds the signal with the greatest significance. The GPU was shown to be highly efficient compared with the CPU. Using this approach, we analyzed the regulatory regions of B. subtilis, E. coli, H. pylori, M. gallisepticum, M. genitalium, and M. pneumoniae genes. Sets of degenerate motifs were identified for each prokaryotic species and classified on the basis of their similarity to E. coli transcription factor binding sites.
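Exhaustive search is feasible because the motif space is finite: there are 15^k IUPAC words of length k. The Python sketch below enumerates all of them on the CPU. It scores each motif with a simple z-like statistic that compares observed matches with the number expected if the four nucleotides were equiprobable and independent. The scoring function, the uniform background, and the names `motif_score` and `best_motif` are assumptions for illustration. They are not the significance measure or the GPU implementation of the system described above.

```python
from itertools import product
from math import prod, sqrt

# The 15-letter IUPAC nucleotide code: each symbol stands for a set of bases.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def matches(motif, word):
    """True if every base of `word` belongs to the set denoted by the motif symbol."""
    return len(motif) == len(word) and all(b in IUPAC[c] for c, b in zip(motif, word))


def motif_score(motif, seqs):
    """(observed - expected) / sd of match counts under a uniform, independent background."""
    k = len(motif)
    p = prod(len(IUPAC[c]) / 4 for c in motif)  # chance that a random k-mer matches
    windows = [s[i:i + k] for s in seqs for i in range(len(s) - k + 1)]
    if p == 1.0 or not windows:
        return float("-inf")  # all-N motif (or no data) carries no signal
    observed = sum(matches(motif, w) for w in windows)
    expected = len(windows) * p
    return (observed - expected) / sqrt(expected * (1 - p))


def best_motif(seqs, k):
    """Score every one of the 15**k degenerate motifs and return the top one."""
    return max(("".join(m) for m in product(IUPAC, repeat=k)),
               key=lambda m: motif_score(m, seqs))


seqs = ["GCTATAAG", "CCTATAGC", "ATATAAGG"]
print(best_motif(seqs, 2))
```

Each motif is scored independently of the others, so the search is embarrassingly parallel. That independence is what makes the problem a natural fit for graphics cards.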

 
668-674
Abstract

We investigated the mutation frequency in the human genome for the set of known single nucleotide polymorphisms (SNPs) from the “1000 Genomes” project. We developed and applied new statistical and computational methods for analyzing genetic text on the basis of its complexity. Sliding-window complexity profiling was applied to SNP-containing sites in the human genome and revealed a local decrease in text complexity at these sites. Analysis of the complexity profiles shows that flanking monomer repeats determine the lower context complexity of SNP-containing sites in the human genome. The local decrease in text complexity at SNP-containing sites was confirmed by analysis of polymorphisms in the rat and mouse genomes. We also found context differences between coding and regulatory sequences, and these differences reflect the complexity of SNP-containing loci. Changes in point mutation frequency were previously shown for microsatellite-containing sequences. Using enhanced mathematical tools and larger data sets, this work shows enrichment of polytracts and simple sequence repeats in the local genomic surroundings of SNP-containing sites. We found high-frequency oligonucleotides within genomic regions containing SNPs; these oligonucleotides are related to nucleotide polytracts. The presence of poly(A) tracts might be associated with an increased probability of breaks in the DNA double helix around mutable loci and subsequent fixation of nucleotide changes. The complexity estimates were computed with a previously developed program tool. This tool allows for both (i) complexity estimation of phased samples and (ii) rapid and effective identification of the frequency spectrum of oligonucleotides of fixed length, as well as comparison of oligonucleotide frequencies in different samples.
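The abstract does not specify the complexity measure. As one common choice, and only as a stand-in, the Python sketch below uses the Lempel-Ziv (LZ76) phrase count. The sequence is parsed into phrases, each being the shortest stretch that cannot be copied from earlier text. Low counts indicate repetitive, low-complexity sequence such as poly(A) tracts, and the counts are computed in a sliding window to give a profile. The function names are hypothetical.

```python
def lz_complexity(s):
    """Number of phrases in an LZ76-style parsing of `s`.

    Each phrase is extended while it can still be copied from a start position
    earlier in the text (overlapping copies allowed); then a new phrase begins.
    """
    i, n, phrases = 0, len(s), 0
    while i < n:
        length = 1
        # s[i:i+length] is copyable if it occurs in s[:i+length-1], i.e. starts before i
        while i + length <= n and s[i:i + length] in s[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases


def complexity_profile(seq, window, step=1):
    """LZ complexity of each sliding window of length `window`."""
    return [lz_complexity(seq[i:i + window])
            for i in range(0, len(seq) - window + 1, step)]


print(lz_complexity("AAAA"), lz_complexity("ACGT"))  # 2 4
print(complexity_profile("A" * 12 + "ACGTTGCAGTCA", 12, 12))
```

In the last call, the poly(A) window scores far lower than the mixed-sequence window.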

 
675-681
Abstract

The high-throughput sequencing project “1000 Genomes” has made it possible to catalog genetic loci and single nucleotide polymorphisms (SNPs) and to use them in medicine. Analysis of SNP markers (significantly frequent differences between the genomes of individual patients and the reference human genome) allows physicians to optimize treatment. On the other hand, tens of millions of unannotated SNPs correspond to an enormous number of false positive (and false negative) candidate SNP markers. These candidates are selected by computer methods that compare their frequency in patients with that in healthy people. This approach contributes to undervaluation of clinically relevant SNPs and to unnecessary computational expense (on verification of neutral SNPs). Preclinical empirical verification of candidate SNP markers may eliminate neutral SNPs from the dataset. In the present study, using the SNP_TATA_Comparator web service, we found the unannotated SNP rs367781716: the substitution of the ancestral T (associated with health) by the minor C at position –37 upstream of the transcription start site of the ABCA9 gene. This SNP significantly reduces the affinity of TATA-binding protein (TBP) for the promoter of this gene. It corresponds to a deficiency (low protein level) of the ABCA9 gene product, the ATP-binding cassette transporter A9, in patients with the –37C allele. For preclinical empirical verification of rs367781716, we used an electrophoretic mobility shift assay (EMSA). With it we measured the rates of formation (ka) and decay (kd) of the complexes of TBP with oligonucleotides matching either the –37C or the –37T allele of the ABCA9 gene. The rate of formation (ka) of the TBP/TATA complex for the minor allele was 2.4-fold lower than that for the ancestral allele. We calculated the empirical change in the equilibrium dissociation constant (KD = kd/ka), which characterizes the binding affinity of TBP for a promoter containing a TATA box.
This empirical value matched the value predicted by SNP_TATA_Comparator within the margin of error of the measurements and calculations. We also determined the half-life and the Gibbs free energy of the complex of TBP with the ABCA9 promoter. Possible phenotypic manifestations of the candidate SNP marker rs367781716 are discussed.
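The quantities named above follow from standard relations for a 1:1 complex:

- KD = kd/ka
- half-life t1/2 = ln 2/kd
- ΔG = RT ln KD, relative to a 1 M standard state

The Python sketch below applies these relations. The rate constants in the example are hypothetical placeholders, not the measured values. Both alleles are given the same kd only to show how a 2.4-fold drop in ka propagates to KD and ΔG.

```python
import math

R = 8.314  # gas constant, J/(mol*K)


def complex_parameters(ka, kd, temp_k=298.15):
    """Return (KD in M, half-life in s, Gibbs energy in kJ/mol) for a 1:1 complex.

    ka: association rate constant, 1/(M*s); kd: dissociation rate constant, 1/s.
    """
    k_eq = kd / ka                            # equilibrium dissociation constant
    half_life = math.log(2) / kd              # first-order decay of the complex
    dg = R * temp_k * math.log(k_eq) / 1000   # equals -RT ln(KA), in kJ/mol
    return k_eq, half_life, dg


# Hypothetical rate constants; ka of the minor allele is 2.4-fold lower, kd assumed equal.
ancestral = complex_parameters(ka=1e5, kd=1e-3)
minor = complex_parameters(ka=1e5 / 2.4, kd=1e-3)
print(minor[0] / ancestral[0])   # fold increase in KD (about 2.4)
print(minor[2] - ancestral[2])   # change in Gibbs energy, kJ/mol (about +2.2 at 25 °C)
```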

 
682-690
Abstract

Genetic variability in circadian clock genes manifests itself as phenotypic variability of physiological functions and behavior. It also appears as dysfunction of the clock and of other systems, which leads to the development of pathologies. We analyzed the influence of SNPs located in the [–70, –20] region relative to the transcription start site on TBP/promoter affinity in two groups of genes that are components of the human circadian clock system. The first group comprises the genes of the circadian oscillator core (11 genes); the second, the genes of the immediate regulatory environment of the circadian oscillator (21 genes). The comparison group included genes with other functions (31 genes). The SNP_TATA_Comparator web service was used to predict the effect of SNPs in the RNA polymerase II positioning regions on the TBP/promoter dissociation constant. In the first group of genes, the number of SNP markers reducing TBP/promoter affinity was significantly lower than the number of SNP markers increasing it (α < 10⁻³). The reverse was true for the comparison group: SNP markers reducing TBP/promoter affinity significantly outnumbered those increasing it (α < 10⁻⁶). This property may be a characteristic feature of the circadian oscillator genes. These predictions are important for identifying candidate SNP markers of various pathologies associated with dysfunction of circadian clock genes. They also support further testing of these markers in experimental and clinical studies and verification of mathematical models of the circadian oscillator.
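The abstract reports significance levels without naming the test. One simple way to ask whether affinity-decreasing and affinity-increasing SNPs are equally common is a two-sided exact binomial (sign) test. The Python sketch below implements it with the standard library only; the choice of test and the counts in the example are assumptions for illustration.

```python
from math import comb


def sign_test_p(n_decrease, n_increase):
    """Two-sided exact binomial test of H0: a SNP is equally likely to
    decrease or increase TBP/promoter affinity (p = 0.5)."""
    n = n_decrease + n_increase
    k = min(n_decrease, n_increase)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n  # P(X <= k) under H0
    return min(1.0, 2 * tail)


# Hypothetical counts: 2 affinity-decreasing vs 20 affinity-increasing SNPs.
print(sign_test_p(2, 20))  # about 1.2e-4
```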

 
691-698
Abstract

Computational analysis of millions of unannotated SNPs from the 1000 Genomes Project may speed up the search for biomedical SNP markers. We combined two analyses: a search for SNPs in the binding sites of TATA-binding protein (TBP) using a previously described Web service (http://beehive.bionet.nsc.ru/cgi-bin/mgs/tatascan/start.pl), and a keyword search for biochemical markers of chronopathologies that correspond to the clinical manifestations of these SNPs. In the [–70, –20] region of the promoters of 14 human genes (the location of proven TBP binding sites), we found 32 known and candidate SNP markers of circadian-rhythm disturbances, including rs17231520 and rs569033466 (both: risk of chronopathologies in the liver); rs35036378 (behavioral chronoaberrations); rs549858786 (rheumatoid arthritis with a chronoaberration of IL1B expression); rs563207167, rs11557611, and rs5505 (all three: chronopathologies of the tumor-host balance, blood pressure, and the reproductive system); rs1143627 (bipolar disorder with circadian dependence of diagnosis and treatment); rs16887226 and rs544850971 (both: lowered resistance to endotoxins because of an imbalance between the circadian and immune systems); rs367732974 and rs549591993 (both: circadian dependence of heart attacks); rs563763767 (circadian dependence of myocardial infarction); rs2276109 and rs572527200 (both: circadian dependence of asthma attacks); rs34223104, rs563558831, and rs10168 (circadian optima of treatment with methotrexate and cyclophosphamide); and rs397509430, rs33980857, rs34598529, rs33931746, rs33981098, rs34500389, rs63750953, rs281864525, rs35518301, and rs34166473 (all: sensorineural hearing loss and restless legs syndrome). For these SNPs, we evaluated the significance (α) of changes in TBP affinity for the promoters with the Z-test. Increased affinity corresponds to overexpression of the gene, and decreased affinity to deficient expression. Verification of these 32 SNP markers according to clinical standards and protocols may advance predictive, preventive, and personalized medicine.
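A Z-test of this kind compares two estimates with known standard errors. The Python sketch below illustrates it under an assumption: that predictions in the style of SNP_TATA_Comparator are expressed as -ln KD with a standard error for each allele. The interface and scale of the real service are not reproduced here, and the numbers are hypothetical.

```python
from math import erfc, sqrt


def z_test(est_ref, se_ref, est_alt, se_alt):
    """Two-sided Z-test for the difference between two independent estimates.

    Returns (z, p). A positive z means the alternative allele has the larger
    estimate (higher predicted affinity if the estimates are -ln KD).
    """
    z = (est_alt - est_ref) / sqrt(se_ref ** 2 + se_alt ** 2)
    p = erfc(abs(z) / sqrt(2))  # P(|Z| >= |z|) for a standard normal Z
    return z, p


# Hypothetical -ln(KD) predictions for the ancestral and minor alleles.
z, p = z_test(est_ref=20.0, se_ref=0.1, est_alt=19.5, se_alt=0.1)
print(z, p)
```

A negative z with a small p would be read as a significant loss of affinity, which corresponds to deficient expression.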

 
699-706
Abstract

Studies of wild and laboratory animals have revealed a trade-off between reproductive success and immunity. Domestication therefore likely favored individuals with high reproductive performance but low immunity. Low immune responsiveness could become hereditary through the fixation of genes with “unfavorable” mutations in populations. The objectives of this work were: 1) to determine the genotype and allele frequencies of the rs340283541 SNP in the gene for the cytokine lymphotoxin beta (LTB) in pigs of domestic breeds and in wild boars; 2) to investigate LTB mRNA expression in minipigs with different genotypes; and 3) to perform a bioinformatic analysis of the putative functional role of this SNP. The frequency of the GG genotype in the wild boar sample was significantly lower than in the pooled sample of domestic pigs. LTB mRNA expression in the lymph nodes of minipigs with the GG genotype tended to be higher (p < 0.06) than in carriers of allele A. The rs340283541 SNP occurs in a DNA motif that is highly conserved among 11 mammalian species, so it may be of functional significance. Context analysis shows that allele A has putative binding sites for the transcription factors BRN-2 and AP-1. Allele G, in contrast, has binding sites for the transcription factors RFX1, ISGF3 (ISRE site), and USF, which are expressed in immune system cells. Thus, pig domestication was accompanied by an increase in the frequency of the GG genotype for the rs340283541 SNP, located in the 3′ region of the LTB gene. The GG genotype is likely associated with elevated LTB mRNA expression in lymph node tissue. This increase may be related to the formation of binding sites for RFX1, ISRE, and USF and/or the disruption of binding sites for BRN-2 and AP-1. Linkage disequilibrium between rs340283541 and another functionally significant mutation in LTB is also conceivable.
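The comparison of genotype frequencies between wild boars and domestic pigs can be illustrated with standard population-genetics arithmetic. The Python sketch below computes the G allele frequency from genotype counts. It also runs a Pearson chi-square test (1 degree of freedom, no continuity correction) on GG versus non-GG carriers in two samples. The abstract does not state which test was used, and the counts in the example are hypothetical.

```python
from math import erfc, sqrt


def allele_freq_g(n_aa, n_ag, n_gg):
    """Frequency of allele G from genotype counts (each animal carries two alleles)."""
    n = n_aa + n_ag + n_gg
    return (2 * n_gg + n_ag) / (2 * n)


def chi2_2x2(a, b, c, d):
    """Pearson chi-square for the table [[a, b], [c, d]] and its p-value (1 d.f.)."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p = erfc(sqrt(chi2 / 2))  # survival function of chi-square with 1 d.f.
    return chi2, p


# Hypothetical counts: rows = wild boars, domestic pigs; columns = GG, non-GG.
print(chi2_2x2(5, 15, 15, 5))  # chi2 = 10.0, p about 0.0016
```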

Plant Bioinformatics

 
707-714
Abstract

The shortage of polymorphic markers for the regions of wheat chromosomes that encode commercially valuable traits made it necessary to study wheat microsatellite loci. In this work, SSR markers for individual regions of the short arm of bread wheat chromosome 5B (5BS) were designed on the basis of sequencing data for BAC clones, and these regions of the chromosome were saturated with the markers. In total, 130 randomly selected BAC clones from the 5BS library were sequenced on the Ion Torrent platform and assembled into contigs with the MIRA software. The assembly characteristics (N50 = 4,136 bp) are comparable to recently obtained data for wheat and related species and are acceptable for identifying microsatellite loci. An algorithm that uses the properties of complexity decompositions in sliding-window mode was applied to detect DNA sequences with repeat units of 2–4 bp. Analysis of 17,770 contigs with a total length of 25,879,921 bp allowed us to design 113, 79, and 67 microsatellite (SSR) loci with repeat units of 2, 3, and 4 bp, respectively. The SSR markers with a 3-bp motif were tested on nullitetrasomic lines of Chinese Spring wheat homoeologous group 5, and 21 markers specific for chromosome 5B were detected. Eight of these markers were mapped to the distal region of the chromosome (bin 5BS6) using a set of Chinese Spring deletion lines for 5BS. Eight and four markers were mapped to the interstitial region (bins 5BS5 and 5BS4, respectively), and one marker was mapped to a pericentromeric bin. A comparative analysis of the distribution of trinucleotide microsatellites over wheat chromosome 5B and in different cereal species suggests that the (AAG)n repeat has proliferated and been maintained during the evolution of cereals.
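Two routine computations behind this paragraph can be sketched in Python:

- `n50` returns the contig length at which the cumulative length of contigs, sorted longest first, first reaches half the assembly.
- `find_ssrs` locates perfect tandem repeats with a regular expression. This is a simpler, swapped-in technique, not the complexity-decomposition algorithm used by the authors.

The function names and the minimum repeat count are assumptions.

```python
import re


def n50(lengths):
    """Length L such that contigs of length >= L cover at least half of the assembly."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0


def is_primitive(unit):
    """False if `unit` is itself a repeat of a shorter unit (e.g. "AA", "ATAT")."""
    k = len(unit)
    return all(unit != unit[:d] * (k // d) for d in range(1, k) if k % d == 0)


def find_ssrs(seq, unit_len, min_repeats):
    """Perfect tandem repeats as (start, unit, copies); a regex stand-in for SSR detection."""
    pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_repeats - 1))
    hits = []
    for m in pattern.finditer(seq.upper()):
        unit = m.group(1)
        if is_primitive(unit):  # skip e.g. "AA" runs when searching for dinucleotides
            hits.append((m.start(), unit, len(m.group(0)) // unit_len))
    return hits


print(n50([2, 3, 4, 5, 6]))  # 5
print(find_ssrs("GCAAGAAGAAGAAGTC", unit_len=3, min_repeats=4))  # [(2, 'AAG', 4)]
```

`finditer` reports only non-overlapping, perfect runs. A repeat interrupted by a single substitution is therefore reported as two shorter runs, or missed if each part falls below `min_repeats`.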

 
715-723
Abstract

A huge variety of phytopathogens (viruses, bacteria, and fungi) can potentially infect plant tissues and cause diseases. Numerous plant genes control a complex network of defense mechanisms based on both constitutive and inducible processes. The cell wall is the primary barrier that pathogens must penetrate to start the infection process, and it is able to block invasion by most non-specific potential pathogens. Cell wall structure differs among plant species. It is based on a network of cellulose microfibrils linked by hemicellulose molecules; pectin and lignin are other important cell wall constituents. Dozens of cell wall proteins are involved in structural and metabolic processes as well as in signal transduction and regulatory circuits (more information is available in the WallProtDB database). Each of these components contributes to pathogen resistance. At points of contact with potential pathogens, the cell wall undergoes structural changes, and metabolites with antimicrobial, antifungal, or antiviral activities accumulate. To counteract these non-specific plant resistance mechanisms, some pathogens produce hydrolytic enzymes able to degrade cellulose and pectin. In turn, plants have evolved inhibitors of these pathogen enzymes, and this “arms race” is an important part of plant evolution and of host-pathogen interaction mechanisms. Plants can also monitor the state of the cell wall to compensate for imbalances and deficiencies. For instance, mutants with cellulose deficiency may have a higher lignification rate and a stronger stress response. The cell wall is also a source of signal molecules that trigger response mechanisms. Overall, the plant cell wall is a complex dynamic structure able to prevent infection by most potential (non-specific) pathogens and to switch on the plant immune response.
Reconstruction of the gene networks that control the structural and functional organization of the cell wall during growth, and under normal and stress conditions, is vitally important for understanding the basic molecular mechanisms of development and stress resistance. The mechanisms of specific and non-specific plant resistance to various phytopathogens that are related to cell wall structure are reviewed. The roles of cell wall constituents in pathogen detection and in the induction of defense mechanisms are discussed.

Computer Simulation

 
724-730
Abstract

CD95 is one of the best-studied members of the death receptor family. Activation of CD95 induces the cell death program, apoptosis, via formation of the death-inducing signaling complex (DISC). FADD is a key adaptor protein for the formation of the CD95 DISC and the activation of caspase-8 in the receptor complex. FADD comprises the death domain and the death effector domain (DED). The death domain is essential for the interaction of FADD with CD95, while the DED is necessary for the recruitment of procaspase-8, procaspase-10, and the protein c-FLIP into the DISC. Inhibitors that block the interactions of FADD with the other core proteins of the DISC are needed for studying the structure and function of this complex. They are also needed for investigating the mechanisms of apoptosis and for developing new treatments for neurodegenerative diseases. In this work, we performed in silico screening for small-molecule inhibitors that selectively interact with the DED. For this purpose, molecular modeling of the protein complexes and virtual screening of potential FADD inhibitors were performed. In addition, a new technology for testing the activity of these inhibitors was developed. The computational and experimental analysis allowed us to characterize the optimal conformation of FADD for designing small molecules that can bind in the region of amino acid residue Y25. We presume that further optimization of the structures of chemical compounds that can bind to the hydrophobic pocket next to residue Y25 of FADD will allow the creation of new promising inhibitors of programmed cell death.

 
731-737
Abstract

Identification of new effective inhibitors of apoptosis is an important task in drug development for the treatment of a number of diseases, including neurodegenerative diseases. Apoptosis is initiated via the formation of macromolecular protein complexes in which the key apoptotic enzymes, caspases, are activated. One of these complexes is the DISC (death-inducing signaling complex), which plays a central role in the induction of the extrinsic apoptosis pathway. The adaptor protein FADD has a major role in DISC formation. Therefore, inhibitors of FADD that prevent its function in the DISC can act as potential drugs inhibiting apoptosis. Furthermore, studying the mechanisms of action of these inhibitors is of great interest for understanding the signal transduction pathways of apoptosis. Mucin-type 1 glycoprotein (MUC1) has been reported to be a natural protein inhibitor of FADD. In particular, two fragments of the primary structure of the cytoplasmic domain of MUC1 (MUC1-CD) are capable of inhibiting the binding of caspase-8 to FADD. However, the three-dimensional structure of MUC1 has not yet been obtained, which significantly complicates the rational design of potential drugs based on these peptides. In this context, the aim of the present study was the in silico prediction of the three-dimensional structures of MUC1-CD peptides corresponding to protein fragments 1-20 and 46-72, and the analysis of their conformational properties. The study focused mainly on the peptide MUC1-CD (46-72), which is capable of binding to FADD. Molecular dynamics in implicit water showed that the peptide MUC1-CD (46-72) can adopt conformations similar to those of a number of fragments of the caspase-8 DED domain. In particular, its structure is similar to the spatial structure of at least four caspase-8 fragments. These results indicate that the inhibitory activity of the peptide can be explained by competitive binding to FADD owing to its structural and conformational similarity to fragments of the caspase-8 DED domain.
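Structural similarity between a peptide model and a protein fragment is commonly quantified by a root-mean-square deviation. The abstract does not state the measure used. As a superposition-free stand-in, the Python sketch below computes the distance RMSD (dRMSD). It compares all pairwise intramolecular distances of two equally long coordinate sets (for example, Cα atoms), so it is unaffected by rotation and translation. The function name and the toy coordinates are hypothetical.

```python
from itertools import combinations
from math import dist, sqrt


def drmsd(coords_a, coords_b):
    """Distance RMSD between two conformations given as equal-length lists of (x, y, z)."""
    if len(coords_a) != len(coords_b) or len(coords_a) < 2:
        raise ValueError("need two coordinate sets of equal length (at least 2 atoms)")
    pairs = list(combinations(range(len(coords_a)), 2))
    sq = sum((dist(coords_a[i], coords_a[j]) - dist(coords_b[i], coords_b[j])) ** 2
             for i, j in pairs)
    return sqrt(sq / len(pairs))


# A toy 3-atom conformation and a copy rotated by 90 degrees about z and shifted.
a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0)]
b = [(-y + 1.0, x + 2.0, z) for x, y, z in a]
print(drmsd(a, b))  # 0.0 up to floating-point rounding
```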

 
738-744
Abstract

Analysis of biological data is a key topic in bioinformatics, computational genomics, molecular modeling, and systems biology. The methods covered in this article could reduce the cost of experiments that produce biological data. The problem of identifiability of mathematical models in physiology, pharmacokinetics, and epidemiology is considered. The processes are modeled by nonlinear systems of ordinary differential equations, and mathematical modeling of these dynamic processes is based on the law of mass conservation. When the parameters characterizing the process under study are estimated, the question of the uniqueness of the estimates arises. When the input and output data are known, it is useful to perform an a priori analysis of the relevance of these data. The definition of identifiability of mathematical models is discussed, and methods for analyzing the identifiability of dynamic models are reviewed. The following approaches are considered:
- the transfer function method for linear models (useful for analysis of pharmacokinetic data, since a large class of drugs is characterized by linear kinetics);
- the Taylor series expansion method for nonlinear models;
- a method based on differential algebra theory (the structure of this algorithm allows it to be run on a computer);
- a method based on graph theory (which allows both analysis of model identifiability and finding a proper reparametrization that reduces the initial model to an identifiable one).

The need for a priori identifiability analysis before estimating the parameters of any process is demonstrated with several examples of identifiability analysis of mathematical models in medical biology.
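As a worked illustration of the Taylor series method, consider a one-compartment model with two parallel first-order elimination routes (rate constants k1 and k2), a known bolus dose D, and a measured concentration y. This textbook-style model was chosen for this note and is not taken from the article.

```latex
\begin{aligned}
\dot{x}(t) &= -(k_1 + k_2)\,x(t), \qquad x(0) = D, \qquad y(t) = \frac{x(t)}{V},\\[4pt]
y(0) &= \frac{D}{V}, \qquad
\dot{y}(0) = -(k_1 + k_2)\,\frac{D}{V}, \qquad
\ddot{y}(0) = (k_1 + k_2)^2\,\frac{D}{V}, \qquad \ldots, \qquad
y^{(n)}(0) = \bigl(-(k_1 + k_2)\bigr)^{n}\,\frac{D}{V}.
\end{aligned}
```

The Taylor coefficients of the output determine D/V, and hence V, since D is known. They also determine the sum k1 + k2. Every coefficient, however, depends on k1 and k2 only through this sum. The individual rate constants are therefore structurally non-identifiable, and the reparametrization k = k1 + k2 yields an identifiable model.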

 
745-752
Abstract

Bacterial communities are tightly interconnected systems consisting of numerous species, which makes it challenging to analyze their structure and relationships. Several experimental techniques provide heterogeneous data on various aspects of these communities. The recent avalanche of metagenomic data challenges not only biostatisticians but also biomodelers. These data are essential for improving model quality, while simulation methods are useful for understanding the evolution of microbial communities and their function in the ecosystem. An outlook is presented on existing modeling and simulation approaches in microbial ecology and environmental microbiology that are based on different types of experimental data. The approaches considered focus on aspects of microbial communities such as trophic structure, metabolic and population dynamics, genetic diversity, spatial heterogeneity, and expansion dynamics. We also propose a classification of existing software for simulation of microbial communities. Although multiscale/hybrid models increasingly prevail, integrating models of different levels of biological organization of communities remains an unsolved problem. The multiaspect nature of the integration approaches used to model microbial communities stems from the need to take into account heterogeneous data obtained from various sources by high-throughput genome investigation methods.
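Population dynamics of a community is often described with the generalized Lotka-Volterra (gLV) model, dx_i/dt = x_i (r_i + Σ_j a_ij x_j). Here r_i is the growth rate of species i and a_ij is the effect of species j on it. The Python sketch below integrates gLV with a fixed-step forward Euler scheme. gLV is named here only as one representative population-dynamics approach, not as a specific model or tool from the review, and the parameters in the example are hypothetical.

```python
def glv_step(x, r, a, dt):
    """One forward-Euler step of dx_i/dt = x_i * (r_i + sum_j a_ij * x_j)."""
    new_x = []
    for xi, ri, row in zip(x, r, a):
        growth = ri + sum(aij * xj for aij, xj in zip(row, x))
        new_x.append(max(xi + dt * xi * growth, 0.0))  # clip: abundances stay non-negative
    return new_x


def simulate(x0, r, a, dt=0.01, steps=2000):
    """Integrate the gLV system from x0 for `steps` Euler steps of size dt."""
    x = list(x0)
    for _ in range(steps):
        x = glv_step(x, r, a, dt)
    return x


# Two hypothetical non-interacting, self-limited species (carrying capacities 1.0 and 0.5).
print(simulate([0.1, 0.1], r=[1.0, 0.5], a=[[-1.0, 0.0], [0.0, -1.0]]))  # close to [1.0, 0.5]
```

Forward Euler is used only for brevity. Stiff systems or large communities would call for an adaptive ODE solver.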



This work is licensed under a Creative Commons Attribution 4.0 License.


ISSN 2500-3259 (Online)