INSECT GENETICS
Articles
The research concerns the task of identification of contrasted single nucleotide polymorphisms (SNPs) obtained in genome-wide pooled allelotyping of 16 human cohorts, comprising healthy and ill persons, by the nested case–control approach. The genotyping platform was the Illumina Omni1S chip with 1.2 million markers. The mean pooled sample size was about 200 individuals. The candidate selection was based on statistical comparison of allele frequencies in a “case–control” study. Samples of ill patients show significant deviations from healthy persons in the numbers of significantly differing polymorphisms. The variance of allele frequencies among repeats in a single cohort was less than that in random choice of pairs from different cohorts.
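The statistical comparison of allele frequencies between a case pool and a control pool can be sketched as a two-proportion z-test. This is only a minimal illustration, not the authors' procedure: it assumes each pool of N diploid individuals contributes 2N chromosomes and ignores pooling and array measurement error, and the function names are hypothetical.

```python
import math

def allele_frequency_z(p_case, p_control, n_case, n_control):
    """Two-proportion z statistic for allele frequencies estimated in two pools.

    Assumption: each diploid individual contributes two chromosomes and
    pooling/measurement error is ignored.
    """
    chrom_case = 2 * n_case
    chrom_control = 2 * n_control
    pooled = (p_case * chrom_case + p_control * chrom_control) / (chrom_case + chrom_control)
    se = math.sqrt(pooled * (1 - pooled) * (1 / chrom_case + 1 / chrom_control))
    if se == 0:
        return 0.0  # monomorphic marker in both pools: no evidence of a difference
    return (p_case - p_control) / se

def two_sided_p(z):
    """Two-sided p-value of a z statistic under the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2))
```

With about 1.2 million markers tested, any threshold applied to such p-values would need a multiple-testing correction.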
Serum lipid metabolism parameters are the most commonly used indicators in clinical practice. Their disturbances (dyslipidemia) include elevated levels of total cholesterol and triglycerides, as well as changes in other parameters resulting from aberrations in lipoprotein synthesis, transport, and cleavage. The clinical significance of the metabolic disorders covered by the term dyslipidemia is associated primarily with high risks of cardiovascular diseases, type 2 diabetes mellitus, and obesity. Associations of certain SNPs (G-2548A in the promoter region of the leptin gene (LEP), A223G in exon 4 of the leptin receptor gene (LEPR), T495G in intron 8 of the lipoprotein lipase gene (LPL), and C34G in exon 8 of the nuclear receptor gene PPARG) with disturbed lipid metabolism have been investigated, and their cumulative contribution to the development of dyslipidemia is demonstrated.
The goals of this study were to create a compilation of genes controlling human body weight and feeding behavior and to summarize functional and genomic information on these genes. Information on 424 human genes was obtained from scientific publications, OMIM, and meta-analysis of GWAS data. Four genes (BDNF, MC4R, PCSK1, and POMC) were confirmed by all three data sources; thus, these genes have the highest priority (No. 1). Genes of two other groups (3 and 29 genes) were confirmed by two of the three data sources and thus have priority No. 2. Pathways important for body mass regulation were revealed, and they may be candidate pharmacological targets for obesity treatment. Regions of human chromosomes containing closely located genes from the compilation were revealed. Some groups of closely located genes included genes (ETV5, MIR148A, NFE2L3, and TMEM160) confirmed by GWAS meta-analysis only. This finding may be helpful in the identification of their functions. Use of the Residual Variation Intolerance Score (RVIS) revealed genes with decreased tolerance to functional genetic variation: LRP1, LRP5, RAI1, FASN, LYST, RPTOR, DGKD, LRP1B, NCOA1, and ADCY3. The compilation can be used in genotyping for pathology risk estimation and for designing new pharmacological approaches to the treatment of human obesity.
The development of in vitro methods has produced new experimental information on protein binding to DNA, which is accumulated in databases and used in studies of the mechanisms regulating gene expression and in the development of computer-assisted methods of binding site recognition in pro- and eukaryotic genomes. However, it is still questionable to what extent sequences selected in vitro reflect the actual structures of natural transcription factor (TF) binding sites. The Kullback-Leibler divergence was applied to the comparison of frequency matrices of TF binding sites constructed from samples of artificially selected sequences and natural sites. Core sequences of natural and artificial sites showed high similarity for 80% of all TFs studied. For 20% of TFs, binding site sequences selected in vitro had a broader range of permissible significant nucleotides not found in natural sites. The optimum lengths of DNA sequences including natural binding sites, at which they are recognized most accurately, were estimated by the weight matrix method. For approximately 80% of the TFs studied, the optimum binding site length notably exceeded the lengths of the core sequences, as well as the lengths of in vitro selected sites. The detected features of in vitro selected TF binding sites impose constraints on their use in the development of computer-assisted methods for the recognition of candidate sites in genomic sequences.
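As an illustration of how the Kullback-Leibler divergence can compare two frequency matrices, the sketch below sums the per-position divergence over two aligned matrices of equal length. It assumes each matrix is a list of per-position (A, C, G, T) frequency tuples that sum to 1; the function names, the alignment, and the floor applied to zero frequencies are assumptions made here, not details taken from the study.

```python
import math

def kl_divergence(p, q, floor=1e-9):
    """Kullback-Leibler divergence D(p || q) in bits for two discrete distributions.

    Zero entries of q are replaced by a small floor (an assumption of this
    sketch) so that the logarithm stays finite.
    """
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0:
            total += pi * math.log2(pi / max(qi, floor))
    return total

def matrix_divergence(natural, in_vitro):
    """Sum of per-position divergences for two aligned frequency matrices."""
    if len(natural) != len(in_vitro):
        raise ValueError("matrices must have the same number of positions")
    return sum(kl_divergence(p, q) for p, q in zip(natural, in_vitro))
```

The divergence is asymmetric, so D(natural || in vitro) and D(in vitro || natural) generally differ. A position where the in vitro matrix permits nucleotides that the natural sites lack would contribute to the first but dominate the second.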
The plant hormone ethylene regulates both developmental processes and various stress responses in plants. Ethylene perception in plants is followed by activation of some transcription factors from the large family of APETALA2/ETHYLENE response factors (ERFs). ERF TF binding sites contain a specific GCCGCC motif, called the GCC-box. In this study, we applied the TF binding site recognition tools oPWM and SiteGA to the sequence analysis of experimentally proven GCC-boxes. We carried out GCC-box recognition and tested the distribution of predicted GCC-boxes in the Arabidopsis thaliana L. genome. Functional annotation and microarray data analysis of the genes possessing predicted GCC-boxes elucidated their role in ethylene response.
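For orientation only, the simplest possible search for the GCCGCC core is an exact-match scan of both DNA strands, sketched below. This is not oPWM or SiteGA, which score sequence context rather than requiring an exact match; the function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence given in upper case."""
    return seq.translate(COMPLEMENT)[::-1]

def find_gcc_boxes(seq, motif="GCCGCC"):
    """Return sorted (position, strand) pairs for exact motif matches.

    Positions are 0-based offsets on the forward strand; overlapping
    matches are reported.
    """
    seq = seq.upper()
    hits = []
    for strand, pattern in (("+", motif), ("-", reverse_complement(motif))):
        start = seq.find(pattern)
        while start != -1:
            hits.append((start, strand))
            start = seq.find(pattern, start + 1)
    return sorted(hits)
```

An exact scan of this kind misses functional variants of the motif and reports chance occurrences, which is one reason matrix-based and other context-aware recognition tools are used instead.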
In problems of modern biology, the number of attributes often exceeds the number of objects by orders of magnitude. To solve such problems, a data mining method is proposed that is based on a new measure of similarity between objects, the Function of Rival Similarity (FRiS). On this basis, methods for quantitative estimation of the compactness of patterns, construction of decision rules, and feature selection are developed. All these techniques are implemented in the FRiS-GRAD algorithm. The high efficiency of the algorithm is illustrated by the results of solving a disease recognition task on a microarray dataset.
Expression efficiency is one of the major characteristics of genes considered in a number of modern investigations. It is known that gene expression efficiency in an organism is regulated at many stages: transcription, translation, posttranslational protein modification, and others. In this study, a special EloE (Elongation Efficiency) web application is described. It sorts the genes of an organism in order of decreasing theoretical rate of the elongation stage of translation, deduced from their nucleotide sequences. The predictions made in this way show a significant correlation with available experimental data on gene expression in various organisms, for instance, S. cerevisiae and H. pylori. In addition, the program identifies preferential codons in a genome and determines the distribution of the stability of potential secondary structures in the 5′ and 3′ regions of mRNA. EloE can be useful for preliminary estimation of the translation elongation efficiency of genes in organisms for which experimental data are not yet available. Some results can be used, for instance, in other programs modeling artificial genetic constructs in genetic engineering experiments.
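The abstract does not give the formula, but in the FRiS literature the rival similarity of an object z to its own class, in competition with a rival class, is commonly written as F = (r2 - r1) / (r2 + r1), where r1 is the distance to the nearest own-class object and r2 the distance to the nearest rival. The sketch below assumes that form and Euclidean distance; it is not the FRiS-GRAD algorithm itself, and the function name is hypothetical.

```python
import math

def fris(z, own, rival):
    """Rival similarity of point z to the class `own` in competition with `rival`.

    Assumes F = (r2 - r1) / (r2 + r1) with Euclidean distances. Values lie
    in [-1, 1]: 1 means z coincides with an own-class object, -1 means it
    coincides with a rival, 0 means it is equidistant from both.
    """
    r1 = min(math.dist(z, a) for a in own)
    r2 = min(math.dist(z, b) for b in rival)
    if r1 + r2 == 0:
        return 0.0  # z coincides with objects of both classes
    return (r2 - r1) / (r2 + r1)
```

Averaging such values over the objects of a dataset gives one possible compactness estimate for a candidate feature subset, which is the kind of criterion a feature selection procedure could maximize.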
The transcriptional activity of genes was studied in kidneys of hypertensive ISIAH and normotensive WAG rats in order to detect genes significantly expressed in kidneys of just one of the analyzed strains. Gene profiling was performed on the Illumina RatRef-12 Expression BeadChip microarray platform. The expression of three genes (Klk1, Klk1c10, and Kng1) related to the kallikrein-kinin system was significant in the WAG renal cortex but was not detected in hypertensive kidneys. The downregulation of these three genes and Gucy1a3 in ISIAH renal cortex suggests the weakened function of the kallikrein-kinin system in hypertensive kidneys, which may cause blood circulation disturbances in renal glomeruli and mediate the development of hypertension in ISIAH rats. The functional annotation of the genes significantly expressed in renal medulla of just one of the compared rat strains revealed the genes involved in immune response regulation.
Amyotrophic lateral sclerosis (ALS) is a neurodegenerative disease that affects motor neurons in the brain and spinal cord and leads to the patient's death. One of the causes of motor neuron degeneration and death is the formation of intracellular protein aggregates by a mutant SOD1 protein. Recently, it has been shown that the survival time of ALS patients with specific mutations in the SOD1 gene inversely correlates with the thermodynamic stability of the mutant SOD1 protein. In the present paper, we hypothesize that mutant SOD1 aggregation can be facilitated not only by destabilization due to hydrogen bond disruption but also by the formation of new hydrogen bonds, which can stabilize intermediate “pathogenic” conformations of the mutant SOD1 protein. Molecular dynamics simulations were conducted to estimate the frequencies of hydrogen bond occurrence in the protein structure. It was shown that the regression model based on the frequencies of hydrogen bond occurrence correlated significantly better with patients’ survival time (R = 0.89, p < 0.00001) than the estimation based on thermodynamic stability analysis of mutant SOD1 proteins.
This paper presents the results of the reconstruction and analysis of gene regulatory network of the circadian clock in mammals. Application of graph theory methods makes it possible to analyze the structure of the gene network and identify the central component of circadian clock regulation, which includes the basic regulatory circuits passing through the key element of the circadian clock, the Clock/Bmal1 protein. Cluster analysis has revealed subsystems with clear biological interpretation, which are involved in the functioning of the circadian clock by interacting with the central component. This structural model, which includes the central component and functional subsystems that interact with the central component, can provide grounds for the construction of a mathematical model of the dynamics of the gene network regulating the circadian rhythm.
The diversity of interactions within communities of living matter, from bacterial colonies to human societies, makes them inherently more complex than ensembles of particles in inanimate nature. Co-authorship networks are a particular case of intra- and inter-group social interactions. In this paper, we analyze the Novosibirsk biomedical scientific community as an example of such a network. Using the PubMed database, we have built a community network and calculated its statistics. The distribution of organizations by scientific activity has a fat tail and obeys the Pareto principle: 83% of publications and 75% of authors belong to the 20% most active organizations. A comparison of their networks shows that the networks of universities have a more pronounced core than those of research institutions. We have plotted the “demographic” structure of currently active authors and found two features: (1) an abundance of authors with short “publication experience” and (2) a deficit of authors whose first publication dates back to 1991-1997. In general, the network dynamics is non-steady, and the activity tends to increase.
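A Pareto-type statement such as "83% of publications belong to the 20% most active organizations" can be checked with a few lines of code. The sketch below assumes a list of per-organization publication counts; the function name and the rounding rule for the size of the top group are choices made here for illustration.

```python
import math

def top_share(counts, fraction=0.2):
    """Share of the total held by the most active `fraction` of organizations.

    `counts` holds one publication count per organization. The size of the
    top group is rounded up and is at least one organization.
    """
    if not counts or sum(counts) == 0:
        raise ValueError("counts must contain at least one nonzero value")
    ranked = sorted(counts, reverse=True)
    k = max(1, math.ceil(fraction * len(ranked)))
    return sum(ranked[:k]) / sum(ranked)
```

The same function applied to per-organization author counts would give the corresponding share of authors.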
In this work, a mathematical model and its implementation are proposed for the computational simulation of one-dimensional symplastic growth of tissues. We modified the formal grammar of differential L-systems and used this grammar to describe a dynamic model of symplastic growth with regard to its biomechanics. The results of the simulation of linear leaf blade growth are compared with those for a free-growing cell population. It is shown that, in the proposed model, symplastic growth causes a greater deviation of the actual cell length from its isosmotic length than is observed in freely growing cells.
Plants differ in the type of the root central cylinder: diarch, triarch, tetrarch, pentarch, or polyarch. The type of symmetry reflects the relative positions of xylem and phloem bundles in a cross section of the root. The mechanisms forming different types of symmetry in the central cylinder remain poorly understood. It is assumed that vasculature differentiation is triggered and controlled by the plant hormone auxin (Sachs, 1969). We have developed a model that describes auxin flow through a cell layer imitating a cross section of the vascular cylinder in a root. We have studied the stationary distributions of auxin in the cell layer depending on the model parameters. It is shown that the nonlinear processes of auxin transport regulation are responsible for the formation of asymmetric auxin distributions, which may be interpreted as positional information for the development of the diarch structure of the vascular cylinder. However, these distributions always coexist with uniform stationary distributions, which do not provide positional information. It is hypothesized that the most likely factor in the formation of the final auxin distribution in a root section is an appropriate geometry of the auxin flow from the shoot to the root.
The apical meristem located at the root tip of a plant is one of the most convenient objects for studying the organization of the stem cell niche. In the root apical meristem, mitotically inactive cells of the quiescent center coexist with intensely dividing cells, which lose this ability at a certain distance from the quiescent center. It is known that the plant hormones auxin and cytokinin play an important role in regulating the formation of this structure, but the mechanisms maintaining its dynamics remain unknown. We propose a mathematical model that summarizes experimental data on the distribution of auxin and cytokinin along the root longitudinal axis and their role in cell cycle regulation.
Mass spectrometry is a physical method that can be applied to the investigation of the proteomes of different organisms. It allows us both to identify biological macromolecules and to sequence peptide chains in cases where information on the genomes is scarce or absent. Currently, there are many software programs to support research in this area. Nevertheless, in spite of all efforts, there has been little progress in the development of programs for de novo sequencing of cyclic peptides, which include highly effective antibiotics, antitumor agents, immunosuppressants, and toxins, as well as a vast number of nonribosomal peptides with unknown functions. In this paper, an effective algorithm for de novo sequencing of cyclic peptides is proposed. The algorithm allows us to reconstruct sequences of lengths up to 160 amino acid residues.
We analyzed the possibility of using commercially available enzymes with cellulolytic activity for the saccharification of biomass of miscanthus, Soranovsky variety, a new crop registered in Russia in 2013, in comparison with the saccharification of biomass of Phalaris arundinacea, Trachomitum lancifolium, and Sida hermaphrodita. For enzymatic hydrolysis, we used commercially available fungal enzymes: Thermomyces lanuginosus xylanase, Aspergillus niger cellulase, and Penicillium verruculosum cellobiase and cellulase. The biomass was ground and incubated in alkaline peroxide. The highest rate of hydrolysis was observed with the Phalaris arundinacea biomass. We tested various combinations of enzymes and achieved 100% conversion for all samples relative to the weight of hydrolyzable components, which corresponds to 70% conversion of biomass.
Saccharomyces cerevisiae is the most appropriate and the most widely used model organism for industrial production of ethanol from sugars, because yeasts (1) have high rates of growth, fermentation, and ethanol biosynthesis under anaerobic conditions and (2) are tolerant of high concentrations of ethanol and low pH values. Currently, the most promising source of sugar is lignocellulosic biomass. Sugars derived from it are a mixture of hexoses and pentoses. However, the S. cerevisiae strains in current use are poorly adapted to pentose fermentation. Therefore, it is necessary to optimize the metabolism of currently available bioethanol producers for pentose consumption. The article presents an overview of existing approaches designed to solve this problem by using recombinant S. cerevisiae strains.
The paper deals with the theoretical issues of biological oxidation of oil hydrocarbons from alkanes to polycyclic aromatics. We analyze the mechanisms of biochemical processes of decomposition of oil components and provide an overview of data from common databases. Studies of microbial communities of natural oil seeps in the Uzon caldera are described in detail. It is the first study of ecophysiological characteristics of oil-degrading microorganisms isolated from thermal oil seeps of the caldera.
Modern trends in the use of DNA in nano- and biotechnologies have generated the need for new methods of analyzing DNA molecules with up-to-date equipment. We developed a method of mild nondestructive ablation with terahertz radiation for transferring DNA molecules to an aerosol. DNA nanoparticles were measured in the gas phase with a diffusion aerosol spectrometer. Changes that DNA undergoes in the gas phase were visualized by atomic force microscopy (AFM). Comparison of the diffusion sizes of plasmid pUC18 aerosol particles with those obtained by AFM indicated that DNA molecules experienced condensation in the gas phase. We constructed a model on the basis of modern concepts of DNA condensation and globule formation. The predictions matched the experimental data well. The DNA persistence length estimated in the gas phase was about 0.5 nm. This fact points to the absence of distributed charge on the DNA surface in the gas phase and to the nonionizing nature of terahertz radiation. The study of DNA conformations in the gas phase will add to the understanding of DNA compaction under natural and artificial conditions.
Elymus L. is a genus of the Poaceae family that includes only polyploid species. It is widespread over all continents, with at least half of the species occurring in Asia, and this continent is considered to be its center of origin. However, the diversity, genetic characteristics, and evolutionary relationships among Elymus species of some regions of Asia are still unclear, and the Far East of Russia is one such territory. Thus, investigation of the evolutionary relationships among species of the Far East and Kamchatka is promising. In this work, several sequences of two nuclear genes and rDNA Internal Transcribed Spacers annotated in databases are analyzed. Nuclear gene sequences are shown to be more useful in building phylogeny at the interspecies level. Also, a region of the nuclear gene waxy is shown to vary among different haplomes. This variation makes it useful in investigating the genome constitutions of novel Elymus species. Finally, the taxonomic status of E. kamczadalorum as a species was confirmed.
The tryptophan biosynthesis pathway (TBP) is present in most known organisms, being absent only from animals and some bacteria. It is conserved in plants, although various species differ in the number of TBP enzyme paralogs. In the current work, we investigated a possible role of changes in the number of paralogs of TBP enzymes in the course of plant evolution. We identified TBP enzyme paralogs in plant species with fully sequenced genomes and estimated the relationship between their number and organismal complexity. It is shown that organismal complexity significantly correlates with the total number of TBP paralogs and, specifically, with the numbers of paralogs of some enzymes (ASA/ASB, PAI, and IGPS). We suggest that such a relationship arises because both organismal complexity and the increasing number of paralogs may be important for the evolutionary adaptation of land plants to variable environmental conditions.
Three high-performance versions of the Haploid Evolutionary Constructor program are presented (http://evol-constructor.bionet.nsc.ru). The software was designed for simulating the functioning and evolution of microbial communities. These high-performance versions are intended to run on systems with shared and distributed memory, using CPUs and/or GPUs. Almost linear acceleration has been achieved on clusters and multi-core CPUs. On GPU systems, the simulation time was reduced to several minutes, compared with dozens of hours on a CPU.