Genome constitution and differentiation of subgenomes in Siberian and Far Eastern endemic species of the genus Elymus (Poaceae) according to the sequencing of the nuclear gene waxy

Fifty-three species of perennial grasses in the genus Elymus L. (Poaceae), which are widespread in Russia, are generally assumed to have three haplome combinations: StH, StY and StHY. The StH-genome species endemic to Russia remain the least studied. R. Mason-Gamer and co-authors have previously shown in a series of studies that molecular phylogenetic analysis of sequences of the low-copy gene waxy (GBSS1) significantly complements cytogenetic data on the genomic constitution and evolutionary relationships of both North American and Asian species of the genus Elymus. To determine the species’ genomic constitution and to evaluate the level of phylogenetic differentiation, we examined the GBSS1 gene in 18 species of Elymus from Siberia and the Russian Far East, including the following 14 endemics: E. charkeviczii, E. jacutensis, E. kamczadalorum, E. komarovii, E. kronokensis, E. lenensis, E. macrourus, E. margaritae, E. subfibrosus, E. sajanensis, E. transbaicalensis, E. peschkovae, E. uralensis, and E. viridiglumis. PCR amplification products of GBSS1 gene fragments (including exons 9–14) were cloned, and 6–8 clones per accession were sequenced. All the species studied appear to have the St and H subgenomic variants of the gene. The most significant differences between the subgenomic variants St and H were found in intron 13. The H subgenome contains a 21-bp-long deletion in intron 13 in all Elymus genotypes, probably derived from a common ancestor of the H and P genomes. In place of this deletion, all St subgenomes have a relatively conserved sequence similar to that of the genus Pseudoroegneria, whose ancestor is considered to be the donor of the modern St subgenome for all Elymus species. Cluster phylogenetic analysis revealed differentiation of the St and H subgenome sequences into two evolutionary variants each: St1 vs. St2 and H1 vs. H2, respectively. These variants were found to be homologous to different modern species of the ancestral genera Pseudoroegneria and Hordeum: St1 to P. strigosa, St2 to P. spicata, H1 to H. jubatum, and H2 to H. californicum. Details of the relationships between Russian and North American species of the genus, as well as a number of microevolutionary interconnections within the group of boreal endemic species of Siberia and the Russian Far East, were revealed. The results obtained here are essential for the development of a phylogenetically oriented taxonomic system for the genus Elymus.


Introduction
The genus Elymus L. (wildrye) is the largest genus of the tribe Triticeae Dumort. in the family Poaceae Barn. It contains exclusively amphiploid, self-pollinating perennial grasses (Dewey, 1984; Löve, 1984). Species of the genus are widespread on all continents from the Holarctic to the subtropics, with more than half of them growing in Central Asia (Lu, 1994). The genomic constitution of all species is formed by haplomes from the ancestors of several modern genera: Pseudoroegneria (Nevski) Á. Löve (St haplome), Hordeum L. (H haplome), Agropyron Gaertn. (P haplome) and Australopyrum (Tzvelev) Á. Löve (W haplome), as well as the Y haplome from an unknown ancestor. The St haplome is common to all species of the genus. After the genomic classification system for the tribe Triticeae was established and recognized (Dewey, 1984), a taxonomic system began to spread in which the genus Elymus in the broadest sense is divided into several separate genera on the basis of the genomic composition of the species (Baum et al., 2011) Baum (StStPP genome). According to current concepts, the genus Elymus within Russia is divided into four sections: Turczaninovia (Nevski) Tzvelev (4 species), Goulardia (Husn.) Tzvelev (42 species), Elymus (6 species), and Clinelymopsis (Nevski) Tzvelev (1 species) (Tsvelev, Probatova, 2010). This system was built according to traditional criteria (comparative-morphological and ecological-geographical) and ensures the integrity and consistency of the genus, but the same sections include species with different genomic constitutions.
It has become evident that a balanced, integrated approach is required to construct a phylogenetically oriented system of taxa for the genus Elymus. The main difficulty lies in combining two entirely different methodologies in botany: traditional taxonomy, which gives priority to morphological criteria, and molecular genomics, which relies on modern molecular technologies. Significant results on the use of molecular markers were obtained by R. Mason-Gamer and collaborators (Helfgott, Mason-Gamer, 2004; Mason-Gamer, 2001, 2004, 2008; Mason-Gamer et al., 1998, 2010a, b). In particular, their studies have shown that comparative data on nucleotide sequences of the low-copy gene waxy (granule-bound starch synthase 1, GBSS1) are consistent with cytogenetic data on the genomic constitution and evolutionary origin of North American (Mason-Gamer, 2001) and Asian (Mason-Gamer et al., 2010a) species of the genus Elymus.
We have analyzed the applicability of the nuclear low-copy genes bmy2 and waxy, as well as ITS rRNA clusters, as genetic markers for assessing phylogenetic relationships between species of the genus growing in Siberia and the Russian Far East (Shmakov et al., 2015). We confirmed that comparative analysis of sequences of the selected loci, in combination with other molecular markers, allows phylogenetic relationships between taxa to be reconstructed. Moreover, our studies showed that data on the genomic constitution of species and their microevolutionary relationships should be taken as a fundamental basis for constructing a phylogenetically oriented systematics of the genus for the species growing in Russia. The availability of numerous GBSS1 gene sequences in the NCBI Nucleotide database (http://www.ncbi.nlm.nih.gov/nuccore) enables a more detailed assessment of the relationships between a large number of genotypes of each species in comparative studies.
Here we present a comparative analysis of nucleotide sequences of an ~1300-bp-long fragment of the GBSS1 gene, including exons 9 to 14, in 18 Elymus species (including 14 endemics) growing in Siberia and the Russian Far East, in order to establish or confirm their genomic constitution and to assess the level of evolutionary differentiation of subgenomes in different species. This information is essential for constructing a phylogenetically oriented taxonomic system for the species of the genus growing in Russia.

Materials and methods
The analyzed accessions comprised Elymus species widespread in the Asian part of Russia, mainly those with unknown or unconfirmed genomic constitutions (Supplementary Material 1). Known GBSS1 gene sequences of the St, H and Y genomes deposited in the NCBI database were used as references for comparative analysis (Supplementary Material 2). Genomic DNA was extracted from fresh or dried leaves as described by Khanuja et al. (1999) with modifications, or with NucleoSpin Plant II kits (Macherey-Nagel, Germany) according to the manufacturer's recommendations.
The previously described primers F-for (TGCGAGCTCGACAACATCATGCG) and M-bac (GGCGAGCGGCGCGATCCCTCGC) (Mason-Gamer et al., 1998) were used for PCR amplification of an ~1300-bp-long GBSS1 fragment spanning exons 9 to 14. The PCR mixture, in a total volume of 15 µl, contained Taq buffer, 0.2 mM of each dNTP, 1.5 mM MgCl2, 1 µM of each primer, 20 ng of genomic DNA, and 1.0 unit of HS Taq DNA polymerase (Eurogen, RF). The following temperature profile was run on a C-1000 thermal cycler (Bio-Rad, USA): initial denaturation for 4 min at 94 °C; 38 three-step cycles of denaturation for 25 sec at 94 °C, annealing for 30 sec at 65 °C and elongation for 1 min at 72 °C; and final elongation for 20 min at 72 °C to enhance the non-templated addition of deoxyadenosine at the 3ʹ-end of the PCR product (Mason-Gamer et al., 1998). Amplification products were analyzed by electrophoresis in a 1.7 % agarose gel in TAE buffer at an electric field strength of 4 V/cm.
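As a quick sanity check on the primer pair quoted above, the length and GC content of each primer can be computed directly from the sequences given in the text. The Python sketch below is purely illustrative: the function names are our own, and the calculation is not part of the published protocol.

```python
# Primer sequences exactly as quoted in the text (Mason-Gamer et al., 1998).
PRIMERS = {
    "F-for": "TGCGAGCTCGACAACATCATGCG",
    "M-bac": "GGCGAGCGGCGCGATCCCTCGC",
}


def gc_content(seq: str) -> float:
    """Return the fraction of G and C bases in a DNA sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def summarize(primers: dict) -> dict:
    """Map each primer name to (length in nt, GC content in percent)."""
    return {
        name: (len(seq), round(100 * gc_content(seq), 1))
        for name, seq in primers.items()
    }


if __name__ == "__main__":
    for name, (length, gc) in summarize(PRIMERS).items():
        print(f"{name}: {length} nt, GC = {gc}%")
```

The reverse primer M-bac is markedly more GC-rich than F-for, which is worth keeping in mind when the annealing temperature is adjusted for other templates.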
Since allopolyploid Elymus genomes contain at least two subgenomic variants of the GBSS1 gene, amplification was performed in three replicates to minimize the "PCR drift" effect caused by stochastic fluctuations at the initial stages of PCR (Wagner et al., 1994). The combined PCR product was ligated into the pAL2-T vector (Eurogen, RF) and cloned in chemically competent XL1-Blue E. coli cells. Colonies containing recombinant plasmids with a GBSS1 insert were selected by blue/white screening of E. coli grown on Amp(+) LB agar containing X-gal/IPTG. Twenty white colonies per accession were tested for an insert of the expected length by PCR amplification with universal M13 primers (Eurogen, RF) followed by electrophoresis. At least 6 colonies per accession containing pAL2-T with an insert of the expected size (~1300 bp) were grown in 4 ml of liquid LB medium for 16 hours at 37 °C and 220 rpm. Plasmid DNA was isolated with a Plasmid Miniprep Kit (Eurogen, RF) according to the manufacturer's instructions.
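To gauge how well 6–8 clones per accession can be expected to capture both subgenomic copies, one can estimate the chance that a copy is missed by sampling alone. The Python sketch below rests on an assumption of ours that is not stated in the text: each sequenced clone independently carries the St or the H copy with equal probability. Any PCR or cloning bias would make the real probability of missing a copy larger.

```python
def prob_variant_missed(n_clones: int, p_variant: float = 0.5) -> float:
    """Probability that none of n independent clones carries a given variant.

    p_variant is the assumed (hypothetical) share of that variant
    among the cloned PCR products.
    """
    return (1.0 - p_variant) ** n_clones


def prob_only_one_variant(n_clones: int) -> float:
    """Probability that all n clones carry the same one of two
    equally likely variants (i.e. either St or H is missed)."""
    return 2 * 0.5 ** n_clones


if __name__ == "__main__":
    for n in (6, 8):
        print(n, prob_variant_missed(n), prob_only_one_variant(n))
```

Under these assumptions the chance of missing one specific copy is 1/64 with 6 clones and 1/256 with 8 clones, so the complete absence of one subgenome from a set of 8 clones is more likely to reflect amplification or cloning bias than chance alone.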
The Sanger sequencing reaction, in a total volume of 40 µl, contained 0.7 µg of plasmid DNA with a total length of ~4300 bp, 20 pmol of primer M13F or M13R, 1.8 µl of BigDye v. 3.1 reagent (ABI, USA), 7.2 µl of 5X sequencing buffer (ABI, USA) and water up to the final volume. The temperature profile for the Sanger reaction included initial denaturation for 2 min at 95 °C followed by 50 three-step cycles of denaturation for 30 sec at 95 °C, annealing for 10 sec at 55 °C and elongation for 4 min at 60 °C. Sanger reaction products were purified from excess BigDye components by gel filtration through microcolumns containing 600–700 µl of prepared Sephadex G-50 (GE Healthcare), with liquid removed from the dead volume by centrifugation for 2 min at 900 g, and were subsequently analyzed on an ABI 3130XL automatic genetic analyzer (ABI, USA) at the Genomics Core Facility SB RAS. The DNA sequences obtained were assembled into contigs spanning GBSS1 from exon 9 to exon 14, including 5 introns, using Unipro UGENE v1.31.0 (Okonechnikov et al., 2012). In total, at least 6 GBSS1 clones were sequenced for each of 22 Elymus accessions. Additionally, 42 nucleotide sequences from NCBI GenBank were used for comparative analysis. Supplementary Materials 1 and 2 are available in the online version of the paper: https://vavilov.elpub.ru/jour/manager/files/SupplAgafonov_engl.pdf
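Because each cloned insert is read from both ends (M13F and M13R), the two reads have to be merged into a single contig. The authors performed this step in Unipro UGENE; the Python sketch below is only a simplified illustration of the underlying idea, using exact suffix-prefix matching on error-free toy reads. The function names and the min_overlap parameter are hypothetical, and real assemblers additionally handle base-call quality and mismatches.

```python
from typing import Optional

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def merge_reads(forward: str, reverse: str, min_overlap: int = 10) -> Optional[str]:
    """Merge a forward read with a reverse read of the same insert.

    The reverse read is first reverse-complemented so that both reads
    are on the same strand; the longest exact overlap between the end
    of the forward read and the start of the converted reverse read is
    then used to join them. Returns None if no overlap of at least
    min_overlap bases is found.
    """
    rc = reverse_complement(reverse)
    max_len = min(len(forward), len(rc))
    for k in range(max_len, min_overlap - 1, -1):
        if forward[-k:] == rc[:k]:
            return forward + rc[k:]
    return None


if __name__ == "__main__":
    # Toy 30-nt "insert" (synthetic, not a GBSS1 sequence).
    insert = "ACGTTGCAAGGCTTAACCGGATATCGCGTA"
    fwd = insert[:20]                      # read from the M13F side
    rev = reverse_complement(insert[12:])  # read from the M13R side
    print(merge_reads(fwd, rev, min_overlap=5) == insert)
```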
Multiple sequence alignment was performed using the T-Coffee program (www.tcoffee.org) and refined manually. The alignments of the GBSS1 fragment were used to generate phylogenetic trees by the maximum likelihood (ML) method on the IQ-TREE web server (Trifinopoulos et al., 2016). For each exon and intron, the best nucleotide substitution models were determined in PartitionFinder version 2.1.1 (Lanfear et al., 2016) with the following parameters: the AICc model selection criterion, the "greedy" search algorithm and linked branch lengths (Lanfear et al., 2012). The Bromus tectorum sequence AY362757.1 from NCBI GenBank, proposed previously for this purpose (Mason-Gamer, 2004), was used to root the dendrograms. The statistical support of the topology in the IQ-TREE analysis was evaluated with 1000 replicates each of the SH-aLRT (Guindon et al., 2010) and UFBoot (Minh et al., 2013) tests.
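The analysis described above was run on the IQ-TREE web server. For readers who prefer a local run, the Python sketch below assembles a roughly comparable command line using IQ-TREE 1.6-style options (-s, -spp, -bb, -alrt, -o). This is an assumption-laden approximation rather than the authors' actual settings: the file names and the outgroup label are hypothetical, and the command is only built and printed, not executed.

```python
import shlex


def build_iqtree_command(alignment: str, partitions: str,
                         outgroup: str, replicates: int = 1000) -> list:
    """Assemble an IQ-TREE (v1.6-style) command as an argument list.

    -s     alignment file
    -spp   partition file, branch lengths proportional across partitions
    -bb    number of ultrafast bootstrap (UFBoot) replicates
    -alrt  number of SH-aLRT replicates
    -o     outgroup taxon used to root the output tree
    """
    return [
        "iqtree",
        "-s", alignment,
        "-spp", partitions,
        "-bb", str(replicates),
        "-alrt", str(replicates),
        "-o", outgroup,
    ]


if __name__ == "__main__":
    # All three names below are placeholders, not files from the study.
    cmd = build_iqtree_command(
        "gbss1_alignment.fasta",
        "gbss1_partitions.nex",
        "Bromus_tectorum_AY362757.1",
    )
    print(shlex.join(cmd))
```

Keeping the command as a list makes it straightforward to pass to subprocess.run once the placeholder names are replaced with real files and the outgroup label matches the sequence name in the alignment.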

Results and discussion
Comparison with reference accessions carrying the ancestral genomes of Elymus, St (species of the genus Pseudoroegneria) and H (species of the genus Hordeum), clearly indicated the presence of only the St and H genomes in all the studied species from Siberia and the Russian Far East, thus confirming that these species belong to the tetraploid StH genome group. Apparently, the center of species diversity for the StH genome group is shifted northward from the center of origin of most StY genome species, which is located in China (Lu, Salomon, 1992). Interestingly, the allotetraploid group of Elymus species of North America is also represented mainly by StH genome species (Mason-Gamer, 2001). Only rare individuals of several alien Asian StHY and StY genome species have been reported there (Barkworth et al., 2007).
The most notable differences between the St and H subgenomic fragments are located in intron 13 (Fig. 1). The H subgenome sequences of this intron in all Elymus genotypes analyzed contained a 21-bp-long deletion, most likely inherited from a common ancestor of the H and P subgenomes, since it is also present in modern representatives of related monogenomic species from the genera Hordeum and Agropyron. By contrast, all St and Y subgenomes had a relatively conserved sequence at the site of this deletion, which largely matches the sequence in the genus Pseudoroegneria, whose ancestor is believed to be the donor of the modern St genome. Small deletions are also common in other regions of this intron, but are less frequent in the other GBSS1 regions analyzed.
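The 21-bp deletion in intron 13 lends itself to a simple first-pass screen of aligned clone sequences. The Python sketch below illustrates the idea on synthetic toy data: it looks for a gap run of at least 21 columns within a given alignment window. The window coordinates, sequence names and labels are hypothetical, and the screen presupposes an alignment that includes at least one sequence without the deletion; it is not the procedure used by the authors, who assigned clones by phylogenetic analysis.

```python
import re


def longest_gap_run(aligned_seq: str, start: int, end: int) -> int:
    """Length of the longest run of '-' within aligned_seq[start:end]."""
    runs = re.findall(r"-+", aligned_seq[start:end])
    return max((len(run) for run in runs), default=0)


def classify_clone(aligned_seq: str, start: int, end: int,
                   deletion_len: int = 21) -> str:
    """Label an aligned clone by the presence of the intron 13 deletion.

    Sequences with a gap run of at least deletion_len columns in the
    window are labeled H-type; all others are labeled St/Y-type,
    because both St and Y copies lack the deletion.
    """
    if longest_gap_run(aligned_seq, start, end) >= deletion_len:
        return "H-type (deletion present)"
    return "St/Y-type (no deletion)"


if __name__ == "__main__":
    # Synthetic toy alignment: 10-nt flanks around a 21-column window.
    flank5, flank3 = "ACGTACGTAC", "GGCCTTAAGG"
    toy_alignment = {
        "clone_without_deletion": flank5 + "ACT" * 7 + flank3,
        "clone_with_deletion": flank5 + "-" * 21 + flank3,
    }
    for name, seq in toy_alignment.items():
        print(name, classify_clone(seq, 10, 31))
```

Since small deletions also occur elsewhere in this intron, any sequence flagged by such a screen should still be checked against the full alignment.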
In addition, our analysis of the accessions did not confirm the previously published data on the existence of conserved sites that are absolutely specific to the H and St haplomes (Shmakov et al., 2015). Such specificity held only partially, for some of the sequences belonging to different haplomes.
Cluster analysis of the whole GBSS1 region spanning exons 9 to 14, as well as of separate intron or exon sequences, showed common patterns, with certain nuances, of phyletic connections both within and between related groups of the Elymus taxa analyzed. Analysis of the most conserved sites (exons 9–14) showed uniformity within the same subgenome and, at the same time, distinction between different subgenomes (Fig. 2).
In the species studied, each of the two subgenomes proved to be clearly differentiated. For instance, the sequences of the St subgenome were divided into two groups: St1 and St2. The St1 subgenome sequences of Siberian species are probably older, since they were found not only in the northern biotypes E. macrourus, E. jacutensis and E. kamczadalorum and in the more southern StY genome species E. gmelinii and E. pendulinus, but also in P. strigosa accession PI 499637 from the northeastern part of China.
The subgenomic group St2 was formed by the larger part of the species, including both strictly local (E. komarovii, E. uralensis, E. sajanensis, E. margaritae) and widely vicarious (E. caninus, E. sibiricus) species. This is clearly illustrated by the nucleotide sequence peculiarities in different regions of the gene shown in Fig. 3. Remarkably, accession AUK-0650 of the Altai species E. margaritae contained both variants of the St subgenome. At the same time, no sequences belonging to the St subgenome were detected in the sets of 8 sequences obtained for E. komarovii accession GAR-0501 and for E. margaritae accession GUK-1709.
Sequences from a greater number of Elymus species from North American natural accessions were initially subdivided according to the same principle (Mason-Gamer, 2001); we therefore built a dendrogram that included the endemic species of Asian Russia in comparison with some sequences of North American species. The sequences of the St subgenome including exons 9 to 14 together with introns were used. The results are shown in Fig. 4.
In this version of the dendrogram, the Asian species were again distributed between two clades with the same composition as in the dendrogram constructed using exons alone. Some of the North American species (marked by dots on the dendrogram), together with the Asian species P. strigosa, formed a joint clade with the St1 subgenome group, while the others fell into the St2 group along with all accessions of the North American species P. spicata. GBSS1 sequences of the Y subgenome in E. gmelinii and E. pendulinus showed a closer relationship with the St2 group, which does not contradict the data on the evolutionary origin of this subgenome (Mason-Gamer et al., 2010a).
The H subgenome showed a similar pattern of differentiation. Figure 5 shows a dendrogram constructed from complete sequences of the H genome introns and exons from Russian and North American species (the latter are marked with dots in the figure). Two perennial Hordeum species (marked with asterisks) were taken as references. Gene copies from the H genome were divided into two main clades (designated H1 and H2). Clade H1 included exclusively Russian species, while clade H2 was formed by Russian northeastern and all North American species. Each of these clades has its own ancestral taxon from the contemporary genus Hordeum: H. jubatum for H1 and H. californicum for H2. Two of the E. mutabilis reference sequences fell into clade H2 (mut_5279_H and South Ural ABZ06_2). The third reference, mut_9330_H E. mutabilis from Altai, fell into clade H1. Thus, only a tendency toward a relationship between the North American accessions and the northern or eastern accessions of Russian species can be derived from the H subgenome sequence analysis. The close relationship between American and Kamchatka species is easier to understand, taking into account the historical connections of these floras with each other, as well as with the species from the wide northern distribution areas of E. macrourus and E. jacutensis. It is more difficult to explain the close proximity of the Chinese and South Ural accessions of E. mutabilis to this group.

Nevertheless, GBSS1 gene variability provides a tool for tracing the evolutionary relations of species and local geographical races from Siberia and the Russian Far East. If we consider the relative positions of the accessions within the subgenome clades, the clusters combine the species accessions according to their perceived relationships. E. jacutensis and E. macrourus, for instance, fell into common clusters in both the H and St clades (see Fig. 2), as well as on the separate dendrograms of these subgenomes (see Fig. 4, 5), thereby supporting the earlier assumption that E. jacutensis is an aristate subspecies of E. macrourus (Tsvelyov, 1964). This is consistent with data from comparative morphological and peptide electrophoretic analyses and from hybridization of particular biotypes of these species.
Comparative sequence analysis confirmed the isolation of E. kamczadalorum from the Kamchatka species E. charkeviczii, which was previously established using data on comparative morphology, electrophoresis of seed endosperm storage proteins, sexual hybridization (Agafonov, Gerus, 2008) and molecular ISSR analysis (Kobozeva et al., 2017). The species E. komarovii and E. transbaicalensis formed indistinguishable branches inside clade H1 together with the Altai species E. margaritae, while E. transbaicalensis and E. margaritae clones were grouped inside clade St1. The phylogenetic proximity of the first two species has repeatedly been confirmed experimentally (Agafonov et al., 2019), while the degree of isolation of E. margaritae is currently being studied using biosystematic methods. The most unexpected data were obtained regarding the relationships in the group of South Ural biotypes of E. uralensis, E. viridiglumis, E. caninus, and E. mutabilis. These data are currently being verified in field and laboratory experiments.

Conclusion
Thus, despite a complicated reticulate evolution in parallel with various related allopolyploid genera and constantly ongoing active microevolutionary transformations, the basic genomes seem to have retained unique ancestral features. This makes it relatively easy to identify the genomic composition and to classify modern species within the framework of a phylogenetically oriented taxonomic model of the genus. In our opinion, the integrity of the genus ought to be preserved, because some species in the independent genus Roegneria with the genomic formula StY (Baum et al., 1991) are similar in morphology to species in the newly proposed genus Campeiostachys with the genomic formula StHY (Baum et al., 2011), the species of which differ significantly from each other in morphology. The St genome, originating from the ancestors of the genus Pseudoroegneria, seems to be an anchoring constant for the genetic unification of all members of the genus.
We suppose that differentiation of the genus should be based on a model of microevolutionary complexes, each representing an aggregate of taxa evolving through hybridization and introgression. The degree of relationship between taxa within a complex should be confirmed using biosystematic methods with obligatory determination of the ability to cross, i.e. taking into account their position in the system of recombination (RGP) and introgressive (IHP) gene pools (Agafonov, Salomon, 2002). In fact, a microevolutionary complex is a projection of the RGP collection onto the taxonomic model of the genus, considering the genomic constitution of the species. Each microevolutionary complex should be thought of as a branched system of taxa of different ranks (species and subspecies), thereby remaining a phylogenetically confirmed structure.
In the future, it is necessary to determine the taxonomic rank of microevolutionary complexes, which can be sections or aggregates of the same species in a broad sense, as shown by the example of a revision of Pendulini (Nevski) Tzvelev sub-section of the Goulardia (Husn.) Tzvelev section (Kobozeva, .