Patterns of nucleotide diversity for different domains of centromeric histone H3 (CENH3) gene in Secale

Rye (Secale) is one of the staple cereals, along with other members of the Triticeae tribe such as wheat and barley. The genus Secale includes perennial and annual, cross-pollinating and self-pollinating species, which can be donors of valuable genes in wheat and rye breeding programs. Studies of the structure of the gene for centromeric histone H3 (CENH3), which is essential for centromere function, are relevant to the breeding of agronomically important crops. We investigated the nucleotide diversity of sequences of two variants of the rye CENH3 gene within the N-terminal tail (NTT) and the conserved histone fold domain (HFD) in the genus Secale. In αCENH3, the mean values of nucleotide diversity in the NTT and HFD of wild cross- and self-pollinating taxa are close: πtot = 0.0176–0.0090 and 0.0136–0.0052, respectively. In βCENH3, the mean values for the NTT (πtot = 0.0168–0.0062) are lower than for the HFD (πtot = 0.0259–0.0084). The estimates of nucleotide and haplotype diversity per site for the CENH3 domains are considerably lower in taxa with narrow geographic ranges: S. cereale subsp. dighoricum and S. strictum subsp. kuprijanovii. Commercial breeding reduces the nucleotide sequence variability in αCENH3 and βCENH3: cultivated rye varieties have π values within 0.0122–0.0014. The nucleotide and haplotype diversity values in αCENH3 and βCENH3 are close in S. sylvestre, which is believed to be the oldest rye species. The results of this study indicate that the frequency of single-nucleotide polymorphisms and the nucleotide diversity of CENH3 sequences in Secale species are influenced by numerous factors, including reproductive habit, geographic isolation of taxa, breeding, and the evolutionary age of species.


Introduction
Rye (Secale) belongs to the tribe Triticeae along with other grain crops, such as wheat (Triticum spp.) and barley (Hordeum spp.). According to the taxonomy proposed by S. Frederiksen and G. Petersen (1998) on the basis of morphometric analysis of rye species, the genus consists of three botanical species: the cross-pollinating perennial species Secale strictum Presl., the cross-pollinating annual species S. cereale L., and the self-pollinating annual species S. sylvestre Host. The cross-pollinating species S. cereale L. and S. strictum Presl. include the self-pollinating subspecies S. vavilovii and S. africanum, respectively.
Cultivated and wild rye varieties can be donors of valuable traits, such as winter hardiness, high protein content, and disease resistance. They are used to improve existing rye and wheat cultivars and in interspecies crosses of rye and wheat (Tang et al., 2011). Interspecies hybridization is often accompanied by the elimination of whole chromosomes or their parts. Incompatibility between the parental centromeres and the centromere-specific histone H3 variant (CENH3) may be one of the causes of chromosome elimination from interspecies hybrids (Sanei et al., 2011). Conversely, close similarity between the CENH3 proteins of distant parents can ensure normal centromere function and the formation of true hybrid plants bearing the genomes of both parents (Ishii et al., 2015). CENH3 consists of two domains: the variable N-terminal tail (NTT) and the more conserved C-terminal histone fold domain (HFD) (Roach et al., 2012). The HFD interacts with centromeric DNA, whereas the NTT is not required for localization to centromeric DNA but is essential for correct chromosome segregation in mitosis and meiosis (Maheshwari et al., 2015). However, segregating polymorphism in the CENH3 genes of grass species remains poorly investigated. The multitude of rye taxa, including annual, perennial, self-pollinating, and cross-pollinating forms, allows assessment of how various factors act on the genetic variation of CENH3.
The objectives of this study were (1) to estimate the nucleotide and haplotype diversity of CENH3 domains in three Secale species and (2) to assess the influence of taxonomic and geographic factors and mating systems on nucleotide polymorphism within the domains of the two variants of the CENH3 gene.

Materials and methods
Plant material and RNA isolation. Experiments were done with ten wild and cultivated rye accessions of three Secale species. Seeds of Secale cereale subsp. cereale (cvs. Otello and Imperial), S. cereale subsp. vavilovii, S. cereale subsp. dighoricum, S. cereale subsp. afghanicum, S. strictum subsp. kuprijanovii, S. strictum subsp. strictum, S. strictum subsp. anatolicum, S. strictum subsp. africanum, and S. sylvestre were supplied from their germplasm collections by the Leibniz Institute of Plant Genetics and Crop Plant Research (Germany), the US Department of Agriculture (United States), and the N.I. Vavilov Research Institute of Plant Industry (Russia). Total RNA isolation, synthesis of first-strand cDNA, and PCR were conducted as described previously (Evtushenko et al., 2017). The amplification primers specific to the NTTs and HFDs of the αCENH3 and βCENH3 genes from rye cDNA were those selected in our earlier study (Evtushenko et al., 2017). The amplification products were cloned and sequenced using BigDye Terminator Cycle Sequencing chemistry (v. 3.1) on an ABI 3100 Genetic Analyzer (Applied Biosystems, CA, USA).
Sequence analysis. Alignments of CENH3 coding sequences were performed with the online Clustal Omega tool (Sievers et al., 2011) at http://www.ebi.ac.uk/Tools/msa/clustalo. DnaSP version 5.10.01 (Librado, Rozas, 2009) was used to estimate the levels of nucleotide diversity for each domain separately in all subspecies, because the CENH3 domains differ in function. Genetic variation within CENH3 was estimated as nucleotide diversity π, haplotype diversity Hd, and the Watterson estimator θW. θW is based on the number of polymorphic (segregating) sites in a sample of sequences drawn at random from a population, scaled by the sample size (Watterson, 1975), whereas nucleotide diversity π is the average number of nucleotide differences per site between all pairs of homologous sequences in the set being compared (Nei, Li, 1979).
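To make the three estimators concrete, here is a minimal Python sketch of their standard textbook definitions. It assumes a gap-free alignment of equal-length sequences containing only A, C, G, and T. The function names and the toy alignment are hypothetical illustrations, not DnaSP code or data from this study, and DnaSP's handling of gaps, missing data, and synonymous versus nonsynonymous sites is not reproduced. θW is computed as S/a_n with a_n = 1 + 1/2 + ... + 1/(n-1); π as the mean number of pairwise differences divided by the alignment length; and Hd as n/(n-1)(1 - Σp_i²), where p_i is the frequency of haplotype i.

```python
from collections import Counter
from itertools import combinations


def segregating_sites(seqs: list[str]) -> int:
    """S: number of alignment columns carrying more than one nucleotide."""
    return sum(1 for column in zip(*seqs) if len(set(column)) > 1)


def watterson_theta(seqs: list[str], per_site: bool = False) -> float:
    """Watterson's theta_W = S / a_n, where a_n = sum_{i=1}^{n-1} 1/i."""
    n = len(seqs)
    if n < 2:
        raise ValueError("need at least two sequences")
    a_n = sum(1.0 / i for i in range(1, n))
    theta = segregating_sites(seqs) / a_n
    # The per-site value divides by the alignment length L.
    return theta / len(seqs[0]) if per_site else theta


def nucleotide_diversity(seqs: list[str]) -> float:
    """pi: mean number of pairwise differences, divided by alignment length."""
    pairs = list(combinations(seqs, 2))
    if not pairs:
        raise ValueError("need at least two sequences")
    diffs = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return diffs / len(pairs) / len(seqs[0])


def haplotype_diversity(seqs: list[str]) -> float:
    """Hd = n/(n-1) * (1 - sum of squared haplotype frequencies)."""
    n = len(seqs)
    if n < 2:
        raise ValueError("need at least two sequences")
    freqs = [count / n for count in Counter(seqs).values()]
    return n / (n - 1) * (1.0 - sum(f * f for f in freqs))


# Hypothetical toy alignment (not data from this study): 4 sequences, 10 sites.
TOY = ["ACGTACGTAC", "ACGTACGTAT", "ACGAACGTAC", "ACGTACGTAC"]

if __name__ == "__main__":
    print("S       =", segregating_sites(TOY))
    print("theta_W =", round(watterson_theta(TOY), 4))
    print("pi      =", round(nucleotide_diversity(TOY), 4))
    print("Hd      =", round(haplotype_diversity(TOY), 4))
```

For the toy alignment, two columns are polymorphic, so S = 2 and θW = 2/(1 + 1/2 + 1/3) ≈ 1.0909 per sequence; the six sequence pairs differ at six sites in total, giving π = 6/6/10 = 0.1 per site; and the three haplotypes (frequencies 0.5, 0.25, 0.25) give Hd = (4/3)(1 - 0.375) ≈ 0.8333. Both per-sequence and per-site θW are offered because software packages may report either form.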

Results
We sequenced the NTT and HFD domains of CENH3 from 10 to 25 samples per domain for each accession. We previously showed that the main forms of rye αCENH3 and βCENH3 are 501 and 456 bp long, respectively, in all Secale species and subspecies (Evtushenko et al., 2017). The lengths of the NTTs and HFDs of αCENH3 and βCENH3 analyzed in this study are shown in Table 1.
In αCENH3, 1 to 21 single-nucleotide polymorphisms (SNPs), or segregating sites, were found within the domains (Table 2). In the perennial S. strictum Presl. subspecies, the number of SNPs in the HFD was greater than in the NTT, whereas in the annual S. cereale L. subspecies the NTTs had more segregating sites. Among perennial subspecies, the highest genetic variation in the CENH3 domains was detected in wild cross-pollinating forms: S. strictum ssp. strictum (hereafter S. strictum) and S. strictum ssp. anatolicum (hereafter S. anatolicum). In these subspecies, π ranged from 0.0131 to 0.0140 in the NTT (Table 2). The estimates of nucleotide diversity for the self-pollinating rye species S. sylvestre were somewhat lower, but the overall number of segregating sites was 25 for both domains. The θW values in S. sylvestre were also higher than in some cross-pollinating subspecies: 1.31 (NTT) and 2.03 (HFD), and the π values for the NTT and HFD were close: 0.0090 and 0.0086. Thus, the nucleotide diversity of CENH3 is similarly high in rye accessions from wild cross- and self-pollinating populations. Perennial S. strictum ssp. kuprijanovii (S. kuprijanovii) and annual S. cereale ssp. dighoricum (S. dighoricum) showed the lowest nucleotide and haplotype diversities of αCENH3 among the subspecies: 0.0065 to 0.0039 in the NTT and 0.0019 to 0.005 in the HFD. Accessions of these subspecies may have originated from a small geographic range, where restricted gene flow within the taxa could have raised inbreeding coefficients (Hagenblad et al., 2016). The comparison of nucleotide diversity values (πtot) for the αCENH3 domains of the two rye cultivars and the wild rye subspecies showed that diversity was low in cultivated rye, which might result from specific features of the breeding process. Haplotype diversity values (Hd) were relatively uniform across the domains of rye αCENH3, ranging from 0.972 (S. strictum) to 0.736 (S. cereale cv. Otello) in the NTT and from 0.942 (S. anatolicum) to 0.727 (S. afghanicum) in the HFD, with the exception of low Hd values in S. kuprijanovii and S. dighoricum.
The average estimates of nucleotide diversity π for βCENH3 were no lower than those for αCENH3 (Table 3). High π values were observed for the wild cross-pollinating subspecies S. strictum (NTT, 0.0137; HFD, 0.0120) and S. afghanicum (NTT, 0.0134; HFD, 0.0112). In the self-pollinating subspecies S. africanum and S. vavilovii, we also noted large numbers of segregating sites (21 and 15, respectively, across both CENH3 domains) and high nucleotide diversity (0.0168 and 0.0095 in the NTT, 0.0295 and 0.0084 in the HFD). In most accessions, the levels of nucleotide polymorphism were high in both βCENH3 domains. In the HFDs of S. africanum and S. sylvestre, the indices Sn (the number of segregating sites), θW, π, and Hd were higher than in the NTTs. Of the two paralogous CENH3 genes of rye, βCENH3 appears to be the younger (Evtushenko et al., 2017), and its younger age may be responsible for the high genetic variation in the HFD, the more conserved CENH3 region. Both αCENH3 and βCENH3 show higher nucleotide diversity at synonymous sites than at nonsynonymous sites, except for the NTT and HFD of αCENH3 in S. strictum and two cases with πsyn = 0.0000: αCENH3 of S. dighoricum and βCENH3 of S. cereale cv. Imperial. These estimates support the action of purifying selection on rye CENH3 and the possibility of adaptive selection at individual codons, as previously demonstrated by E.V. Evtushenko et al. (2017).

Discussion
We compare nucleotide diversity patterns in domains of the coding sequence of the gene encoding centromeric histone H3 (CENH3), which is one of the epigenetic marks of an active centromere. Centromeres, the chromosomal sites where spindle microtubules attach, ensure proper chromosome segregation in mitosis and meiosis (Comai et al., 2017). Nucleotide diversity comparisons are performed within the genus Secale, whose accessions represent annual, perennial, cross-pollinating, and self-pollinating forms; subspecies that have experienced geographic isolation; and cultivated varieties. Levels of genetic diversity, assessed as the number of segregating sites Sn, the Watterson estimator θW, nucleotide diversity π, and haplotype diversity Hd in the sequences of the αCENH3 and βCENH3 domains, are higher in wild cross-pollinating subspecies, whether perennial or annual. However, the cross-pollinating reproductive habit is by no means the only factor increasing nucleotide diversity in rye CENH3. The same trend is observed in the self-pollinating perennial S. africanum and in the annual S. vavilovii and S. sylvestre. In contrast, considerably lower nucleotide diversity π and numbers of segregating sites Sn are found in cross-pollinating subspecies with small geographic ranges: S. dighoricum and S. kuprijanovii. The lower CENH3 nucleotide diversity in rye cultivars compared with wild accessions may be related to the genotypes involved in breeding and to the breeding process itself. The estimates of nucleotide diversity for rye CENH3 are higher than those for the ScVrn1 gene, which controls vernalization sensitivity in rye (Li et al., 2011), but are close to the estimates for HTR12 (the CENH3 homolog) in Arabidopsis lyrata and Arabidopsis lyrata ssp. petraea (Kawabe et al., 2006).

Conclusion
The nucleotide variation, or sequence diversity, in rye CENH3 is high in large natural populations of both cross-pollinating and self-pollinating rye. Geographic isolation of cross-pollinating rye forms, as well as breeding in the case of cultivated rye, reduces the nucleotide diversity of CENH3. Manipulation of the CENH3 structure is now widely used in breeding programs to produce haploid lines in important crops and to capture heterosis (Karimi-Ashtiyani et al., 2015). Further advances in this field require identification and knowledge of CENH3 features in commercially important species. Data on the nucleotide diversity of rye CENH3 may be useful in choosing parents for crosses between wild and cultivated species.

Table 1. Lengths of coding sequences (CDS) of CENH3 genes in Secale

High nucleotide diversity levels were found in all three self-pollinating rye accessions studied: S. strictum ssp. africanum (hereafter S. africanum), S. cereale ssp. vavilovii (hereafter S. vavilovii), and S. sylvestre Host., regardless of whether they belong to annual or perennial Secale species. The overall number of segregating sites in both αCENH3 domains of the self-pollinating subspecies varied from 21 in S. africanum to 27 in S. vavilovii, and the π values for these subspecies ranged from 0.0128 to 0.0176 in the NTT (see Table 2) and from 0.0116 to 0.0136 in the HFD. Similarly, the estimates of nucleotide diversity for the wild annual cross-pollinating S. cereale ssp. afghanicum (hereafter S. afghanicum) in both αCENH3 domains were higher than in annual rye cultivars.

Table 3. Estimates of nucleotide diversity in the NTT and HFD of βCENH3