Plant diversity and transcriptional variability assessed by retrotransposon-based molecular markers

Molecular markers play a special role in genetics because they are used at different levels of research: in positional cloning, which includes identifying the genes that control desired traits, and in backcrossing, as well as in modern plant breeding and forensics. Retrotransposons are major components of all eukaryotic genomes, which makes them convenient for use as molecular markers. They make up the bulk of large plant genomes, and differences in genome size are explained by differing numbers of retrotransposons. These mobile elements occur throughout the genome, and their replicative transposition allows new sequence to be inserted into the genome without deletion of the original elements. Retrotransposon activity may occur during development, cell differentiation and stress, and may be a source of chromatin instability and genomic rearrangements. The overall structure of retrotransposons and the domains responsible for the various phases of their replication are highly conserved in all eukaryotes. A significant proportion of retroelements have lost the ability to transpose autonomously because of point mutations and/or deletions; many of them are defective elements carrying deletions. Owing to the ubiquity of these genetic elements and their ability to integrate stably into dispersed chromosomal loci that are polymorphic within species, various molecular marker systems have been developed. Markers based on retrotransposons with long terminal repeats (LTR retrotransposons) are useful not only for genetic analysis and mapping but also for isolating and characterizing LTR retrotransposons, such as their long terminal repeats or the internal genes they contain.
This review describes the marker systems developed on the basis of retrotransposons for plant research and assesses the place of such markers in the genetic analysis of plant species diversity.

A large portion of the genome of many eukaryotic organisms consists of interspersed repetitive sequences, transposable elements (TEs) in particular. In most of the species studied, interspersed repeats are distributed unevenly rather than uniformly, with some of them clustered around telomeres or centromeres. Variation in the copy number of repeat elements and internal rearrangements on both homologous chromosomes arise after recombination is induced during meiotic prophase. The resulting heterogeneity in the arrangement of distinguishable repeats is exploited by certain molecular marker techniques that target these repeat elements.
Retrotransposons are one of the two major classes of transposable elements in eukaryotic genomes; the classes are defined by their mode of propagation. Retrotransposons belong to class I TEs and transpose via an RNA intermediate, in contrast to class II transposons, which do not involve an RNA intermediate (Finnegan, 1990). Depending on their structure and transposition cycle, retrotransposons are divided into two main subclasses according to the presence or absence of long terminal repeats (LTRs) at their ends: LTR retrotransposons and non-LTR retrotransposons, the latter comprising long interspersed nuclear elements (LINEs) and short interspersed nuclear elements (SINEs). All groups are accompanied by their respective non-autonomous forms, which lack one or more of the genes essential for transposition: MITEs (miniature inverted-repeat transposable elements) for class II, SINEs for non-LTR retrotransposons, and TRIMs (terminal-repeat retrotransposons in miniature) and LARDs (large retrotransposon derivatives) for LTR retrotransposons (Kalendar et al., 2004). Retrotransposons and retroviruses share overall structural features and the basic stages of the life cycle (Frankel, Young, 1998; Vicient et al., 2001; Mita, Boeke, 2016). However, unlike retroviruses, retrotransposons do not leave the genome to infect new individuals but insert new copies only into their host genome. If integration occurs in a cell lineage from which pollen or egg cells are ultimately derived, a new heritable polymorphism is formed. These newly integrated copies are useful for discriminating breeding lines, varieties, or populations of plants from each other.

Retrotransposon-based marker systems
Retrotransposons are one of the most fluid genomic components: they fluctuate immensely in copy number over relatively short evolutionary timescales and represent a major component of the structural evolution of plant genomes (Flavell et al., 1992; Voytas et al., 1992; Macas et al., 2015). In plants, LTR retrotransposons tend to be more abundant than non-LTR retrotransposons (Macas et al., 2011). In many crop plants, 40-70 % of the total DNA consists of LTR retrotransposons (Pearce et al., 1996; Goke, Ng, 2016). Most retrotransposons are nested, mixed, inverted or truncated in chromosomal sequences. LTR fragments, together with parts of the retrotransposon internal domain, lie close to other retrotransposons, which allows LTR sequences to be used as priming sites for PCR amplification. Genomic regions with a high density of retrotransposons can be used to detect their chance association with other retrotransposons. Events in which retrotransposon activity or recombination produces new genomic integrations can be used to distinguish reproductively isolated plant lines. In this case, the amplified bands derived from the new insertion or recombination are polymorphic, appearing only in the plant lines in which the insertion or recombination has taken place.
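The principle that outward-facing LTR primers amplify the DNA between two closely spaced elements can be illustrated with a toy in-silico PCR. The sketch below is a simplified illustration, not a published protocol: the function names, the primer, the sequences and the `max_len` cut-off are all hypothetical, and only exact primer matches are considered.

```python
def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[base] for base in reversed(seq))


def find_all(seq, motif):
    """Start positions of every (possibly overlapping) occurrence of motif."""
    hits, start = [], seq.find(motif)
    while start != -1:
        hits.append(start)
        start = seq.find(motif, start + 1)
    return hits


def irap_amplicons(genome, primer, max_len=3000):
    """Sizes of products that a single outward-facing LTR primer could yield."""
    forward = find_all(genome, primer)           # primer extends rightward
    reverse = find_all(genome, revcomp(primer))  # primer on the other strand, extends leftward
    sizes = []
    for f in forward:
        for r in reverse:
            size = r + len(primer) - f
            # Keep pairs that face each other, do not overlap, and are close enough.
            if r >= f + len(primer) and size <= max_len:
                sizes.append(size)
    return sorted(sizes)


# Toy data (hypothetical sequences, not real LTRs).
primer = "GATTACAG"
flank = "A" * 20
line_with_insertion = "TTTT" + primer + flank + revcomp(primer) + "TTTT"
line_without_insertion = "TTTT" + primer + flank + "TTTT"

print(irap_amplicons(line_with_insertion, primer))     # [36]
print(irap_amplicons(line_without_insertion, primer))  # []
```

The band appears only in the line that carries the second, inverted priming site, which is the behaviour expected of a polymorphic insertion.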
Different ways of using transposable elements as molecular markers have been designed. Their abundance, general dispersion and activity make them well suited to the development of molecular markers. Many methods have been developed that use retrotransposon sequences as primers in the polymerase chain reaction (Kalendar, 2011; Kalendar et al., 2011). The inter-repeat amplification polymorphism techniques, such as inter-retrotransposon amplified polymorphism (IRAP), retrotransposon microsatellite amplification polymorphism (REMAP) and inter-MITE amplification, exploit abundant dispersed repeats such as the LTRs of retrotransposons and SINE-like sequences (inter-SINE amplified polymorphism, ISAP) (Bureau, Wessler, 1992; Wenke et al., 2011; Seibt et al., 2012), the latter also called Alu-PCR or SINE-PCR (Charlieu et al., 1992).
A positive correlation has been detected between the genome size of the organisms studied and the efficiency of repeat-based amplification techniques. The larger the genome, the easier it is to develop good primers that reveal multiple bands for polymorphism detection (as in the main cereals); organisms with small genomes, such as Brachypodium distachyon or Vitis vinifera, are the hardest cases for PCR marker development.
It has been shown that TE families evolve with different profiles, so marker systems based on different TEs show different levels of resolution and can be chosen to fit the required analysis (Leigh et al., 2003; Smykal et al., 2009; Hosid et al., 2012). Retrotransposon insertions behave as Mendelian loci (Manninen et al., 2006; Tanhuanpaa et al., 2008). Thus, retrotransposon-based markers would be expected to be co-dominant and to involve a different level of genetic variability, i. e. transposition events, than arbitrary marker systems such as RAPD or AFLP, which detect polymorphism ranging from simple nucleotide changes to genomic rearrangements. Depending on the method and primer combinations, polymorphism detection can be extended further by exploiting nearby TEs that are found in different orientations (head-to-head, tail-to-tail, or head-to-tail).
PCR primers from one species can be used on others because related species have phylogenetically related TE sequences (retroelements or transposons); in this scenario, primers designed to conserved TE sequences are advantageous. Because TEs are scattered along whole chromosomes, they are often mixed with other elements and repeats; PCR fingerprints can therefore be improved by using combinations of PCR primers.
Once the LTR sequences of a selected retrotransposon family have been retrieved, they can be aligned to find their most conserved region. Related plant species have conserved regions in the LTRs of the same retroelement; conserved regions can therefore be identified by aligning a few LTR sequences from one species, or a mixture that includes sequences from related species (Kalendar et al., 2004; Yin et al., 2013; Moisy et al., 2014). The conserved parts of retrotransposon regions are used to design inverted primers for long-distance PCR, for cloning of the whole element, and for IRAP techniques.
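The search for the most conserved region in a set of aligned LTRs can be sketched as a sliding-window scan over per-column identity. The example assumes that the sequences have already been aligned to equal length by an external tool; the scoring scheme, function names and sequences are hypothetical choices made for illustration only.

```python
from collections import Counter


def column_identity(aligned):
    """Fraction of sequences sharing the most common base in each column."""
    scores = []
    for column in zip(*aligned):
        bases = [b for b in column if b != "-"]  # gaps never count as agreement
        top = Counter(bases).most_common(1)[0][1] if bases else 0
        scores.append(top / len(column))
    return scores


def most_conserved_window(aligned, width):
    """Start and mean identity of the first window with the highest mean score."""
    scores = column_identity(aligned)
    best_start, best_mean = 0, -1.0
    for start in range(len(scores) - width + 1):
        mean = sum(scores[start:start + width]) / width
        if mean > best_mean:
            best_start, best_mean = start, mean
    return best_start, best_mean


# Hypothetical pre-aligned LTR fragments of equal length.
ltrs = [
    "ACGTTGCATGACCA",
    "ACCTTGCATGACGA",
    "AGGTTGCATGACTA",
    "TCGT-GCATGACCT",
]
start, mean_identity = most_conserved_window(ltrs, width=6)
print(start, ltrs[0][start:start + 6], mean_identity)
```

A window found in this way would only be a candidate priming site; melting temperature, self-complementarity and specificity would still have to be checked with dedicated primer-design software.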
Most retrotransposon techniques are anonymous, producing fingerprints from multiple sites of retrotransposon insertion in the genome. All of them combine a known retrotransposon sequence with one of a variety of adjacent sequences. Primers are usually designed to LTR regions near the joint with the flanking DNA, in domains that are conserved within families but differ between families. Although regions internal to the LTRs contain conserved segments that could be used for this purpose, the LTR is usually chosen to minimize the size of the target to be amplified. Because LTRs are direct repeats, a primer facing outward from the left (5′) LTR necessarily faces inward from the right (3′) LTR. Depending on the nature of the second primer, the inward-facing primer will either fail to amplify a product, produce a monomorphic band, or detect polymorphism resulting from a nested insertion pattern. The internal amplicon can be removed by digestion with an infrequently cutting enzyme. For retrotransposons with relatively short LTRs, the process can be simplified by designing the transposon-specific primer to an internal sequence that is present only once per element. Simplified digestion and amplification protocols can also be used for S-SAP (sequence-specific amplified polymorphism) with low copy number elements.
S-SAP is a modified AFLP method based on the BARE-1 retroelement. Its core is the digestion of genomic DNA with two different enzymes to produce a template for PCR with a specific primer: amplification takes place between the retrotransposon and adaptors ligated at the restriction sites (usually MseI and PstI, although other restriction enzymes can be used), with selective bases in the adaptor primer. Primers are normally designed to the LTR region; in some cases, however, they are designed to an internal part of the element, such as the polypurine tract (PPT), which lies internal to the 3′ LTR.
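The logic of S-SAP, amplification between an LTR priming site and the nearest adaptor-ligated restriction site, can be sketched in silico under strong simplifications. The code below treats the start of the recognition sequence as the cut position, considers only a top-strand primer, and ignores the selective bases of the adaptor primer; `adaptor_len`, the primer and the toy sequences are hypothetical.

```python
import re

# Recognition sequences of the two enzymes named in the text.
RECOGNITION_SITES = {"MseI": "TTAA", "PstI": "CTGCAG"}


def restriction_sites(genome, sites=RECOGNITION_SITES):
    """Sorted start positions of all recognition sequences in the genome."""
    positions = set()
    for motif in sites.values():
        # A lookahead reports overlapping occurrences as well.
        positions.update(m.start() for m in re.finditer("(?=" + motif + ")", genome))
    return sorted(positions)


def ssap_bands(genome, ltr_primer, adaptor_len=16):
    """Approximate product sizes from each LTR primer site to the next site."""
    sites = restriction_sites(genome)
    bands = []
    for hit in re.finditer("(?=" + ltr_primer + ")", genome):
        primer_end = hit.start() + len(ltr_primer)
        downstream = [s for s in sites if s >= primer_end]
        if downstream:
            # Genomic distance plus the ligated adaptor (simplified).
            bands.append(downstream[0] - hit.start() + adaptor_len)
    return sorted(bands)


# Toy data (hypothetical sequences).
ltr_primer = "GACCATGC"
with_insertion = "GG" + ltr_primer + "C" * 30 + "TTAA" + "GG"
without_insertion = "GG" + "C" * 30 + "TTAA" + "GG"

print(ssap_bands(with_insertion, ltr_primer))     # [54]
print(ssap_bands(without_insertion, ltr_primer))  # []
```

In this simplified picture a band can differ between lines either because the insertion is absent or because the flanking restriction site has changed, which is why S-SAP detects both kinds of polymorphism.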
Generally, compared to AFLP, S-SAP displays more polymorphism, wider chromosomal distribution, and more codominance. Like AFLP, however, S-SAP requires restriction digestion of genomic DNA to provide sites for adaptor ligation, and the sensitivity of commonly used restriction enzymes to DNA methylation can cause false genotyping results. When the technique used for retrotransposons is applied to DNA transposons, it is called transposon display (TD) (Van den Broeck et al., 1998). In the Oryza genus, Rim2/Hipa-TD produced highly polymorphic profiles with ample reproducibility within a species as well as between species (Kwon et al., 2005).
A virtually unlimited number of unique markers can be generated by combining different LTR primers with each other or with microsatellite primers (REMAP). The same primers produce completely different banding patterns depending on whether they are used alone or in combination, which demonstrates that most IRAP/REMAP bands are derived from sequences bordered by an LTR on one side and by another LTR or a microsatellite on the other. In general, REMAP shows a more variable pattern than ISSR; likewise, frequently but not always, depending on the LTR sequence, single-primer PCR shows less variability than IRAP with primer combinations.
The LTR amplification technique was developed as a quick, robust and economical marker system for genotyping in plant breeding and marker-assisted selection (Tam et al., 2009). Because retrotransposon insertions are genetically inherited, retrotransposon families can serve as markers that ultimately help protect the rights of breeders. The pattern obtained depends on the TE copy number, the insertion pattern and the size of the TE family. Because these high-copy-number repeats are associated with one another, primers homologous to them amplify a series of bands (DNA fingerprints), and the resulting markers are highly informative (Yin et al., 2013).
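Fingerprints of this kind are commonly compared by scoring each band as present or absent in each line and computing a pairwise distance. The sketch below uses the Jaccard distance as one possible choice; the reviewed studies may have used other coefficients, and the line names and band scores are hypothetical.

```python
def jaccard_distance(profile_a, profile_b):
    """1 - (bands shared) / (bands present in at least one of the two lines)."""
    shared = sum(1 for a, b in zip(profile_a, profile_b) if a and b)
    either = sum(1 for a, b in zip(profile_a, profile_b) if a or b)
    return 1 - shared / either if either else 0.0


def pairwise_distances(profiles):
    """Distance for every unordered pair of lines, keyed by (name, name)."""
    names = sorted(profiles)
    return {
        (p, q): jaccard_distance(profiles[p], profiles[q])
        for i, p in enumerate(names)
        for q in names[i + 1:]
    }


# Hypothetical scores: 1 = band present, 0 = band absent, one column per band.
profiles = {
    "line_A": [1, 1, 0, 1, 0],
    "line_B": [1, 1, 0, 0, 0],
    "line_C": [0, 0, 1, 1, 1],
}
distances = pairwise_distances(profiles)
for pair, distance in distances.items():
    print(pair, round(distance, 2))
```

A distance matrix of this form is the usual input for clustering or tree-building methods when relationships between lines are to be inferred from band patterns.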
Transposon display has also been used as a sensitive method for detecting genomic copies of retrotransposons amidst retrotransposon cDNAs (Jaaskelainen et al., 1999), and for detecting cDNA polymorphism and clonal differences resulting from retrotransposon activity or from retrotransposon recombination after crossing-over (Monden et al., 2014). The LTR amplification technique is also efficient for examining genome structure and evolution in Solanaceous crop species, as well as chromosome structure and transmission (Manetti et al., 2007, 2009; Novakova et al., 2009; Park et al., 2012; Michael, 2014; Na et al., 2014; Tang et al., 2014).
Insertion polymorphism of active retrotransposon families (Rtsp-1 and Lib) was used for DNA fingerprinting in sweet potato (Ipomoea batatas). The phylogenetic tree constructed from these insertion sites corresponded closely to pedigree information, showing that the method can be used for genetic diversity studies. Thus, genome-wide comparative analysis of active retrotransposon insertion sites is an effective approach to DNA fingerprinting that does not require whole-genome sequence information. This method could facilitate the development of a PCR-based cultivar diagnostic system and the determination of genetic relationships (Monden et al., 2014).
Because SINE repetitive sequences are abundant in almost all plant genomes, they can be used effectively for genotyping (Wenke et al., 2011; Seibt et al., 2012). The ISAP method was developed in potato, and it can also be applied to other species (Seibt et al., 2012; Wenke et al., 2015). Two selected SINE families, SolS-IIIa and SolS-IV, were shown to be highly but differently amplified in Solanaceae of the Solaneae tribe, including wild and cultivated potatoes, tomato, and eggplant. Fluorescent in situ hybridization showed the genome-wide distribution of SolS-IIIa and SolS-IV along potato chromosomes, which is the basis for genotype discrimination and differentiation of somaclonal variants by ISAP markers (Seibt et al., 2012).
A study of retrotransposon activity in inter- and intraspecific hybrids between Solanum kurtzianum and S. microdontum found that, at the morphological level, intraspecific hybrids remained the same as their parents, whereas the genotypes of interspecific hybrids were altered. Genotype analysis showed mobility of both retrotransposons used (Tto1 and Tnt1), ranging from 0 to 7.8 %. Compared with their parental genotypes, the hybrids were epigenetically changed by demethylation in the vicinity of Tnt1 and Tto1, which correlates with the activity of the retrotransposons. These results indicate that retrotransposon activation can lead to genetic variability in tuber-bearing species of Solanum via hybridization (Paz et al., 2015).
Vavilov Journal of Genetics and Breeding • 21 • 1 • 2017

Retrotransposons and transcriptional variability of genome of Solanaceous crop species
The two most important Solanaceae species of the Solaneae tribe, Solanum tuberosum and S. lycopersicum, are almost fully sequenced: approximately 85 % of the S. tuberosum genome and 95 % of the S. lycopersicum genome are known to date. Of a total genome size of 810.6 Mb, 15.78 % (127,958,425 bp) of S. tuberosum has yet to be sequenced; for S. lycopersicum, 5 % (44,030,063 bp) of 781.6 Mb remains unsequenced (Mehra et al., 2015). Many plant species contain repetitive elements in their genomes, sometimes reaching 80 %, as in wheat. In S. tuberosum and S. lycopersicum, repetitive elements make up ~49 % and ~60 % of the genome, respectively. Among other genetic elements, there are 629,713 complex repetitive elements in S. tuberosum and 589,561 in S. lycopersicum, and in both species chromosome 12 is the most repeat-rich. The repeat families identified include DNA transposons and retrotransposons (Tang et al., 2014). An increasing number of reports suggest that repetitive elements carry out important functions in the genome; for instance, they have been found to be abundant in gene-coding regions. Many repetitive elements have been detected upstream of protein-coding genes, in regulatory regions. Some are found in introns, where they become exonized or domesticated. The most prevalent repeat superfamily in both species is LTR/Gypsy, with 334,474 and 306,511 repetitive elements in S. tuberosum and S. lycopersicum, respectively. After LTR/Gypsy, other superfamilies, such as LTR/Copia and the LINE elements L1, occupy much of both genomes. Thus, LTR retrotransposons are the most abundant complex repetitive elements in S. tuberosum and S. lycopersicum. The twelve chromosomes of the two species are highly syntenic, with a highly similar distribution of genes and repeats (Mehra et al., 2015).
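The unsequenced fractions quoted above follow from the stated base counts and genome sizes. As a rough check, assuming that one megabase corresponds to 10^6 bp and that the rounded genome sizes are exact, the arithmetic can be written as follows (the function name is hypothetical):

```python
def unsequenced_percent(unsequenced_bp, genome_size_mb):
    """Percentage of the genome not yet sequenced, taking 1 Mb as 1,000,000 bp."""
    return 100 * unsequenced_bp / (genome_size_mb * 1_000_000)


potato = unsequenced_percent(127_958_425, 810.6)  # S. tuberosum
tomato = unsequenced_percent(44_030_063, 781.6)   # S. lycopersicum

print(round(potato, 1), round(tomato, 1))  # 15.8 5.6
```

These values agree with the approximate figures given in the text; small differences in the second decimal place would be expected from rounding of the genome sizes.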
The study of genic regions showed that 99.29 % of S. tuberosum genes (38,740 out of 39,021) had repeats overlapping either their coding sequence or the 5 kb upstream region, whereas in S. lycopersicum 98.92 % of genes (34,303 out of 34,675) had repeats overlapping their genic regions and/or the 5 kb upstream region. These results suggest that repetitive sequences affect the majority of protein-coding genes in both species. Analysis of various repeat families in S. tuberosum (LINE elements RTE-BovB, SINE elements and the DNA transposon Stowaway) indicated that a large portion of repetitive elements are located within genic regions. The same holds for S. lycopersicum, where the DNA transposons hAT-Tag1, hAT-Tip100, PIF-Harbinger and RC Helitron and the SINEs were abundant in genic regions, and the LTR/ERV1 repeat family showed a notable preference for the genic region (Mehra et al., 2015).
Introns carrying repetitive elements have been shown to affect the spatio-temporal expression of genes and to create cryptic splice sites, among other effects, whereas insertion of repetitive elements into exons is thought to be more destructive and is associated with many disease conditions. A detailed study of the insertion of repetitive elements is therefore crucial for understanding the mechanisms by which repetitive elements influence genes and their products. In S. tuberosum, no insertion preference of repeat families for either exonic or intronic regions was noticed, whereas in S. lycopersicum the DNA transposon MULE/MuDR and LTR/ERV1 prevailed in exonic regions, and the DNA transposon TcMar-Stowaway, the LINE elements RTE-BovB and SINE elements accumulated in intronic regions.

Epigenetic control and retrotransposon activity
In plants, the methylation status of TEs has been correlated with lower transcription of genes carrying TE insertions. However, systematic knowledge about the influence of stress or environmental cues on the epigenetic control of retrotransposons, and about the impact of TEs on phenotypic plasticity, is still lacking (Hollister, Gaut, 2009). The stochastic and sometimes incomplete nature of the epigenetic silencing of retrotransposons may help to explain stress survival, heterosis and the genome dominance phenomenon in intraspecific hybrids. Mobilization of repetitive elements is a destabilizing process for the host cell. Several mechanisms, such as DNA and histone methylation and RNAi, actively suppress retrotransposon expression (Vetukuri et al., 2011). The epigenetic mechanisms that control retroelements may well follow retrotransposons as they move 'around' the genome and thereby modify the epigenetic control of the loci targeted by retrotransposition.
In the plant genome, insertional inactivation and other genome rearrangements lead to a wide spectrum of recombination and chromosomal instability (Belyayev et al., 2010). Retroelement-induced genetic rearrangements can lead to nonallelic homologous recombination, to insertional mutagenesis caused by the 'hopping' of retrotransposons into gene coding sequences, or to the activation or mobilization of small RNAs. Insertions have diverse effects on the expression of the target gene, depending on the intragenic location, orientation and length of the inserted sequence and on other factors.
Studies have suggested that repetitive elements can cause speciation through regulatory variability. Transcription factors (TFs) that take part in the main metabolic pathways and in the defence response were found to be associated with repetitive elements. In S. tuberosum, around 36 % of all binding sites of I-box, a member of the Myb group of transcription factors, that were gained or lost in the 2 kb upstream region of genes overlapped with repetitive elements. The I-box promoter motif is considered to be involved in the response to light. Another TF motif, SORLIP2 (SORLIPs are sequences over-represented in light-induced promoters), has been linked to light-induced genes in some plants; it showed a significant gain of TF binding sites in the orthologous genes of S. tuberosum, with ~23 % of them occurring in repetitive regions. Similarly, other TFs, such as G-box and MADS, were associated with repetitive regions. The functions of these TFs include the response to stress, light, abscisic acid and other metabolites, as well as participation in developmental processes (flower development and gametophyte, embryo and seed development). These examples indicate that the integration of repetitive sequences is beneficial for plant survival, contrary to initial speculation that they are "junk" (Mehra et al., 2015).
In addition, repetitive elements have been found to be associated with miRNAs, small non-coding RNAs responsible for the regulation of 60-70 % of the genes in an organism. Examination of multiple miRNA loci in both species showed that most of them overlapped with different repetitive elements: 242 miRNAs in S. tuberosum and 77 in S. lycopersicum. The most prevalent repeat families were LTR/Gypsy in S. tuberosum and DNA transposons in S. lycopersicum. The probability of miRNA enrichment was calculated with a binomial test, which gave p-values of 4.136e-10 for S. tuberosum and 1.819e-12 for S. lycopersicum. These values indicate that miRNAs are enriched around repetitive elements.
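A one-sided binomial enrichment test of the kind mentioned above can be sketched as follows. The source does not state the parameters of its test, so the null model (each miRNA locus overlaps a repeat with probability equal to an assumed background repeat fraction) and all numbers in the example are illustrative assumptions, not values from the cited study.

```python
from math import comb


def binomial_enrichment_p(overlapping, total, background):
    """One-sided p-value P(X >= overlapping) for X ~ Binomial(total, background)."""
    return sum(
        comb(total, i) * background**i * (1 - background) ** (total - i)
        for i in range(overlapping, total + 1)
    )


# Illustrative numbers only: 15 of 20 loci overlap repeats, background fraction 0.5.
p_value = binomial_enrichment_p(overlapping=15, total=20, background=0.5)
print(f"{p_value:.4f}")  # 0.0207
```

A small p-value under this null model would indicate that more loci overlap repeats than expected by chance, which is the sense in which the text describes miRNAs as enriched around repetitive elements.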
siRNAs are another class of small non-coding RNAs with which repetitive elements have been associated; they can silence repetitive elements through post-transcriptional gene silencing (PTGS) by creating a feedback loop. Besides controlling the repetitive elements themselves, the transcriptional activity of repetitive elements can also provide tissue-specific expression of certain genes (Mehra et al., 2015). A fair number of expressed repetitive elements have been linked to the biogenesis of small RNAs (sRNAs/siRNAs), some of which are involved in gene regulation in either a cis or a trans manner. While some sRNAs take part in post-transcriptional gene silencing, others are involved in de novo DNA methylation in the plant genome. With an increasing number of reports, sRNAs are now thought to be core members of post-transcriptional as well as RdDM-based (RNA-directed DNA methylation) transcriptional gene regulatory processes. The involvement of repetitive elements in the biogenesis of sRNAs indicates their importance in the gene regulatory system of Solanum species.

Perspectives and implications
Many features of retrotransposons, such as their ubiquity and dispersion in eukaryotic genomes, make them appealing as the basis of molecular marker systems. Because of their repetitive nature, retrotransposons are a source of chromatin instability and genomic rearrangements with deleterious consequences (Belyayev et al., 2010). Newly inserted retrotransposons create instability and influence the expression of genes in flanking regions by modifying their methylation status. Retrotransposons can also affect gene regulation simply by inserting their own internal regulatory sequences (promoters, enhancers) at new genomic loci upon retrotransposition. A high proportion of retroelements have lost the ability to transpose autonomously through point mutations and/or deletions; many of them appear to be defective elements carrying deletions.
Genome diversification results from the past activity of retrotransposons and from recombination events. Retrotransposons are long and produce a large genetic change at the point of insertion, thereby providing conserved sequences that can be used to detect their own integration. Unlike the transposition of DNA transposons, this integration is not accompanied by deletion of the element from another locus. Even the loss of the core domain of a retrotransposon by LTR-LTR recombination is invisible to marker methods that use outward-facing LTR primers. The ancestral state of a retrotransposon insertion is obvious: it is the empty site, which is very useful in pedigree and phylogenetic analyses. Original empty sites are unlikely to be regenerated by later recombination at a full site. Retroelements have been used to clarify the relationships between related species.
The DNA markers based on LTR retrotransposons mentioned above are usually referred to as "transposon display". Their applications range from investigations of retrotransposon activation and mobility to studies of biodiversity, genome evolution, chromatin modification and epigenetic reprogramming, gene mapping and the estimation of genetic distance, assessment of the essential derivation of varieties, detection of somaclonal variation, and cDNA fingerprinting. Only those retrotransposon insertions that are passed into egg cells and pollen are useful. Retrotransposons could thus be regarded as a sexually transmitted disease, but one that moves into the new host by a cellular rather than an extracellular pathway.
LTR-retrotransposon-based markers are useful not only for genetic analysis and map construction but also for the isolation and characterization of LTR retrotransposons, such as their long terminal repeats or the internal genes they contain.
In plants, analogous approaches have been applied to non-LTR retrotransposons, specifically to SINE elements. The insertion patterns of the human Alu, a SINE and the most prevalent transposable element in the human genome, have been used not only for research on human population structure but also in studies of heritable diseases. In essence, retrotransposon- or endogenous retrovirus-based molecular markers could also be applied effectively to animals, including mammals and birds.
Commercial platforms for next-generation DNA sequencing (NGS) have been developed and are widely used for major crops, domestic animals, and humans. An abundance of sequence data is crucial for the development of new molecular markers. Although genetic analysis by shotgun sequencing appears to be a promising method, cost is still the limiting factor; cheap, generic, easily applied retrotransposon marker systems will therefore remain the most applicable method for the foreseeable future.