Detection of CRISPR cassettes and cas genes in the Arabidopsis thaliana genome

Current knowledge of plant virus evolution makes the genetic basis of antiviral immunity in higher plants, including the most important crops, one of the most pressing issues in genetics and breeding. According to the endosymbiotic theory, mitochondria descended from alphaproteobacteria that were engulfed but not degraded by the host cell. The discovery of CRISPR-Cas systems (clustered regularly interspaced short palindromic repeats and CRISPR-associated proteins), which provide adaptive immunity in prokaryotes, raises the question of whether such a mechanism of antiviral protection could have been retained during evolution and used by eukaryotes, in particular plants. The purpose of this work was to analyze the complete sequences of the nuclear, mitochondrial, and chloroplast genomes of Arabidopsis thaliana in search of genetic elements similar to those of the CRISPR-Cas systems of bacteria and archaea. Using in silico methods, we detected a locus of regularly interspaced short direct repeats in the mitochondrial genome of A. thaliana ecotypes. The structure of this locus corresponds to the CRISPR locus of the prokaryotic adaptive antiviral immune system. A probable connection between the locus found in the mitochondrial genome of this higher plant and the function of adaptive immunity is indicated by the similarity between the spacer sequences of the detected CRISPR cassette and the genome of Cauliflower mosaic virus, which infects Arabidopsis plants. The repeat and spacer sequences of the CRISPR cassettes in the Arabidopsis C24 and Ler lines are identical. However, the locations of the CRISPR locus in the mitochondrial genomes of these lines differ significantly. The CRISPR cassette in the Col-0 line was found to be completely disrupted as a result of four deletions and one insertion.
Although cas genes were not detected in the mitochondrial genome of the studied Arabidopsis ecotypes, they were detected in the nuclear genome. Both cas genes and numerous CRISPR cassettes were found on all five chromosomes of the nuclear genome of the Col-0 ecotype. The results suggest the existence of a system of adaptive immunity in plants that is similar to the CRISPR immunity of bacteria and archaea.


Introduction
The acquisition of alphaproteobacteria (which subsequently gave rise to mitochondria) as endosymbionts by the archaeal host is now widely accepted as one of the most important events in the origin of the eukaryotic cell (Archibald, 2015). In recent years, phylogenomic methods have provided fundamentally new data demonstrating the possibility of several evolutionary scenarios for the genesis of the eukaryotic cell, including "late" or "early" acquisition of mitochondria by the host cell (Poole, Gribaldo, 2014; Pittis, Gabaldon, 2016). The discovery of the CRISPR-Cas adaptive immunity system, which is based on RNA-guided interference, in a significant percentage of bacterial and archaeal species (Jansen et al., 2002; Mojica et al., 2005; Makarova et al., 2006; Barrangou et al., 2007; Lander, 2016) raises the question of whether such a protective system may exist in eukaryotic mitochondria, organelles that have an obvious evolutionary relationship with their bacterial ancestors. In this regard, the mitochondria of higher plants, whose genome is extremely large compared to the mitochondrial genomes of animals and yeast, are of particular interest.
The mitochondrial genome of plants is also characterized by unusual dynamism, which manifests itself as a high recombination rate caused by repetitive sequences (Gualberto, Newton, 2017). This recombination activity results in the formation of a set of subgenomic forms and in high genomic variability even within the same species. Such changes in genomic structure lead to the rapid evolution of the plant mitochondrial genome. Moreover, the mitochondrial genome of higher plants is actively involved in horizontal gene transfer, in which it can act as both a donor and an acceptor of genes (Kleine et al., 2009; Zhao et al., 2018).
Another important feature of the mitochondrial genome of higher plants is the presence of species-specific sets of linear and circular plasmids in the mitochondria of many of the plant species examined. The composition of these sets can vary significantly within a species (for example, between fertile and sterile forms) (Esser et al., 1986; Thomas, 1986). The origin of mitochondrial plasmids is still unknown. Double-stranded plasmids are believed to have been introduced into the cells of higher plants by a symbiotic or pathogenic pathway (Douce, Neuburger, 1989). This hypothesis is supported by the fact that mitochondrial linear plasmids carry a protein attached to their 5ʹ ends, a structure that resembles that of some viral nucleic acids (Douce, Neuburger, 1989). Moreover, the detection in the linear plasmids S1 and S2 of maize mitochondria of genes encoding viral-type proteins of nucleic acid metabolism argues for their probable viral origin (Kuzmin, Levchenko, 1987; Kuzmin et al., 1988). In recent years, significant progress has been made in the study of mitoviruses, viruses with the simplest RNA genome that specifically infect fungal mitochondria (Shahi et al., 2019). There is also evidence for the existence of plant mitoviruses, which are believed to have arisen through horizontal transfer of the corresponding genes from plant-infecting fungi (Marienfeld et al., 1997; Bruenn et al., 2015; Nibert et al., 2018). Thus, a comparison of bacteria and plant mitochondria suggests that the latter, like prokaryotes, have needed protection against infectious nucleic acids of viral and/or plasmid origin during evolution.
Nevertheless, until recently no data had been obtained on the existence of a similar mechanism of protection against pathogenic DNA in eukaryotes, with the exception of a single detection of a typical CRISPR locus on a mitochondrial plasmid of the higher plant Vicia faba (Mojica et al., 2000). However, that study was not followed up by a search for cas genes in the mitochondrial, chloroplast, and nuclear genomes of this plant species. In addition, no data on the existence of genetic elements of CRISPR-Cas immunity in the nuclear genome of plants have been obtained thus far.
Considering the evolutionary origin of mitochondria and the structure of the plant mitochondrial genome, we used in silico methods to search for genetic elements similar to those of the CRISPR-Cas systems of bacteria and archaea in the mitochondrial, chloroplast, and nuclear genomes of the model plant Arabidopsis thaliana. Given the high dynamism of the plant mitochondrial genome, a genome-wide analysis of the mitochondrial genomes of three A. thaliana ecotypes (C24, Ler, and Col-0) was carried out to search for elements presumably associated with adaptive CRISPR-Cas immunity.

Materials and methods
To search for elements of CRISPR-Cas systems in the genomes, the CRISPROne online service was used (Zhang, Ye, 2017). To determine the origin of the detected CRISPR spacers, a BLAST search (NCBI, http://blast.ncbi.nlm.nih.gov/Blast.cgi) against viral taxa was carried out with default parameters. Matches with fewer than three mismatched nucleotides were then selected.
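The selection step described above can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that the BLAST results were exported in the standard 12-column tabular format; the function name select_hits, the identifiers, and all numerical values in the example are hypothetical and are not taken from this study.

```python
import csv
import io

# Column order of the standard 12-column BLAST tabular output (-outfmt 6).
BLAST_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def select_hits(tabular_text, max_mismatches=2):
    """Keep hits with at most `max_mismatches` mismatches (fewer than 3 by default)."""
    reader = csv.DictReader(
        io.StringIO(tabular_text), fieldnames=BLAST_COLUMNS, delimiter="\t"
    )
    return [row for row in reader if int(row["mismatch"]) <= max_mismatches]


# Hypothetical rows for illustration only (not results of this work).
EXAMPLE_HITS = (
    "spacer_1\tvirus_A\t95.24\t21\t1\t0\t3\t23\t1500\t1520\t0.01\t34.2\n"
    "spacer_1\tvirus_B\t85.00\t20\t3\t0\t1\t20\t310\t329\t1.2\t24.3\n"
    "spacer_2\tvirus_A\t100.00\t18\t0\t0\t5\t22\t7020\t7037\t0.05\t33.7\n"
)

if __name__ == "__main__":
    for hit in select_hits(EXAMPLE_HITS):
        print(hit["qseqid"], hit["sseqid"], hit["mismatch"])
```

Note that the mismatch column of the tabular format does not count gap positions, so a stricter filter might additionally require the gapopen column to be zero.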
Sequence alignment of the CRISPR loci in the mitochondrial genomes of A. thaliana ecotypes was carried out with the programs Matcher (pairwise) (Rice et al., 2000) and MUSCLE (multiple) (Edgar, 2004). The similarity of CRISPR spacers to the genomes of species-specific viruses was analyzed as described by Mihara et al. (2016), using Virus-Host DB (https://www.genome.jp/virushostdb/3702).
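As an illustration of how a pairwise alignment of two loci can be summarized, the following Python sketch counts matches, mismatches, and gap columns in two already aligned sequences. It does not reproduce Matcher or MUSCLE; the function name alignment_summary and the example strings are hypothetical, and the identity is computed here over all alignment columns, which is only one of several possible conventions.

```python
def alignment_summary(aligned_a, aligned_b):
    """Summarize two gapped sequences of equal length ('-' marks a gap)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Ignore columns in which both sequences carry a gap.
    columns = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if not (a == "-" and b == "-")
    ]
    if not columns:
        raise ValueError("alignment has no informative columns")
    gaps = sum(1 for a, b in columns if a == "-" or b == "-")
    matches = sum(1 for a, b in columns if a == b)
    mismatches = len(columns) - matches - gaps
    return {
        "matches": matches,
        "mismatches": mismatches,
        "gap_columns": gaps,
        "identity": matches / len(columns),
    }


if __name__ == "__main__":
    # Hypothetical aligned fragments, for illustration only.
    print(alignment_summary("ACGT-ACGT", "ACGTTACCT"))
```

Two loci whose repeats and spacers coincide completely would give an identity of 1.0 with no mismatches and no gap columns.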

Results and discussion
To date, the CRISPR locus, the upstream leader sequence, and cas genes have been convincingly shown to be the critical components of CRISPR-Cas systems in bacteria and archaea (Jansen et al., 2002; Richter et al., 2012). Relying on the known evolutionary relationship between mitochondria and bacteria, we searched for elements of the CRISPR-Cas system in the mitochondrial genomes of three ecotypes of A. thaliana using bioinformatic approaches and methods that are now widely used in studying the CRISPR-Cas systems of prokaryotes (Jansen et al., 2002; Makarova et al., 2006, 2015; Grissa et al., 2007; Zhang, Ye, 2017; Couvin et al., 2018).
The context analysis of the complete mitochondrial genome sequence of A. thaliana (C24 and Ler ecotypes) revealed a site whose structure is fully consistent with the organization of CRISPR cassettes of prokaryotic origin. The features of the nucleotide organization of the CRISPR-like locus in the mitochondrial genome of these ecotypes are shown in Fig. 1, a. As seen from the data presented, the CRISPR cassette found in the plant mitochondrial genome is formed by three 20-bp perfect direct repeats separated by two spacer sequences of 42 bp and 33 bp, respectively. By contrast, the genome-wide analysis of the Col-0 ecotype mitochondrial DNA showed that the CRISPR cassette structure is completely broken there as a result of four deletions and one insertion in the repeat unit (see Fig. 1, b).
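The repeat-spacer organization described above (identical short direct repeats separated by spacers of bounded length) can be expressed as a simple search rule. The following Python sketch is not the algorithm used by CRISPROne; it is a naive illustration that looks for exact copies of a fixed-length repeat separated by spacers within an assumed length range. The function name, the parameter defaults, and all sequences in the example are synthetic placeholders, not the actual mitochondrial sequences.

```python
def find_repeat_arrays(seq, repeat_len=20, min_spacer=20, max_spacer=60, min_repeats=3):
    """Find runs of identical direct repeats separated by bounded-length spacers."""
    seq = seq.upper()
    arrays = []
    i = 0
    while i + repeat_len <= len(seq):
        repeat = seq[i:i + repeat_len]
        positions = [i]
        end_of_last = i + repeat_len
        while True:
            # The next exact copy must start min_spacer..max_spacer bases
            # after the end of the previous copy.
            nxt = seq.find(
                repeat,
                end_of_last + min_spacer,
                end_of_last + max_spacer + repeat_len,
            )
            if nxt == -1:
                break
            positions.append(nxt)
            end_of_last = nxt + repeat_len
        if len(positions) >= min_repeats:
            spacers = [
                seq[a + repeat_len:b] for a, b in zip(positions, positions[1:])
            ]
            arrays.append(
                {"repeat": repeat, "positions": positions, "spacers": spacers}
            )
            i = end_of_last  # continue scanning after the detected array
        else:
            i += 1
    return arrays


# Synthetic placeholder sequences (NOT the sequences reported in this work):
# one 20-bp repeat and two spacers of 42 and 33 bp between flanking regions.
REPEAT = "GTTTCAGACGAACCCTTGTG"
SPACER_1 = "TGCATCGGATCCTAGCTAGGCTTACGGATCGTACCGTAGCTC"
SPACER_2 = "CGTTGCATGCCTGACTGGTCATGCGTACTGACT"
EXAMPLE_SEQ = "A" * 30 + REPEAT + SPACER_1 + REPEAT + SPACER_2 + REPEAT + "C" * 30

if __name__ == "__main__":
    for array in find_repeat_arrays(EXAMPLE_SEQ):
        print(array["positions"], [len(s) for s in array["spacers"]])
```

A cassette of the kind shown in Fig. 1, a would span 3 × 20 + 42 + 33 = 135 bp. Because this sketch requires exact copies of the repeat, a cassette whose repeats carry insertions or deletions would not be reported by it.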
In our opinion, a noteworthy result of the analysis of the ecotype-specific features of the mitochondrial CRISPR cassette is that the location of the CRISPR cassette (and of its damaged variant) in the mitochondrial genomes of the Arabidopsis C24, Ler, and Col-0 lines differs significantly, whereas the order of repeats and spacers is fully conserved (Fig. 2). Such changes in the location of the CRISPR cassette in the mitochondrial DNA of the studied Arabidopsis ecotypes most likely result from intense rearrangements of the mitochondrial genome due to the high recombination activity characteristic of the mitochondrial genomes of higher plants (Gualberto, Newton, 2017).
A dedicated search revealed the presence of numerous CRISPR cassettes in the nuclear genome of A. thaliana (Fig. 3). Their sizes and arrangement on the chromosomes are presented in Supplementary Material 1. The total number of spacers present in the 110 nuclear CRISPR cassettes is 330. We have not performed a detailed analysis of the similarity of the spacers of the nuclear CRISPR cassettes to the genomes of plant viruses.
The results of comparing the spacer sequences of the CRISPR cassette located in Arabidopsis mitochondrial DNA with the database of plant viruses are summarized in Table 1. The detected spacers were found to contain regions of nonrandom homology to the genomes of three strains of cauliflower mosaic virus, which is able to infect A. thaliana plants. Moreover, individual spacers contained regions of homology to non-coinciding genome segments of different strains of this virus (data not shown).
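The kind of comparison summarized above can be sketched as an ungapped scan of a short query against a target sequence on both strands, allowing a small number of mismatches. This Python sketch is only an illustration under stated assumptions: it is not the BLAST-based procedure used in this work, the function names are hypothetical, and the sequences in the example are invented placeholders rather than spacer or viral sequences.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def near_matches(query, target, max_mismatches=2):
    """List (strand, start, mismatches) for ungapped placements of query on target."""
    query = query.upper()
    target = target.upper()
    hits = []
    for strand, q in (("+", query), ("-", reverse_complement(query))):
        for start in range(len(target) - len(q) + 1):
            window = target[start:start + len(q)]
            # Hamming distance between the query and the current window.
            mismatches = sum(a != b for a, b in zip(q, window))
            if mismatches <= max_mismatches:
                hits.append((strand, start, mismatches))
    return hits


if __name__ == "__main__":
    # Invented placeholder sequences, for illustration only.
    target = "TTTTACGTACGGTACCCC"
    query = "ACGTACGCTA"
    print(near_matches(query, target))
```

The start coordinate reported for the "-" strand refers to the position on the target at which the reverse complement of the query is placed. The scan is quadratic in sequence length, which is acceptable for short spacers and small viral genomes but not for genome-scale searches.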
Search of the mitochondrial genome of the A. thaliana C24, Ler, and Col-0 ecotypes detected cas genes neither in the sequences immediately adjacent to the CRISPR locus nor in the rest of the genome. By contrast, sequences of individual cas genes were found in the nuclear genome (Table 2).
The in silico search made it possible to map the cas5 gene to three Arabidopsis chromosomes (chromosomes 1, 2, and 3). According to the existing classification of CRISPR-Cas systems, this gene is part of the effector module of type I systems (Makarova et al., 2015; Koonin et al., 2017). The csm6 gene is located on chromosome 3, which also carries cas5. This gene encodes an RNase associated with type III-A CRISPR-Cas systems and involved (in prokaryotes) in immunity against phages through degradation of phage transcripts (Jiang et al., 2016). The csa5 gene, whose protein product is a universal component of type I-A CRISPR-Cas systems, was detected on chromosome 4 (Daume et al., 2014). This protein is believed to participate in R-loop stabilization during the interference stage (Daume et al., 2014). Finally, chromosome 5 contains three regions of different lengths corresponding to a gene previously annotated in the Arabidopsis nuclear genome as DEDDh, a member of the 3′→5′ exonuclease gene family involved in the metabolism of small noncoding RNAs (Chen et al., 2018). The attribution of this gene to the cas family may mean that its protein product can perform several functions in vivo, including protection of the plant from the nucleic acids of viruses and plasmids.
The reverse transcriptase (RT) genes associated with type I and III CRISPR-Cas systems were found to be represented on all five Arabidopsis chromosomes by a considerable number of copies (43 in total) (see Table 2). These enzymes are currently assigned a particularly important role in the functioning of type III CRISPR-Cas systems, namely the incorporation of new spacers into the existing CRISPR cassette, either with direct RT participation or with the participation of the RT-Cas1 fusion protein (Silas et al., 2016; Toro et al., 2017).
Thus, our search for elements of CRISPR-Cas systems in the mitochondrial and nuclear genomes of A. thaliana made it possible, for the first time, to detect the main genetic elements of prokaryotic adaptive immunity, including CRISPR loci and cas genes, in the genome of this plant. With the exception of the CRISPR cassette found in the mitochondrial genome, the structural elements of the system are localized in the nuclear genome. In general, in accordance with the classification proposed by Makarova et al. (2015) and Koonin et al. (2017), the still incomplete list of genes associated with the CRISPR-Cas immunity of prokaryotes that were found in the A. thaliana genome (cas5, csm6, csa5, cd06127, and RT) allows the system found in this plant to be assigned to class 1 systems, which have a multi-subunit effector module. However, we failed to find any structure corresponding to the characteristics of leader sequences of the prokaryotic type (Alkhnbashi et al., 2016) in the mitochondrial or nuclear genomes of A. thaliana. Our attempt to find elements of CRISPR-Cas systems in the chloroplast genomes of the A. thaliana Ler and Col-0 ecotypes was unsuccessful as well.
This is the first finding of such canonical elements of CRISPR-Cas systems of prokaryotic origin as CRISPR loci and cas genes in a higher eukaryote, namely, the model plant A. thaliana. Only a single CRISPR cassette, whose spacer sequences exhibit nonrandom homology with the cauliflower mosaic virus genome, was found in the mitochondrial genome of Arabidopsis (see Table 1). The ability of this virus to infect plants of the species under investigation is of special importance. The C24 and Ler ecotypes differ significantly in the genomic location of the CRISPR locus. In the Col-0 ecotype, the CRISPR cassette structure is completely disrupted as a result of four deletions and one insertion in the region of direct repeats (see Fig. 1, b). This fact points to intense mitochondrial genome reorganization in higher plants, which manifests itself as a rapid emergence of interline differences at the level of mitochondrial DNA.
From the evolutionary point of view, the possible existence of CRISPR-Cas immunity in plants seems quite justified, since the DNA-containing plant organelles, mitochondria and chloroplasts, are obviously attractive targets for viruses and plasmids of foreign origin. Plant mitochondria are particularly vulnerable in this regard because of the existence of mitoviruses that attack this type of organelle (Marienfeld et al., 1997; Bruenn et al., 2015; Nibert et al., 2018) and the natural ability of plant mitochondria to take up DNA (Koulintchenko et al., 2003). In general, given the currently available data, it seems premature to form any hypotheses on the evolutionary origin of CRISPR-Cas system elements in the Arabidopsis genome. However, it should be noted that a large body of experimental data for reconstructing the scenarios of the origin and evolution of CRISPR-Cas systems in prokaryotes has already been reported (Koonin, Makarova, 2019). These data may be useful when considering the origin of CRISPR-Cas systems in plants in the context of eukaryogenesis (Koonin, 2015; Lopez-Garcia et al., 2017). In this case, it should be taken into account that similar protection systems against pathogenic nucleic acids might have been present in both the alphaproteobacterial symbiont that gave rise to mitochondria and the archaeal host of the protomitochondrial endosymbiont. In evolutionary terms, the presence of such a protective system cannot be ruled out for the cyanobacterial ancestor of modern chloroplasts either.
At the current stage of research on Arabidopsis CRISPR-Cas systems, we were unable to identify a set of prokaryotic cas genes whose products would form the adaptation and effector modules of a class 1 CRISPR-Cas system and thus support the stages of adaptation, expression, and interference (Koonin et al., 2017). Given the wide variety of CRISPR-Cas systems found in bacteria and archaea (Westra et al., 2016; Koonin et al., 2017; Koonin, Makarova, 2019), it is natural to expect that the plant adaptive immunity mechanism may differ significantly from that of prokaryotes. Therefore, only a comprehensive approach that combines transcriptomic and proteomic techniques with genomic ones will provide a more complete picture of the genes and protein products that form the plant adaptive immunity system.

Conclusions
For the first time, such elements of the prokaryotic CRISPR-Cas system as CRISPR cassettes and cas genes were detected in silico in the genome of the higher plant A. thaliana. This finding can serve as a starting point for further detailed genomic, transcriptomic, and proteomic studies of a wider set of plant species (including the most important crops) in addition to Arabidopsis, in order to determine groups of genes whose expression may be associated with the activity of the natural adaptive plant immunity mechanism. The applied relevance of the expected results on the molecular nature of adaptive plant immunity can hardly be overestimated.