Lake Baikal amphipods and their genomes, great and small

Endemic amphipods (Crustacea: Amphipoda) of Lake Baikal represent an outstanding example of large species flocks occupying a wide range of ecological niches and originating from a handful of ancestor species. Their development took place within a restricted territory and is thus amenable to comprehensive research. Such examples provide unique opportunities for studying behavioral, anatomical, or physiological adaptations in multiple combinations of environmental conditions and thus attract considerable attention. The existing taxonomies of this group list over 350 species and subspecies, which, according to the molecular phylogenetic studies of marker genes, full transcriptomes and mitochondrial genomes, originated from at least two introductions into the lake. The studies of allozymes and marker genes have revealed significant cryptic diversity in Baikal amphipods, as well as a large variance in genetic diversity within some morphological species. Crossing experiments conducted so far for two morphological species suggest that the differences in the mitochondrial marker (cytochrome c oxidase subunit I gene) can potentially be used to predict reproductive isolation. For about one-tenth of the Baikal amphipod species, nuclear genome sizes and chromosome numbers are known. While genome sizes vary within one order of magnitude, the karyotypes are relatively stable (2n = 52 for most species studied). Moreover, analysis of the diversity of repeated sequences in nuclear genomes showed significant between-species differences. Studies of mitochondrial genomes revealed some unusual features, such as variation in length and gene order, as well as duplications of tRNA genes, some of which also underwent remolding (change in anticodon specificity due to point mutations).
The next important steps should be (i) the assembly of whole genomes for different species of Baikal amphipods, which is at the moment hampered by complicated genome structures with high repeat content, and (ii) updating species taxonomy taking into account all the data.


Introduction
Ancient lakes are known speciation hotspots. However, even against this background, the biodiversity of Lake Baikal, the age of which is estimated as 25-30 or even 70 million years, stands out (Cristescu et al., 2010; Mats et al., 2011). The representatives of the order Amphipoda (Crustacea) constitute one of the largest groups of closely related species found in Baikal.
The diversity of amphipods in Baikal may be partially attributed to the broad range of habitats and ecological niches they occupy, as the species within this group differ in habitat depth (0-1,642 meters), feeding habits, and reproductive periods (Takhteev, 2000a, b). However, many species share the same habitat while also being similar in size, feeding spectra and reproductive periods (Takhteev, 2000a, b), which raises the question of the evolutionary forces that drove their speciation. Earlier reviews have already presented global conclusions about the origin of Baikal endemic fauna based on molecular data from multiple studies (Sherbakov, 1999; Sherbakov et al., 2017). However, recent years have seen the accumulation of a large amount of new data, especially from high-throughput sequencing, which have uncovered new details on speciation and genome evolution in Baikal amphipods.

Morphological classification
Currently, the formal identification of Baikal amphipod species is based on the morphological criterion, i. e. the presence of a unique set of morphological traits in all studied individuals of a particular species. The number of morphological species and subspecies in Baikal exceeds 350 (Takhteev, 2000a; Kamaltynov, 2001; Takhteev et al., 2015). In the case of Baikal amphipods, subspecies were mostly derived from morphological varieties that differed less than species would (Bazikalova, 1945; Takhteev, 2000a). All these species belong to the phylum Arthropoda, subphylum Crustacea, class Malacostraca, order Amphipoda, and superfamily Gammaroidea (Sket et al., 2019). The numbers of subspecies, species, genera and families differ according to different authors (Takhteev, 2019), but the most evident discrepancies are attributed to differing taxonomic levels (subspecies/species, congeneric species/different genera etc.).
Multiple classifications complicate studies in Baikal amphipods. From a practical point of view, the most important discrepancies for researchers are different generic names for the same species. The correspondence between the names suggested by different authors can be easily checked using the World Amphipoda Database (WAD; https://www.marinespecies.org/amphipoda/) (Horton et al., 2023). It is worth noting that the systematics accepted by WAD (Kamaltynov, 2001, 2009) does not have an associated identification key, and thus many manuscripts use the species names indicated in the existing keys. The most comprehensive key for Baikal amphipods is still that of Bazikalova (1945), although some groups are covered in more detail in later sources (Bazikalova, 1962; Takhteev, 2000a). The only available English identification key for the genera of Baikal amphipods is provided by Sket et al. (2019). An English-language checklist of all known species according to the same classification is compiled in Takhteev et al. (2015). However, none of these sources includes the species described after 2000: Eulimnogammarus messerschmidtii Bedulina et Takhteev, 2014 (Bedulina et al., 2014), Eulimnogammarus etingovae and Eulimnogammarus tchernykhi Moskalenko, Neretina & Yampolsky, 2020 (Moskalenko et al., 2020).

Molecular genetics approaches to classification
Molecular phylogenetic studies in Baikal amphipods led to three important conclusions. First, all studied species cluster within the freshwater radiation of the morphological genus Gammarus Fabricius, 1775 in the phylogenetic tree, which provides evidence of their descent from Gammarus-like freshwater ancestors (Macdonald III et al., 2005; Hou et al., 2014). Second, studies utilizing phylogenetic marker genes have shown that Baikal amphipods fall into two clades (Sherbakov, 1999; Macdonald III et al., 2005), indicating that their ancestors invaded the lake at least twice. This conclusion is supported by the phylogeny based on single-copy orthologs in transcriptomes (Naumenko et al., 2017) and whole mitochondrial genomes (Romanova et al., 2016a). The first invasion gave rise to a much smaller number of recent species than the second invasion (Bazikalova, 1945; Naumenko et al., 2017). Third, several species of Baikal amphipods were found to exhibit cryptic diversity, i. e. the presence of genetically distinct groups that are morphologically indistinguishable or hard to distinguish.
Studies of allozyme spectra showed significant (in many cases species-level) differences within morphological species and led to suggestions to elevate some subspecies to species rank (Yampolsky et al., 1994; Väinölä, Kamaltynov, 1999) or, conversely, to synonymize taxa (Daneliya et al., 2009). The differences in allozyme frequencies may indicate the presence of isolated populations, but they are difficult to directly translate into species boundaries. This issue also affects the outcomes of phylogenetic marker sequencing, albeit to a lesser degree. In this case, species delimitation may rely on calculated threshold values of patristic distances (Lefébure et al., 2006) or other techniques that take into account genetic distances, phylogenetic tree topology or shared alleles (Fišer et al., 2018). However, the obtained sample clusters cannot be safely assigned to biological species. Therefore, they are termed molecular operational taxonomic units (MOTUs) (Blaxter, 2004).
The Folmer fragment of the cytochrome c oxidase subunit I gene (COI or cox1) is the best-known and most frequently used marker sequence for amphipods and many other invertebrates (Folmer et al., 1994; Hebert et al., 2003). It is important to note that mitochondrial and nuclear-based phylogenies often produce conflicting results, which is known as mito-nuclear discordance (Toews, Brelsford, 2012). In order to draw reliable conclusions about separated genetic lineages, which would indicate reproductively isolated species, it is recommended to also employ nuclear markers. Popular nuclear markers include rRNA gene clusters as well as whole-genome markers such as ultraconserved elements (UCEs), restriction site-associated DNA (RADs), and single-copy orthologs (SCOs) (Eberle et al., 2020). From this list, SCOs have already been utilized to study Baikal amphipods (Naumenko et al., 2017; Drozdova et al., 2021); for other amphipods, RADs have also been used (Jordan et al., 2020; Weston et al., 2022; Eme et al., 2023).
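The threshold-based grouping of samples into MOTUs described above can be illustrated with a minimal sketch. It assumes that a matrix of pairwise distances is already available and applies simple single-linkage clustering: two samples fall into the same MOTU if a chain of distances below the threshold connects them. The function name, the toy distances and the threshold value of 0.10 are hypothetical and are not taken from any of the cited studies; published delimitation methods additionally use tree topology or shared alleles and are more elaborate.

```python
from itertools import combinations


def cluster_motus(names, distances, threshold):
    """Group samples into MOTUs by single-linkage clustering.

    Two samples end up in the same MOTU if they are connected by a chain
    of pairwise distances that are all below `threshold`.
    """
    parent = {name: name for name in names}

    def find(name):
        # Follow parent links up to the representative of the cluster.
        while parent[name] != name:
            parent[name] = parent[parent[name]]  # path halving
            name = parent[name]
        return name

    for a, b in combinations(names, 2):
        if distances[frozenset((a, b))] < threshold:
            parent[find(a)] = find(b)  # merge the two clusters

    clusters = {}
    for name in names:
        clusters.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical toy data: not taken from any study cited here.
samples = ["s1", "s2", "s3", "s4"]
toy_distances = {
    frozenset(("s1", "s2")): 0.01,
    frozenset(("s1", "s3")): 0.20,
    frozenset(("s1", "s4")): 0.22,
    frozenset(("s2", "s3")): 0.21,
    frozenset(("s2", "s4")): 0.23,
    frozenset(("s3", "s4")): 0.02,
}
motus = cluster_motus(samples, toy_distances, threshold=0.10)
print(motus)  # two clusters: s1 with s2, and s3 with s4
```

The choice of threshold determines the number of MOTUs, which is one reason why such clusters cannot be equated with biological species without additional evidence.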

Population genetic diversity
In total, intraspecies diversity has been studied using different methods and with varying geographical coverage for over 20 morphological species of Baikal amphipods (Supplementary Material 1)¹. Some of these species showed substantial intraspecific diversity (Gomanenko et al., 2005; Daneliya et al., 2011; Gurkov et al., 2019). It is noteworthy that even species with comparable distribution and ecological characteristics can exhibit dramatic differences in the level of intraspecific diversity (Fig. 1). For example, it was found that the species Eulimnogammarus verrucosus (Gerstfeldt, 1858), common in the littoral zone, is actually composed of at least three genetic lineages, inhabiting the western (up to the source of the Angara river), southern and eastern parts of the Baikal shore (W, S, and E), respectively. Intraspecific pairwise differences in COI sequences reached 13 %, which is similar to the distances between morphological species (Gurkov et al., 2019). The most recent common ancestor of these lineages, according to a molecular clock-based estimate, existed around 4.5 million years ago (Drozdova et al., 2022). A nuclear marker, a fragment of the 18S rRNA gene, fully corroborated this division (Gurkov et al., 2019).
¹ Supplementary Materials 1-5 are available at: https://vavilovjicg.ru/download/pict202428/appx13.xlsx
Fig. 1 (caption details). Shown are representative photographs of each species at the same scale (grid size is 5 mm), along with split phylogenetic networks at the same scale (scale bar is 1 % substitutions, i. e. 5.1 substitutions in the 510-bp alignment), and corresponding sampling points. Sequence data were obtained from the BOLD database (Ratnasingham, Hebert, 2007). Sampling coordinates were added or corrected based on the original publications (Fazalova et al., 2010; Petunina, 2015; Romanova et al., 2016a; Gurkov et al., 2019). Different colors on networks and maps correspond to different barcode index numbers (BINs) automatically determined by BOLD (Ratnasingham, Hebert, 2013). For detailed methodology, please refer to https://github.com/drozdovapb/Baikalamphipodsreviewpostchr2023.
Gmelinoides fasciatus (Stebbing, 1899) is another species common in shallow water. It is also divided into genetic lineages correlated with geography, but here the differences are less pronounced, reaching about 8 % (Gomanenko et al., 2005), and the last common ancestor existed around 2 million years ago (Bukin et al., 2018). A nuclear marker, an intron of the ATP synthase β subunit gene, showed a lower genetic diversity but also supported intraspecific differentiation (Kovalenkova, 2018). In contrast, preliminary data on the only pelagic planktonic species of Baikal amphipods, Macrohectopus branickii (Dybowsky, 1874), based on the fragments of the mitochondrial genes COI and NADH dehydrogenase fifth subunit (ND5 or nad5) (Petunina et al., 2023; Zaidykov et al., 2023) did not reveal geographically separated genetic lineages.
Finally, Eulimnogammarus cyaneus (Dybowsky, 1874), another widely distributed species inhabiting a significant part of the Lake Baikal littoral, exhibits very weak genetic differentiation based on the COI fragment (Gurkov et al., 2019) but much more pronounced differentiation according to allozyme data (Mashiko et al., 2000). Furthermore, it is important to note that the borders between genetic lineages of E. verrucosus, such as the Angara river outflow, do not hold for Gm. fasciatus (Fig. 1, A, B); the geographic barriers for Gm. fasciatus are unclear. The source of the Angara river started to form at most 120,000 years ago (Arzhannikov et al., 2018), thus being much younger than the last common ancestor of E. verrucosus populations dwelling at different sides of the outflow (3.81 million years ago) (Drozdova et al., 2022). The current cryptic diversity within E. verrucosus and Gm. fasciatus appears to reflect past distribution barriers, such as dwelling in refugia during unfavorable climatic conditions (Bukin et al., 2018).
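To make the percentages quoted in this section more tangible, the following minimal sketch computes an uncorrected pairwise distance (the proportion of differing sites) between two aligned sequences and reproduces the scale-bar arithmetic from the Fig. 1 caption. It assumes that the reported differences are simple proportions of differing sites; the cited studies may have used model-corrected distances. The function name and the two 20-bp toy fragments are invented for illustration only.

```python
def p_distance(seq_a, seq_b):
    """Proportion of differing sites between two aligned sequences.

    Sites where either sequence has a gap ('-') or an ambiguous base ('N')
    are skipped, which is one common convention (an assumption here).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in "-N" or b in "-N":
            continue
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


# Toy 20-bp fragments, invented for illustration only (2 differing sites).
frag_1 = "ATGGCATTAGGACTTCTAAT"
frag_2 = "ATGGCGTTAGGACTTCTGAT"
toy_distance = p_distance(frag_1, frag_2)
print(f"p-distance: {toy_distance:.2f}")

# Scale-bar arithmetic from the Fig. 1 caption: 1 % of a 510-bp alignment.
alignment_length = 510
scale_bar_substitutions = round(0.01 * alignment_length, 1)
print(scale_bar_substitutions)  # 5.1 substitutions

# Under the same assumption, 13 % divergence corresponds to roughly 66 sites.
print(round(0.13 * alignment_length))
```

Under this reading, the 13 % reported within E. verrucosus corresponds to several dozen differing positions in the barcode fragment, whereas 8 % within Gm. fasciatus corresponds to about 41.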

Reproductive barriers and cryptic species
Reproductive isolation is crucial for biologically sensible species delimitation. However, this issue has only recently begun to be explored for Baikal amphipods. To date, experimental checks for reproductive incompatibility have only been carried out for two widely distributed littoral species, E. verrucosus and E. cyaneus. Crossing experiments were conducted with representatives of populations from Listvyanka (W) and Port Baikal (S) for both species (these populations were chosen due to the closest geographic proximity of different genetic lineages), and also from Ust-Bargusin (E) for E. verrucosus. In the case of E. verrucosus, both prezygotic and postzygotic reproductive barriers were found. Although these barriers are not absolute, their combination can ensure reproductive isolation when different lineages are mixed. In the case of E. cyaneus, the analysis of representatives of the populations separated by the Angara river outflow did not show any prezygotic or postzygotic barriers. Mate choice was random, and upon crossing, at least the first-generation hybrids developed normally (Drozdova et al., 2022, 2023). Therefore, in the case of E. verrucosus and E. cyaneus, differences in COI sequences indeed correlate with the presence of reproductive barriers. However, it would be premature to establish a general rule for Baikal amphipods based solely on these findings. It is necessary to conduct such experiments for other genera to draw comprehensive conclusions. Further research on reproductive barriers, as well as genomes and gene expression, may aid in comprehending the factors that contribute to reproductive incompatibility and thus serve as the genetic basis of speciation.
The next steps that need to be undertaken are renewal of the Baikal amphipod taxonomy and species redescription taking into account biological reality and possible competition between cryptic species. This necessity is not unique to Baikal, as cryptic species complexes without formal species descriptions are also characteristic of many other amphipods, including popular ecotoxicological models Gammarus fossarum and Hyalella azteca (Jourdan et al., 2023). However, it underlines the critical importance of always specifying the particular sampling place for Baikal amphipods in every publication and identifying the genetic lineage whenever possible.

What is known about genomes of Lake Baikal amphipods?
The genetics of Baikal amphipods is a relatively understudied area, with most of the research focusing on individual genetic markers. Nuclear genome sizes have been estimated using cytogenetic methods such as Feulgen image analysis densitometry (FIAD) and flow cytometry (FCM) for 36 morphological species (Jeffery et al., 2017; Drozdova et al., 2022). Karyotypes have been studied for 35 morphological species (Salemaa, Kamaltynov, 1994; Kamaltynov, 2001; Natyaganova, Sitnikova, 2012; Barabanova et al., 2019) (Supplementary Material 2). Transcriptome sequencing data are available for over 60 morphological species (Naumenko et al., 2017; Drozdova et al., 2022), enabling the extraction of most protein-coding gene sequences, as well as partial or complete mitochondrial genomes. These transcriptome assemblies are particularly valuable for proteomic studies (Bedulina et al., 2021; Zolotovskaya et al., 2021). Genomic DNA sequencing data are available for seven species, which enabled the assembly of mitochondrial genomes and can be used to evaluate the diversity of repeated sequences in nuclear genomes (Rivarola-Duarte et al., 2014; Romanova et al., 2016a, 2021; Rivarola-Duarte, 2021; Yuxiang et al., 2023) (Supplementary Material 3).
Please refer to Supplementary Material 2 for the full data set. Species names are given according to (Jeffery et al., 2017). The photographs show Baikalogammarus pullus (Dybowsky, 1874), which has the smallest genome and small body length, and dwells in the littoral and sublittoral zones, and Brachyuropus grewingkii (Dybowsky, 1874), which is a deep-water species and one of the largest. The ecological characteristics of these species are given according to (Kamaltynov, 2001). The photo of B. grewingkii was generously provided by Ekaterina Shchapova.
Genome Size Database (http://www.genomesize.com/) (Gregory et al., 2007). When comparing data obtained using different methods, it is worth keeping in mind that crustacean genome size estimates obtained with FIAD are typically slightly lower than those obtained with FCM (Wyngaard et al., 2022). Notably, genome size differences accumulate quite rapidly, as evidenced by the differing genome sizes of E. verrucosus lineages (6.1 pg for the E, 6.9 pg for the W, and 8.0 pg for the S lineage) (Drozdova et al., 2022). The analysis of genome sizes in different species showed a weak positive correlation with both maximal body length and habitat depth, which corresponds to the known ecological trends (Jeffery et al., 2017). However, chromosome numbers were found to be identical (2n = 52) for 33 out of 35 studied species (Salemaa, Kamaltynov, 1994; Kamaltynov, 2001; Natyaganova, Sitnikova, 2012) (Fig. 2), which corresponds to the modal chromosome number for gammaroid amphipods (Coleman, 1994). The lack of correlation between chromosome numbers and genome sizes suggests that repeated sequences significantly contribute to this variation. Analysis of the diversity of repeated sequences revealed significant differences between species of Baikal amphipods (Yuxiang et al., 2023). In all studied species, the proportion of reads included in repeat clusters exceeded 50 % (Rivarola-Duarte et al., 2014; Yuxiang et al., 2023).
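Genome sizes given in picograms can be hard to compare with sequencing-based figures given in base pairs. The sketch below converts the E. verrucosus lineage estimates quoted above using the widely used factor of about 978 Mbp per picogram of DNA. The function and variable names are illustrative, and the sketch makes no assumption about whether the quoted values are haploid (1C) or diploid (2C) estimates: it only converts mass to length.

```python
# Conversion factor: 1 pg of double-stranded DNA corresponds to about 978 Mbp.
MBP_PER_PG = 978.0


def pg_to_gbp(picograms):
    """Convert a DNA mass in picograms to gigabase pairs."""
    return picograms * MBP_PER_PG / 1000.0


# Genome size estimates for the E. verrucosus lineages quoted in the text.
lineage_pg = {"E": 6.1, "W": 6.9, "S": 8.0}
lineage_gbp = {name: pg_to_gbp(pg) for name, pg in lineage_pg.items()}

for name, gbp in lineage_gbp.items():
    print(f"{name} lineage: {lineage_pg[name]} pg ~ {gbp:.2f} Gbp")

# Relative difference between the largest and the smallest estimate.
ratio = max(lineage_pg.values()) / min(lineage_pg.values())
print(f"S/E ratio: {ratio:.2f}")
```

On this scale, the difference between the S and E lineages amounts to almost 2 Gbp, i.e. about a 30 % difference within a single morphological species.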

Mitochondrial genomes
The mitochondrial genome is the most extensively studied part of the genome in Baikal amphipods. It is a small, high-copy DNA molecule, and its sequence is generally easy to assemble from low-coverage genome-wide sequencing (Smith, 2016). Animal mitochondrial genomes are typically circular with a length of about 16 kb and contain 13 protein-coding genes, 2 rRNA genes and 22 tRNA genes. However, significant differences in genome architecture, size, and composition are known (Lavrov, Pett, 2016).
At the moment, eight complete and six partial mitochondrial genomes have been published for Baikal amphipods (Rivarola-Duarte et al., 2014; Romanova et al., 2016a-c, 2021; Mamos et al., 2021) (Supplementary Material 4). Most of these assemblies are within 15-18 kb in length, but the mitochondrial genome of M. branickii is over 42 kb long, making it one of the largest known animal mitochondrial genomes (Romanova et al., 2021). Furthermore, mitochondrial genomes of some Baikal amphipods exhibit gene order rearrangements, gene duplications and the phenomenon of tRNA gene remolding, i. e. changes in tRNA specificity due to a mutation in the anticodon sequence. Remolding is not unique to Baikal amphipods but occurs with higher frequency than in other amphipods (Romanova et al., 2020).

Perspectives in whole-genome studies
The next important step in the development of genome-wide studies of Baikal amphipods should be the assembly of whole nuclear genomes for a number of species. For amphipods worldwide, seven genome assemblies are mentioned in the literature (Supplementary Material 5). Four of them (H. azteca, Trinorchestia longiramus, Platorchestia hallaensis, and Parhyale hawaiensis) belong to the infraorder Talitrida (Kao et al., 2016; Poynton et al., 2018; Patra et al., 2020, 2021). Three species belong to the infraorder Gammarida (Gammarus lacustris, G. roeselii, and E. verrucosus). One of these species, E. verrucosus, inhabits Baikal (Jin et al., 2019; Cormier et al., 2021; Rivarola-Duarte, 2021). The genomes of gammarids are the largest within this list. Not surprisingly, producing high-quality assemblies of these genomes is difficult: all of them are currently at the draft stage, with the N50 of every assembly being below 5 kb, and only the genome of G. roeselii is publicly available.
The development of third-generation genome sequencing techniques provides hope that technical difficulties in the assembly of complex gammarid genomes can be overcome. For example, the assembly of the genome of the Antarctic krill, Euphausia superba, which at 48 Gb is the largest assembled animal genome to date, demonstrates the potential of this technology (Shao et al., 2023). High-quality genome assemblies will greatly advance research on the mechanisms by which endemic amphipods adapt to various conditions in Lake Baikal and help trace their evolutionary history. This will be due to a wider range of possibilities for retrieving full gene sets (which is impossible with the current transcriptomic data) and regulatory elements, as well as new data on population history (Bourgeois, Warren, 2021) and higher resolution for phylogenetic analysis.
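The N50 statistic mentioned above is a standard measure of assembly contiguity: it is the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the total assembly length. A minimal sketch of this calculation follows; the function name and the toy contig lengths are illustrative and do not describe any of the assemblies discussed here.

```python
def n50(contig_lengths):
    """Return the N50 of an assembly.

    N50 is the length of the shortest contig in the smallest set of
    longest contigs that together cover at least half of the assembly.
    """
    if not contig_lengths:
        raise ValueError("no contigs given")
    half_total = sum(contig_lengths) / 2
    running = 0
    # Walk from the longest contig down until half of the assembly is covered.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running >= half_total:
            return length


# Toy contig lengths in kb, invented for illustration (total 290 kb).
toy_contigs_kb = [80, 70, 50, 40, 30, 20]
toy_n50 = n50(toy_contigs_kb)
print(toy_n50)  # 70: the two longest contigs (150 kb) cover half of 290 kb
```

An N50 below 5 kb therefore means that half of the assembled sequence lies in contigs shorter than a typical gene with its introns, which limits the recovery of complete gene models.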

Fig. 1. Comparison of the levels of population genetic diversity of the COI fragment within the best studied morphological species E. verrucosus (A), Gm. fasciatus (B) and E. cyaneus (C).