GWAS of a soybean breeding collection from South East and South Kazakhstan for resistance to fungal diseases

1 Институт биологии и биотехнологии растений Комитета науки Министерства образования и науки Республики Казахстан, Алматы, Казахстан 2 Казахский национальный аграрный университет, Алматы, Казахстан 3 Казахский научно-исследовательский институт земледелия и растениеводства, пос. Алмалыбак, Алматинская область, Казахстан 4 Научно-исследовательский институт проблем биологической безопасности Комитета науки Министерства образования и науки Республики Казахстан, пос. Гвардейский, Жамбылская область, Казахстан

Soybean (Glycine max (L.) Merr) is an essential food, feed, and technical culture. In Kazakhstan the area under soybean is increasing every year, helping to solve the problem of protein deficiency in human nutrition and animal feeding. One of the main problems of soybean production is fungal diseases causing yields losses of up to 30 %. Modern genomic studies can be applied to facilitate efficient breeding research for improvement of soybean fungal di sease tolerance. Therefore, the objective of this genomewide association study (GWAS) was analysis of a soybean collection consisting of 182 accessions in relation to fungal diseases in the conditions of South East and South Kazakhstan. Field evaluation of the soybean collection suggested that Fusarium spp. and Cercospora sojina affected plants in the South region (RIBSP), and Septoria glycines -in the South East region (KRIAPP). The major objective of the study was identification of QTL associated with resistance to fusarium root rot (FUS), frogeye leaf spot (FLS), and brown spot (BS). GWAS using 4 442 SNP (single nucleotide polymorphism) markers of Illumina iSelect array allowed for identification of fifteen marker trait associations (MTA) resistant to the three diseases at two diff er ent stages of growth. Two QTL both for FUS (chromosomes 13 and 17) and BS (chromosomes 14 and 17) were genetically mapped, including one presumably novel QTL for BS (chromosome 17). Also, five presumably novel QTL for FLS were genetically mapped on chromosomes 2, 7, and 15. The results can be used for improvement of the local breeding projects based on marker-assisted selection approach.

GWAS of a soybean breeding collection from South East and South Kazakhstan for resistance to fungal diseases
Генофонд и селекция растений Вавиловский журнал генетики и селекции • 2018 • 22 • 5 S oybean (G. max (L.) Merrill.) is one of the most important legumes in the world due to its high nutritional value and protein content (Masuda, Goldsmith, 2009). In Kazakhstan, this crop is mainly cultivated in the South East region. According to the Agency for Statistics of the Republic of Kazakhstan, in 2017 soybean was grown in an area of 137.4 thousand hectares (http://www.fcc.kz/attachments/ article/4325). For further development of the soybean industry, the Government of Kazakhstan has declared a new initiative to expand the soybean area to 400 thousand hectares by 2020 to ensure its yield at 1 million tons (Zatybekov et al., 2017).
The productivity of soybean largely depends on availability of well-adapted cultivars with approptiate flowering and maturity times to match various ecological environments of the country . Our previous study based on evaluation of 120 soybean accessions in three different regions of Kazakhstan (Abugalieva et al., 2016) has confirmed the results of observations in other parts of the world (Contreras-Soto et al., 2018;Copley et al., 2018), which underline the importance of suitable flowering time for plant adaptation in a particular environmental niche.
Another important factor that severely limits the soybean productivity worldwide is susceptibility to harmful diseases (Yang X.B. et al., 2001;Vidic et al., 2013). For instance, 25 known diseases posed a constant threat to the productivity of soybean in the USA (Mueller et al., 2010). In China, out of eight most common diseases, six are caused by fungi. In Russia, reports suggest there are up to 32 soybean diseases (Zaostrovnykh, 2005;Kurilova, 2010;Polozhieva, Dubovitskaya, 2015). In Kazakhstan, more than ten fungal diseases of soybean have been identified (Mombekova et al., 2013;Didorenko et al., 2014), and with expansion of the area under the crop, it is an obvious necessity to study the genetic background associated with the tolerance to harmful pathogens.
The damage caused by various diseases is determined by environmental conditions, the biology and spread of the parasite, and the characteristics of breeding material (Faske et al., 2014). Different parts of the plant, including seeds, sprouts, roots, shoots, leaves, and beans can be severely affected by these diseases. In this respect, all soybean diseases can be separated into three large groups: 1) diseases of seeds, sprouts, and seedlings; 2) patches that affect various parts of the plant; 3) diseases that cause the plants to wilt (Faske et al., 2014). In general, the total yield loss from susceptibility to fungal diseases can reach up to 40 % (Zaostrovnykh, 2005). J.K. Pataky and S.M. Lim (1981) reported that soybean yield loss due to S. glycines was associated with reduction of seed weight. M.D. Dias et al. (2016) identified a highly significant correlations ( p < 0.01) between yield and soybean Colletotrichum truncatum incidence on pods (r = -0.85). About 90 kg/ ha of soybean grain were lost for each 1 % increment in the disease incidence.
Currently, a large number of genes controlling the resistance to various diseases and pests have been identified (Prabhu et al., 1999;Wang J. et al., 2010;Vidic et al., 2013). Several soybean mapping populations were developed for genetic localization of QTL and genes associated with the soybean diseases, such as rhizoctonia root rot (RRR, caused by Rhizoctonia solani) (Zhao et al., 2005), fusarium root rot (FUS, caused by Fusarium spp.) (Stacey, 2008), phytophthora root rot (PRR, caused by Phytophthora sojae) (Zhang et al., 2013), frogeye leaf spot (FLS, caused by C. sojina) (Mian et al., 1999) and sclerotinia stem rot (SCL, caused by Sclerotinia sclerotiorum) (Zhao et al., 2015). The majority of these studies were based on use of SSR (simple sequence repeat) microsatellite markers. However, with the development of SoySNP50K iSelect SNP (single nucleotide polymorphism) array (Song et al., 2013), most of the modern studies rely on the use of SNP markers, which are crucial for genome-wide association studies (GWAS) (Klein, 2007). GWAS is based on use of whole genome genotyping and a detailed phytopathological and morphological description of collections with a high level of genetic diversity (Klein, 2007). A survey of recent reports has shown successful use of GWAS for studying soybean resistance to fungal diseases (Bao et al., 2015;Iquira et al., 2015;Schneider et al., 2016;Qin et al., 2017).
The purpose of this study was to assess the tolerance of the soybean germplasm collection represented by 182 accessions from major soybean growing regions from all around the world to most harmful diseases spreading in the South and South East of Kazakhstan. The GWAS was applied for identification of marker-trait associations for resistance to FUS, FLS and BS.

Materials and methods
The analyzed soybean collection consisted of 182 accessions, including 18 released cultivars and prospective breeding lines from Kazakhstan (Zatybekov et al., 2017). The accessions represented 12 countries from 5 geographic regions, including Western and Eastern Europe, North America, East Asia, and Kazakhstan. The collection was tested in the experimental plots of Research Institute of Biological Safety Problems (RIBSP, southern Kazakhstan) and Kazakh Research Institute of Agricultural Plant Production (KRIAPP, south-eastern Kazakhstan) in 2016-2017. Despite the environment similarities of the two localities, the conditions of soybean growth were different, as KRIAPP tested plants in irrigated, and RIBSP -in non-irrigated sites. Plants were grown in 1 meter long rows with a 30-cm distance between adjacent rows and a 5-cm gap between plants within rows. In total, the data for mean values of eight agronomic traits of the 182 soybean accessions harvested in two environments were subjected to further statistical analysis. The eight traits included the following data: days to seedling emergence (VE), days to flowering time (R2), days to development of pods (R4), days to full maturity of seeds (R8), plant height (PH), number of seeds per plant (NSP), thousand seeds weight (TSW), and yield per plant (YP).
Disease resistance analysis was carried out in relation to the three fungal diseases spread in the regions. In the South East the plants were analyzed for resistance to BS (caused by S. glycines), while in the South they were tested for resistance to FLS (C. sojina) and FUS (caused by a group of unidentified Fusarium pathogens indicated in this study as Fusarium spp.). The plant resistance to fungal diseases of the leaf surface was characterized based on a nine-point scale, where point 1 stood for highly resistant (no symptoms), 3 -for resistant (5-19 % foliage affected), 5 -for partially resistant (20-49 % of the foliage affected), 7 -for susceptible (50-79 % of the foliage affected), and 9 -for highly susceptible (up to 80 % of the foliage affected) (Hnetkovsky et al., 1996). The plant resistance to root rot was characterized based on a five-point scale, where point 1 stood for healthy root without infection symptoms, 2 -for slight cortical necrosis or vascular discoloration, 3 -for moderate cortical necrosis or vascular discoloration, 4 -for extensive cortical or vascular tissue distroyed, and 5 -for withering and dying of a plant (Leath, Carroll, 1982). The disease severity after infection as a percentage of the affected area and the healthy part of plants was noted as well. Therefore, the resistance to all three diseases were marked with number 1 (for instance, FUS1), and the diseases severity were marked as number 2 (for instance, FUS2). Statistical analyses of obtained data were calculated by using Statistical Package for the Social Sciences (SPSS 16.0) (https://www.ibm.com/ analytics/data-science/predictive-analytics/spss-statisticalsoftware) computer programs.
DNA samples were extracted and purified from single seeds of individual cultivars using commercial kits (Qiagene, CA, USA). The DNA concentration for each sample was adjusted to 50 ng/µl. All samples were genotyped using the soybean 5 403 SNP Illumina iSelect array (Song et al., 2013) at the Traitgenetics GmbH (Gatersleben, Germany). The Illumina Infinium procedure was performed according to the manufacturer's protocol. SNP genotype analysis was carried out using the Illumina Genome Studio software (GS V2011.1). Population genetic analysis and principal coordinate analysis were performed using GenAlEx 6.5 (Peakall, Smouse, 2012).
The SNP dataset was filtered using a 10 % cutoff for missing data and markers with minor allele frequency >0.10 were considered for GWAS. Numbers of hypothetical groups ranging from k = 1 to 10 were assessed using 50,000 burn-in iterations followed by 100,000 recorded Markov-Chain iterations. To estimate the sampling variance of population structure inference, five independent runs were carried out for each k by the STRUCTURE software (Pritchard et al., 2000). The output from STRUCTURE was analyzed for delta K value (ΔK) in STRUCTURE HARVESTER (Evanno et al., 2005). On the basis of the final k values, Q-matrix for three identified clusters was developed. GWAS for resistance to the most harmful fungal diseases of soybean in South East and South Kazakhstan were studied using 4,442 SNP filtered against minor alleles. GWAS based on the MLM model, including options with Q and K matrices, was conducted using the TASSEL 5 software (Bradbury et al., 2007). Pairwise LD between markers was measured using linkage disequilibrium parameter (r 2 ) between alleles using R studio (Wimmer et al., 2012). The LD decay rate was estimated as the chromosomal distance at which the average pairwise correlation coefficient (r 2 ) dropped to a half of its maximum value (Wimmer et al., 2012). The GWAS for resistance to diseases and spread of the diseases during the plant growth was run separately, and as the first evaluation was marked as FUS1, FLS1, and BS1, the evaluations for the spread of the diseases during the plant growth were marked as FUS2, FLS2, and BS2.

Diseases resistant
Field trial results at the experimental stations of the South East and South regions suggested a clear difference in the development of FUS, FLS, and BS on the leaf surfaces at the adult stage of the plant growth. While results at the RIBSP (South region) showed the occurrence of FUS and FLS, and luck of BS symptoms, the data at the KRIAPP (South East region) allowed the identification of only BS development (Fig. 1). At the RIBSP site, FUS showed stronger influence on plant growth as only 62.1 % of plants were resistant, whereas 77.4 % was resistant to FLS, and 100 % resistant to BS (see Fig. 1, a). At KRIAPP the collection showed 79.3 % resistance to BS with no signs of FUS and FLS throughout the plant growth period (see Fig. 1, b).
Three-way ANOVA suggested that the origin of the accessions was significantly associated with reducing of YP in relationship to the development of FUS and FLS in South Kazakhstan (Table 1). Among tested accessions, there were clear examples of association between the high resistance and yield. For instance, cultivar (cv.) 'Santana' from France was highly resistant to FUS and had high YP (8.7 ± 0.26 g), while cv. 'Chernovickaya 7' from Ukraine was highly susceptible to FUS with the YP of only 1.8 ± 0.15. The collection included ten accessions from East Asia, which showed complete resistance to all three diseases.

Phenotypic variation of the collection
Comparative assessment of five groups of samples in the studied soybean collection in two regions for 2016-2017 has not revealed sharp differences in the main agronomic traits. The varieties from North America and West Europe showed good potential in NSP, while the varieties from East Asia -the highest potential in TSW. Pearson correlation demonstrated that NSP negatively correlated with TSW (p < 0.001) and positively -with YP (p < 0.001). Overall highest average yield in the collection of 182 accessions for the two fields was recorded for the Supra variety from Canada (23.0 ± 3.86), followed by Cheremosh (18.7 ± 2.89) from Ukraine and Slaviya (19.3 ± 5.42) from Russia. Among the accessions from Kazakhstan, the Mysula variety showed the best YP (18.3 ± 3.45). It is interesting that in 2016, 14.8 % of the collection showed a higher yield result in comparison with standard variety Zhansaya, and in 2017 the number of higher yield accessions was even bigger (54.4 %).
Multivariate ANOVA suggested that FUS and FLS affected each studied trait both in RIBSP and KRIAPP, except FLS was not a factor for variation in TSW (see Table 1). On the other Полногеномный анализ ассоциаций с устойчивостью к грибным болезням в коллекции сои в условиях Юго-Восточного и Южного Казахстана hand, BS significantly affected the duration of plant growth at the VER8 stage, the only one out of the eight studied traits (see Table 1).

Genetic variation in the soybean collection based on SNP markers
Genotyping of the soybean collection using the Illumina iSelect SNP array revealed 5 403 successful SNPs ( The average density of SNP map was one marker per every 213 Kb. The LD decay curve at the threshold r 2 = 0.1 was 20 Kbp (Supplemental Fig. 1) 1 . The PCoA allowed to separate 182 accessions based on their breeding origin and were split into five geographically distinct groups (see Supplemental  Fig. 2). The smallest group was from East Asia (10 accessions), and the largest -from Eastern Europe (83 accessions, see Table 2). The PCoA analysis based on NeiP data showed that genotypes from two European groups were positioned separately from other three groups by the PCoA1 component (see Supplemental Fig. 2), while PCoA2 effectively separated the remaining three groups. The accessions from North America and Kazakhstan appeared to be the most close groups, while the accessions from East Asia were genetically more distant from the other four groups (see Supplemental Fig. 2).

Association mapping
A total of 9 SNPs for 15 MTAs at the two stages of plant growth were identified to be associated with the resistance to three fungal diseases (Table 3). For each MTA separate QQ plots were generated to validate the significance of the associations. In addition, the results were statistically validated using a t-test for identification of a false positive MTA. All 15 identified MTAs were significant after t-test application.
The results suggested that two MTAs were significant for resistance to FUS, five -for FLS, and two -for BS (see Table 3). Also, the physical position of each critical SNP marker was compared with positions of known QTLs (https://soybase. org/search/qtllist_by_symbol.php). In this study only two out of nine SNPs matched the positions of analogous QTL in a soybean genome (see Table 3). The largest number of SNP markers in identified MTAs were located in chromosome 2, where mainly QTLs for resistance to P. sojae were genetically mapped (Fig. 2). The analysis of genome physical locations of associated SNP markers revealed that 3 SNPs were part of CDS (coding DNA sequence) and remained 6 SNPs were located in intergenic regions (Table 4).
Each SNP in intergenic position was considered for possible functional annotation based on the actual proximity of nearby located genes. Note: df -degree of freedom. The F values are provided with significance level indicated by the asterisks. *** p < 0.001, ** p < 0.01, * p < 0.05, ns -not significant.

Discussion
The analysis of three diseases from two regions of Kazakhstan revealed strong environmental influence on plant tolerance to studied pathogens, as FUS and FLS were the factor in the South, and BS -in South East parts of the country, respective-ly. ANOVA suggested that FUS and FLS affected nearly all studied traits, except for TSW, which was a factor in case of FUS, but not in case of FLS (see Table 1). The test also suggested that the origin of the plant material was essential for NSP and YP in both FUS and FLS studies. A different outcome Полногеномный анализ ассоциаций с устойчивостью к грибным болезням в коллекции сои в условиях Юго-Восточного и Южного Казахстана was observed in BS analysis, as this disease affected plants only for the duration of the VER8 stage (see Table 1), which shows that the S. glycines did not affect plants in RIBSP and was barely important in KRIAPP. It is interesting that all ten studied accessions from East Asia and selected lines from Europe showed strong tolerance to all three diseases and could be directly involved in the breeding processes of soybean for resistance to studied fungi diseases.
On the other hand, the genetic study of the collection based on 4,442 polymorphic SNPs indicated a relatively close genetic relationship between the samples from Kazakhstan and North America, as PCoA test placed them together in left upper part of the graph (see Supplemental Fig. 2).
GWAS of the three diseases evaluated at the two stages of plant growth period allowed for identification of nine SNP markers associated with 15 MTAs (see Tables 3, 4). Two SNPs were identified in the GWAS of FUS on chromosomes 13 and 17 (see Fig. 2). The region on chromosome 13 matched with the well-known QTL (Fusarium lesion length 1-2) identified by M. Ellis et al. (2012). The authors had found that the region between the Satt160 and Satt149 markers was significantly associated with resistance to Fusarium graminearum. M. Kassem et al. (2006) reported that the Satt160 marker on chromosome 13 appeared to be a significant determinant of seed yield. It is interesting that this region was also associated with resistance to P. sojae . The region on the chromosome 17 has the same location with a QTL identified in GWAS for resistance to Fusarium virguliforme (Bao et al., 2015;Zhang et al., 2015). SNP marker Gm17.8109237 identified in this study has located approximately 4 Mb from SNP marker ss715611120_C_T identified by J. Zhang et al. (2015) and 6.7 Mb from SNP marker BARC-051665-11191 identified by Y. Bao et al. (2015).
The most significant amount of MTAs was found in GWAS for resistance to FLS (see Tables 3, 4). The SNP locations of identified five QTLs for FLS were mapped on chromosomes 2, 7, and 15, and did not match locations of the QTLs found in the previous study for resistance to this disease (Yang W. et al., 2001;Pham et al., 2015). A literature survey showed that one QTL for resistance to FLS matched the QTL previoulsy mapped on chromosome 13 (Pham et al., 2015), while another was positioned on chromosomes 16 (Yang W. et al., 2001). Therefore, the MTA found in this study presumably suggested that they are novel QTL for resistance to FLS.
In case of BS, the location of one out of two identified QTLs has matched the same region on chromosomes 14 with a QTL for resistance to sudden death syndrome (SDS) caused by F. virguliforme . A physical position of associated SNP Gm14.4811528 for this QTL was in proximity of candidate gene Glyma14g06580 (Schmutz et al., 2010). The annotation of the gene is suggesting that it is a serine/threonine protein kinase, which is a common genetic factor often involved in controlling soybean diseases resistance (Cook et al., 2012). However, the identified position of the second MTA on chromosome 17 has possibly been reported for the first time. Therefore, in this study soybean QTL for resistances to FLS and BS were presumably novel ones. As the analyzed population in two regions has shown different reaction to tested diseases, these findings underline the importance of studying a genetically diverse collection of a particular soybean growing in a certain environmental niche. Identified MTAs may facilitate the discovery of new genes for resistance to diseases and a better understanding of genotype × environment interaction patterns. Also, the size and level of genetic variation in the studied genetic panels appear to be critical for the positive outcome of GWAS-based pro jects. It has been demonstrated that experiments with a sample size less than 384 accessions (Gurung et al., 2014) and large LD blocks (Zanke et al., 2014) might lead to identification of false positive associations. On the other hand, in the study by M.K. Turner et al. (2017), it was shown that smaller panels might allow for detection of false negative associations that would not have been detected in more extensive panels (Oyiga et al., 2017). Therefore, the results of this study using relatively small soybean collection size (n = 182) may potentially relate to the above-mentioned findings by M.K. Turner and his coauthors.
With development of new genomic technologies, such as KASP (kompetitative allele-specific polymorphism) (Semagn et al., 2014), the designated SNP markers (see Tables 3, 4) for each of the identified MTAs for resistances to FUS, FLS and BS can be transformed into convenient types of DNA markers to enhance marker-associated selection projects in soybean. Thus, the results of this study are a further contribution to the genetics and breeding of soybean associated with resistance to main fungal diseases.