The functional insight into the genetics of cardiovascular disease: results from the post-GWAS study

Cardiovascular diseases (CVDs), the leading cause of death worldwide, are a range of pathological conditions involving the heart and the blood vessels. A sizable fraction of the susceptibility loci is known, but the underlying mechanisms have been established for only a small proportion of them. There is therefore an increasing need to explore the functional relevance of trait-associated variants and to search for novel genetic risk variants. We have previously reported a bioinformatic approach that enables effective identification of functional non-coding variants through integrated analysis of genome-wide data. Here, 1361 previously identified regulatory SNPs (rSNPs) were analyzed to provide new insights into cardiovascular risk. Using the 1000 Genomes Project data, we found 773,471 coding markers co-segregating with the input rSNPs. The intersection of these markers with GWAS-derived SNPs relevant to cardiovascular traits was analyzed within a 10 Kbp window. The effects on transcription factor (TF) binding sites were explored with DeFine models. Functional pathway enrichment and protein–protein interaction (PPI) network analyses were performed on the target genes and the extended gene set using STRING and DAVID. Eighteen rSNPs were functionally linked to cardiovascular risk. A significant impact on the binding sites of thirteen TFs, including those involved in blood cell formation, hematopoiesis, macrophage function, inflammation, and vasoconstriction, was found in K562 cells. Twenty-one rSNP target genes and 5 partners predicted from the PPI network were enriched for the spliceosome and endocytosis KEGG pathways and for the endosomal sorting complex and mRNA splicing REACTOME pathways. Related Gene Ontology terms included mRNA splicing and processing, endosome transport, and protein catabolic processes. Together, the findings provide further insight into the biological basis of CVDs and highlight the importance of the precise regulation of splicing and alternative splicing.


Introduction
While current data indicate that the prevalence of cardiovascular disease may vary among regions of the world, CVDs remain one of the leading causes of death and health loss, and many of their forms show familial aggregation and high heritability (Smith J.G., Newton-Cheh, 2015; Roth et al., 2017). Previous efforts identified candidate risk genes, including renal homeostasis genes for Mendelian forms of abnormal blood pressure and several transcription factors (including NKX2-5, GATA4, and TBX) for congenital septal defects (Kathiresan, Srivastava, 2012). However, the broad group of cardiovascular traits such as myocardial infarction/ischemia or coronary artery disease (CAD) shows complex inheritance patterns, which suggest collective and non-linear effects of multiple genetic and non-genetic factors. With recent technological advances, whole-exome sequencing (Li A.H. et al., 2017; Seidelmann et al., 2017; Khera et al., 2019) and genome-wide association studies (GWASs; in particular, Erdmann et al., 2018; Schunkert et al., 2018) have proven to be powerful tools for discovering genetic variation associated with cardiovascular risk. The outcomes of multiple GWASs and meta-analyses completed during the past decade have been deposited in catalogs such as the catalog of published GWASs from the National Human Genome Research Institute (Buniello et al., 2019), the Coronary ARtery DIsease Genome-wide Replication And Meta-analysis (CARDIoGRAM) (Preuss et al., 2010) plus the Coronary Artery Disease (C4D) Consortium, and UK Biobank (Ge et al., 2017).
Genes that regulate blood pressure, the tone and elasticity of the vascular wall, inflammation, the proliferation of vascular smooth muscle cells, and the levels of low-density lipoprotein cholesterol (LDL-C) are 'traditionally' implicated in cardiovascular risk. Moreover, GWASs have identified numerous loci for cardiometabolic risk factors, such as plasma biomarkers of lipid metabolism, thrombosis, inflammation, and metabolic status, that play a role in risk analysis. In the case of long QT syndrome, for example, fifteen candidate genes have been reported to date, including several ion channel genes (Refsgaard et al., 2012; Arking et al., 2014). Notably, three candidates (KCNQ1, KCNH2, and SCN5A) account for approximately 75 % of cases (Wallace et al., 2019). For CAD, more than 150 suggestive loci have been estimated, although only 46 of them have reached the genome-wide significance threshold (den Hoed et al., 2015). Interestingly, a considerable overlap has been shown between the risk genes for monogenic forms of CVDs and those generating an association signal in GWASs (Rau et al., 2015).
Together, these findings have produced a relatively coherent picture of the biology underlying cardiovascular disease, but despite these advances the functional basis of most reported associated loci still cannot be analyzed. The reasons include an important limitation of GWASs, which identify risk genomic regions rather than risk genes, and the non-coding localization of most susceptibility SNPs (Ward, Kellis, 2012). Current theories assume that regulatory non-coding SNPs (rSNPs) make the greatest contribution to the development of various multifactorial diseases, including cancer and CAD, because they are directly involved in controlling gene expression levels.
One of the lessons learned is the growing need for comprehensive post-GWAS analysis that translates the reported statistical associations into causal variants among those in linkage disequilibrium (Mansur et al., 2018; Smith A.J.P. et al., 2018). Moreover, only some GWAS-implicated loci (e.g., sixteen of the 46 validated loci for CAD) are also associated with 'classic' risk genes, pointing to the potential involvement of 'non-traditional' biological pathways in the disease (Smith J.G., Newton-Cheh, 2015).
Since advances in next-generation sequencing have provided an expanding amount of large-scale -omics data, research strategies have started to focus on integrating various genome-wide information layers (Huang S. et al., 2017) using functional genomics assays. One way to find putative functional variants is to detect regions with allele-specific binding of transcription factors or allele-specific histone modifications, which suggest different downstream regulatory roles of the alleles. ChIP-seq data provide snapshots of protein–DNA interactions, allowing the analysis of sites with significant allelic differences in signal. Transcriptome (RNA-seq) data provide snapshots of gene expression levels depending on the allele. Epigenome (iTEA) analysis (Meng et al., 2018), combined with regulatory sequence annotations, i. e. DNase-seq and ChIP-seq datasets (Cavalli et al., 2019), is beginning to be used to screen for causal variants that change gene expression, including within GWAS-derived loci. However, researchers have not come close to solving the problem of identifying rSNPs at the genome-wide scale. The limitations are imposed by the incomplete experimental data collected to date and by some critical methodological problems. Notably, one of the major challenges is the development of effective in silico (bioinformatic) approaches: these are still few in number, and only individual studies have been reported that elucidate the underlying mechanisms of CVDs (Gong et al., 2018; Roman, Mohlke, 2018).

To address these challenges, we have recently reported an effective bioinformatic approach that facilitates the systematic identification of functional non-coding variants from available genome-wide data (Korbolina et al., 2018). Our pipeline used multiple positional and functional criteria to reveal non-coding regulatory variants in the human genome, and it imputed curated GWAS association signals to select potentially colorectal cancer-causal rSNPs within a 1 Kb window of genomic sequence centered at the GWAS-SNP. The regulatory properties of the identified rSNPs were initially demonstrated in several human cell lines of different origins (HCT116, K562, MCF7). However, the expression of tissue-specific transcription factors is suppressed in cell lines. For this reason, here we adopted the list of 1361 regulatory SNPs from that study. Data from the 1000 Genomes Project (1000 Genomes Project Consortium et al., 2015) were incorporated into the analysis to improve the matching of rSNPs to a phenotypic outcome, here the risk of CVDs. We then narrowed the focus to rSNPs that potentially change the predicted binding status of various transcription factors and performed functional annotation of the target genes.

Materials and methods
Input data on non-coding regulatory variants in the human genome. Regulatory SNPs from our earlier study (Korbolina et al., 2018), including the data on their identified target genes, were used as input. All of them were associated with allele-specific binding of various TFs and with allele-specific expression in the raw data.
Genetics data. Genetic variants and allele information were retrieved from dbSNP150 (Database resources…, 2016) and four 1000 Genomes Project super populations (AFR, AMR, ASN and EUR) (1000 Genomes Project Consortium et al., 2015). The GRCh37 annotation was used to map genetic variants to gene loci.
Assessment of SNP clustering with a distance measure. To map significant GWAS associations to novel functional variants, we used the 1000 Genomes Project data. First, we extracted the data on 2500 individual haplotype-resolved human genomes from the super populations (AFR, AMR, ASN and EUR). Of the 1476 input rSNPs reported earlier (Korbolina et al., 2018), 1361 were found in these genomic data. Next, we extracted the data on all SNPs within transcribed genomic regions. The list of rSNPs defined from the 1000 Genomes data (the 'population' set of 1361 variants) and the list of coding SNPs defined from the same data were consolidated. Each of the 2500 genomes was genotyped separately for each SNP in the consolidated list. The genotype data were converted to binary form, where "0" represents the most frequent allele and "1" the minor allele in the individual genotype. The resulting genotypes were arranged in a 2D matrix of the 2500 individual genomes by all SNPs in the consolidated list. This matrix was then transformed into a distance matrix using the XOR logic gate. Figure 1 shows an example of SNP distance measurement by XOR.
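The binary encoding and XOR comparison described above can be illustrated with a short Python sketch. It assumes that each SNP is represented as one allele call per genome across the same ordered set of samples (the exact per-haplotype representation used in the study is not specified here); the function names `encode_alleles` and `xor_distance` are hypothetical, and the toy alleles are invented, not study data.

```python
from collections import Counter


def encode_alleles(alleles):
    """Encode one SNP across samples: 0 = most frequent allele, 1 = minor allele.

    Ties between equally frequent alleles are broken by first occurrence
    (an assumption; the text does not say how ties were handled).
    """
    major = Counter(alleles).most_common(1)[0][0]
    return [0 if allele == major else 1 for allele in alleles]


def xor_distance(u, v):
    """Count the samples in which two binary SNP vectors differ (XOR == 1)."""
    if len(u) != len(v):
        raise ValueError("both SNPs must be genotyped in the same samples")
    return sum(a ^ b for a, b in zip(u, v))


# Toy data for six samples (invented values, not study data)
rsnp = encode_alleles(["A", "A", "G", "A", "G", "A"])
coding = encode_alleles(["C", "C", "T", "C", "C", "T"])
print(rsnp)                        # [0, 0, 1, 0, 1, 0]
print(coding)                      # [0, 0, 1, 0, 0, 1]
print(xor_distance(rsnp, coding))  # 2
```

Applying `xor_distance` to every rSNP/coding-SNP column pair of the genotype matrix yields a distance matrix; a distance of 0 means that the two minor alleles always occur in the same genomes.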
All relative distances were normalized by the number of times the minor allele was found in the genomic data of the 2500 genomes. Based on the resulting distances, we quantified the likelihood that an rSNP and a coding SNP are repeatedly found together within one genotype in human populations. In total, we found that 773,471 coding SNPs (p < 0.01) may be co-segregating markers for our input rSNPs.
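The normalization step admits more than one reading. The sketch below assumes that the XOR count is divided by the number of genomes carrying the minor allele at either SNP, which makes the measure equivalent to the Jaccard distance between the two binary vectors. The significance threshold is illustrated with a permutation test; this is an assumed, swapped-in procedure, since the text does not state how the p-values were computed. All names are hypothetical.

```python
import random


def normalized_distance(u, v):
    """XOR count divided by the number of samples carrying a minor allele at
    either SNP (assumed normalization; equals the Jaccard distance).
    0.0 = the minor alleles always co-occur, 1.0 = they never co-occur."""
    xor = sum(a ^ b for a, b in zip(u, v))
    union = sum(a | b for a, b in zip(u, v))
    if union == 0:
        raise ValueError("no minor allele observed at either SNP")
    return xor / union


def permutation_p_value(u, v, n_perm=999, seed=0):
    """One-sided empirical p-value: the fraction of random re-pairings of
    samples that give a distance at least as small as the observed one."""
    rng = random.Random(seed)
    observed = normalized_distance(u, v)
    shuffled = list(v)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if normalized_distance(u, shuffled) <= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Toy example: 100 samples, minor allele in the same 30 samples at both SNPs
rsnp = [1] * 30 + [0] * 70
coding = list(rsnp)
print(normalized_distance(rsnp, coding))  # 0.0
print(permutation_p_value(rsnp, coding))
```

Across many rSNP/coding-SNP pairs, a multiple-testing correction and a vectorized implementation would be needed in practice; both are outside this sketch.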
Using GWAS data to interpret the functionality of the input rSNPs. Next, we examined the GWAS index SNPs available as of May 2019. We queried the GWAS Catalog with a 'cardio' signature (including 'heart', 'coronary artery disease', 'CAD', 'platelet', 'blood', 'blood cells', 'pressure', 'count', 'vessel', 'caliber', 'pulse', 'artery'). We then evaluated whether the input rSNPs, or any of their co-segregating coding markers lying within a 10 Kbp window of GWAS SNPs (Brodie et al., 2016), could be related to association signals for heart and vascular disease (Suppl. Table 1).
A functional protein association network. STRING v11 (Szklarczyk et al., 2019) was selected as the PPI database, with the subset of 21 genes targeted by the 18 rSNPs found to be associated with cardiovascular risk used as input (see Suppl. Table 2 for details of the enrichment analysis).
Functional annotation by DAVID. The Database for Annotation, Visualization and Integrated Discovery (DAVID) (Database resources…, 2016) was used to further interpret the same target genes, with default values for all parameters. The outputs of the DAVID functional annotation and clustering tools are given in Suppl. Table 3.
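Both STRING and DAVID report over-representation statistics for annotation terms. As a generic illustration only, and not either tool's implementation (DAVID, for example, reports a modified Fisher exact p-value called the EASE score), the sketch below computes a one-sided hypergeometric p-value per term and Benjamini–Hochberg FDR values. The background size, term names, term sizes and hit counts are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k): probability of seeing at least k annotated genes when n genes
    are drawn from a background of N genes, K of which carry the annotation."""
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (FDR), in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical numbers: 26 input proteins (21 targets + 5 partners) drawn from
# a background of 20,000 genes; each term is (hits in list, term size).
terms = {"term_A": (5, 130), "term_B": (3, 400), "term_C": (1, 900)}
raw = {t: hypergeom_upper_tail(k, 26, K, 20_000) for t, (k, K) in terms.items()}
fdr = dict(zip(raw, benjamini_hochberg(list(raw.values()))))
for term in terms:
    print(f"{term}: p = {raw[term]:.2e}, FDR = {fdr[term]:.2e}")
```

The exact integer arithmetic of `math.comb` avoids overflow for genome-sized backgrounds, at the cost of speed for very large term sets.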
In silico analysis of potentially affected TF binding sites. Sequence-based DeFine deep learning models were used to predict the effects on transcription factor binding in K562 cells (the data are accessible online via the DeFine tool) and to rank the 18 rSNPs identified for cardiovascular risk. The DeFine functional scores predict transcription factor–DNA binding intensities and are assigned on the basis of the differences between the reference sequence and the altered sequence, as reviewed in . The outputs, including the maximum TF functional scores, the most likely candidate TFs and the top 10 contact genes for each rSNP position, are given in Suppl. Table 4.
R code. R, version 3.1.0, was used for data analysis. The custom-made Perl scripts employed are available upon request.
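DeFine scores are derived from the difference between model predictions for the reference and the altered sequence centered at the variant. The deep learning models themselves are not reproduced here; instead, the sketch below swaps in a simple position weight matrix (PWM) log-odds score to show the same reference-versus-alternative contrast. The 3-bp motif, its probabilities, and all names are hypothetical, and the resulting numbers are not on the DeFine scale.

```python
import math

# Hypothetical 3-bp motif "CGT" as per-position base probabilities.
# This PWM is a stand-in for illustration, NOT a DeFine model.
PWM = [
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
]
BACKGROUND = 0.25  # uniform background base frequency


def best_site_score(seq):
    """Maximum PWM log-odds score over all motif-length windows of seq."""
    width = len(PWM)
    best = -math.inf
    for start in range(len(seq) - width + 1):
        score = sum(math.log(PWM[j][seq[start + j]] / BACKGROUND)
                    for j in range(width))
        best = max(best, score)
    return best


def variant_effect(flank_left, ref, alt, flank_right):
    """Score(alt) - Score(ref) for sequences centred on the variant.
    Positive: predicted gain of binding; negative: predicted loss."""
    ref_seq = flank_left + ref + flank_right
    alt_seq = flank_left + alt + flank_right
    return best_site_score(alt_seq) - best_site_score(ref_seq)


effect = variant_effect("AAC", "A", "G", "TAA")
print(effect > 0)  # True: the G allele creates a perfect CGT match
```

As with DeFine scores, swapping the two alleles flips the sign of the effect while keeping its magnitude.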

Results

Investigating rSNP functional relevance to cardiovascular risk via GWAS associations
We analyzed 438 GWAS-SNPs relevant to cardiovascular traits and identified eighteen candidate rSNP variants at a 10 Kbp window size (see Suppl. Table 1), including within the associated loci for coronary heart and artery disease (CAD, 3 loci), HDL cholesterol, QT interval, and red blood cell and platelet traits. Interestingly, ten GWAS-derived SNPs, including those associated with systolic and diastolic blood pressure, pulse pressure, retinal arteriolar microcirculation and one with CAD, coincided with regulatory SNPs from our input list. We considered both the input rSNP targets (Korbolina et al., 2018) and any gene mapped to these ten GWAS-SNPs in the GWAS Catalog as candidates for mediating the association. Only one of the eighteen cardio-related SNPs was linked to more than one target gene: rs3744061 (MFSD11, JMJD6).
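The trait filter and window intersection described in the Methods can be sketched as follows, under stated assumptions: keywords are matched as case-insensitive substrings (which over-matches, e.g. 'count' or 'cad', so hits would need manual review), coordinates are GRCh37 base positions, and the 10 Kbp window is taken as centered on the GWAS SNP (±5 kb) by analogy with the 1 Kb centered window of the earlier pipeline; if ±10 kb was intended, `window_bp` should be doubled. Function names, SNP identifiers, positions and trait strings in the example are invented.

```python
from bisect import bisect_left, bisect_right

# Keyword 'signature' listed in the Methods
CARDIO_KEYWORDS = ("heart", "coronary artery disease", "cad", "platelet", "blood",
                   "blood cells", "pressure", "count", "vessel", "caliber",
                   "pulse", "artery")


def is_cardio_trait(trait):
    """Case-insensitive substring match against the keyword signature."""
    text = trait.lower()
    return any(keyword in text for keyword in CARDIO_KEYWORDS)


def snps_near_gwas(candidates, gwas_hits, window_bp=10_000):
    """Pair each candidate SNP with every GWAS SNP within +/- window_bp // 2
    on the same chromosome (symmetric, i.e. a window centred on the GWAS SNP).

    candidates, gwas_hits: iterables of (snp_id, chrom, pos) tuples.
    Returns (candidate_id, gwas_id, distance_bp) tuples.
    """
    half = window_bp // 2
    by_chrom = {}
    for gwas_id, chrom, pos in gwas_hits:
        by_chrom.setdefault(chrom, []).append((pos, gwas_id))
    for hits in by_chrom.values():
        hits.sort()
    positions = {chrom: [p for p, _ in hits] for chrom, hits in by_chrom.items()}

    pairs = []
    for cand_id, chrom, pos in candidates:
        pos_list = positions.get(chrom, [])
        lo = bisect_left(pos_list, pos - half)    # first GWAS SNP in window
        hi = bisect_right(pos_list, pos + half)   # one past the last one
        for gwas_pos, gwas_id in by_chrom.get(chrom, [])[lo:hi]:
            pairs.append((cand_id, gwas_id, abs(pos - gwas_pos)))
    return pairs


# Invented example: three catalog rows, two of which match the signature
catalog = [("rsG1", "17", 1_000_000, "Pulse pressure"),
           ("rsG2", "17", 2_000_000, "Height"),
           ("rsG3", "5", 500_000, "Platelet count")]
cardio_hits = [(snp, chrom, pos) for snp, chrom, pos, trait in catalog
               if is_cardio_trait(trait)]
candidates = [("rsA", "17", 1_003_000), ("rsB", "17", 1_006_000),
              ("rsC", "5", 500_000)]
print(snps_near_gwas(candidates, cardio_hits))
# [('rsA', 'rsG1', 3000), ('rsC', 'rsG3', 0)]
```

A distance of 0 corresponds to the case reported above in which an input rSNP coincides with a GWAS-derived SNP.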

Functional annotation of the rSNP targeted genes
We further examined the target genes of these eighteen cardiovascular risk rSNPs as candidates for mediating the effects on CVDs. In our STRING enrichment analysis (Szklarczyk et al., 2019), the 21 rSNP target genes, together with five predicted partners, were significantly enriched in the Kyoto Encyclopedia of Genes and Genomes (KEGG) spliceosome pathway, two REACTOME pathways (endosomal sorting complex required for transport and mRNA splicing) and 40 Gene Ontology terms, including gene expression, RNA splicing, regulation of mRNA splicing, regulation of alternative mRNA splicing via spliceosome, protein transport to vacuole involved in ubiquitin-dependent protein catabolic process via the multivesicular body sorting pathway, and ubiquitin-dependent protein catabolic process (see Suppl. Table 2 for details of the STRING enrichment analysis). Figure 2 shows the corresponding protein association network with evidence for associations between the targets. Such enrichment indicates that the input proteins are at least partially biologically connected as a group.
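The Figure 2 legend reports 24 observed edges against about 10 expected, with an enrichment p-value < 0.001. As a rough plausibility check only, the sketch below treats the edge count as Poisson-distributed with a mean equal to the expected count; this is an assumed simplification, not the statistic STRING itself computes.

```python
from math import exp, factorial


def poisson_upper_tail(k, lam):
    """P(X >= k) for X ~ Poisson(lam), summed directly from k upward so that
    small tail probabilities keep full relative precision."""
    term = exp(-lam) * lam**k / factorial(k)  # P(X = k)
    total = 0.0
    i = k
    # keep adding terms until they become negligible past the mode
    while term > 0.0 and (i <= lam or term > 1e-17 * total):
        total += term
        i += 1
        term *= lam / i  # P(X = i) from P(X = i - 1)
    return total


observed_edges, expected_edges = 24, 10  # values from the Figure 2 legend
p = poisson_upper_tail(observed_edges, expected_edges)
print(f"P(X >= {observed_edges}) = {p:.1e}")  # on the order of 1e-4
```

Under this simplified model, the observed edge count is consistent with the reported enrichment p-value < 0.001.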
The same enriched groups were found with the DAVID functional annotation tool (Huang D.W. et al., 2008), including the mRNA splicing, mRNA 3′-splice site recognition, regulation of transcription and endosomal transport GO terms (see Suppl. Table 3). All of these processes could be linked to pathological features of CVDs, but the most notable is the group of targets associated with the regulation of splicing and alternative splicing (red nodes in Figure 2).

DeFine rSNP prioritization
The DeFine online tool revealed that eight rSNPs had positive functional scores, indicating potential to enhance TF binding, and five had negative scores, indicating potential to weaken it (see Suppl. Table 4). The ten TFs most strongly affected by these 13 rSNPs according to the DeFine models were TAL1, REST, NR1H2, SP2, RFX5, MXI1, PBX2, NFYB, ZNF274 and ZNF263. Nine of these ten TFs (all except ZNF274) were among the proteins immunoprecipitated by the original authors of the data (as given in the Supplementary material, Korbolina et al., 2018). This is expected in principle, since the input rSNPs were selected on the basis of allele-specific TF binding in these data.
Figure 2. The network for rSNP targets has been expanded by 5 additional proteins (via the 'More' button in the STRING interface, with the default confidence cut-off). The network contains 26 nodes with 24 edges (vs 10 expected edges; disconnected nodes are hidden); enrichment p-value < 0.001. The legend inset on the right shows the various types of evidence for the predicted associations and, by node colour, the enriched annotation terms for the proteins.
In more detail, the functional variants rs210962 (GWAS-derived) and rs7920217 shared the potential to weaken the binding of RE1 silencing transcription factor (REST) at the related genomic loci (DeFine scores of -0.0778 and -0.0888, respectively). rs2270574 and rs8106212, within the GWAS loci for CAD and a platelet trait, respectively, were both associated with the SP2 transcription factor but had opposite effects on its binding according to the DeFine scores (0.05020, enhancing, and -0.0973, weakening, respectively). Likewise, two rSNPs in this study affected binding sites for T-cell acute leukemia protein 1 (TAL1): rs140492, within the locus for HDL cholesterol (DeFine score 0.0996, enhancing), and rs10445033, within the GWAS locus for red blood cell levels (DeFine score -0.0891, weakening). DeFine also identified the same target genes (e.g., SNF8 for rs2270574), with additional potential targets in each rSNP case (the top ten candidate genes are listed in the DeFine output data, see Suppl. Table 4).
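The DeFine scores quoted above can be organized programmatically, for example to rank rSNPs by effect size and to flag TFs whose sites are affected in opposite directions. This is a minimal sketch: the sign convention (positive = enhanced binding, negative = weakened binding) follows the text, while ranking by absolute score is an illustrative assumption rather than DeFine's own prioritization. The helper names are hypothetical.

```python
from collections import defaultdict

# (rSNP, TF, DeFine score) as quoted in the text (K562 models)
SCORES = [
    ("rs210962", "REST", -0.0778),
    ("rs7920217", "REST", -0.0888),
    ("rs2270574", "SP2", 0.05020),
    ("rs8106212", "SP2", -0.0973),
    ("rs140492", "TAL1", 0.0996),
    ("rs10445033", "TAL1", -0.0891),
]


def direction(score):
    """Map a score's sign to the predicted effect on TF binding."""
    if score > 0:
        return "enhance"
    if score < 0:
        return "weaken"
    return "neutral"


def summarize(scores):
    """Rank by |score| and list TFs affected in both directions."""
    ranked = sorted(scores, key=lambda row: abs(row[2]), reverse=True)
    effects_by_tf = defaultdict(set)
    for _, tf, score in scores:
        effects_by_tf[tf].add(direction(score))
    discordant = sorted(tf for tf, effects in effects_by_tf.items()
                        if {"enhance", "weaken"} <= effects)
    return ranked, discordant


ranked, discordant = summarize(SCORES)
for snp, tf, score in ranked:
    print(f"{snp}\t{tf}\t{score:+.4f}\t{direction(score)}")
print("TFs with opposite effects:", discordant)
```

With these six scores, SP2 and TAL1 are flagged as TFs whose binding sites are predicted to be enhanced by one rSNP and weakened by another, as described in the paragraph above.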

Discussion
As mentioned in the Results, we identified eighteen rSNPs with functional relevance to CVDs that matched the GWAS loci for coronary heart and artery disease, HDL cholesterol, retinal arteriolar microcirculation, QT interval, and red blood cell and platelet traits (see Suppl. Table 1). For ten of the eighteen identified variants, the genome position of the rSNP coincided with that of the GWAS-derived SNP. This was an interesting finding, as the proportion is relatively large compared with our previously reported results for colorectal cancer (Korbolina et al., 2018) and cognitive disorders (Bryzgalov et al., 2018).
A number of truly critical processes depend on the functionality and biological activities of blood cells, as has been widely demonstrated. For erythrocytes, these include not only oxygen transport but also immune response (Astle et al., 2016), redox homeostasis (Kuhn et al., 2017) and regulation of vascular function (Helms et al., 2018; Rifkind et al., 2018). Moreover, several ex vivo studies on diabetes mellitus found that red blood cells mediate the development of endothelial dysfunction and cardiac injury (Yang et al., 2013; Zhou et al., 2018; Pernow et al., 2019). Any qualitative or quantitative deviation from the physiological ranges may therefore be closely linked to disease (Leal et al., 2018). The data suggest that blood cell counts and hematological parameters could be useful markers for improving cardiovascular risk prediction, although they have limited sensitivity (Mozos, 2015; Samman Tahhan et al., 2017; Lassale et al., 2018; Haybar et al., 2019). Interestingly, the expression levels of some curated genes were shown to aid independently in predicting heart failure prognosis when combined with the neutrophil-to-lymphocyte ratio (Wan et al., 2018). The risk of CVDs correlates well with platelet traits (Sloan et al., 2015; Vélez, García, 2015; Reinthaler et al., 2016; Gill et al., 2018), in particular with the initiation and progression of CAD (Uysal et al., 2016). However, some have argued that the shared genetic pathways linking blood cells with complex pathologies, including autoimmune diseases, schizophrenia, and CAD, may be non-causal (Astle et al., 2016).
Our current knowledge of splicing regulation and alternative splicing in the heart is still limited, but splicing analysis has emerged as an important line of research on cardiovascular risk. Studies have revealed that splicing and alternative splicing events (reviewed in van den Hoogenhof et al., 2016) seem to play a causative role in heart development and cardiovascular disease, and promising therapeutic targets have already been proposed (Rexiati et al., 2018). There is evidence that a significant number of alternative transcripts are increased in diseased hearts compared with controls and can be involved in disease. For example, abnormal splicing of apoptotic genes contributes to the pathogenesis of several CVDs, including dilated and diabetic cardiomyopathy, atherosclerosis and heart failure, as reviewed in Dlamini et al. (2015). Dysregulation of cardiac splicing factors can also be sufficient to affect heart function and lead to disease; for instance, decreased levels of RNA-binding motif protein 20 (RBM20) may be involved in dilated cardiomyopathy by affecting the splicing of at least several known target genes (Maatz et al., 2014). In the present study, the target gene of the rs4360494 functional variant within the GWAS-derived locus for pulse pressure is SF3A3, which encodes subunit 3 of the heterotrimeric splicing factor 3a protein complex. Splicing factor 3a is known to play an important role in U2 snRNP biogenesis and thus in pre-mRNA splicing (Krämer et al., 2005; Huang C.-J. et al., 2011). In other pathological states, targeting spliceosome components is a promising strategy for cancer treatment and prognosis (Lin, 2017; El Marabti, Younis, 2018; Martinez-Montiel et al., 2018). For example, the data suggest that SF3A3 is involved in p53 activation and the induction of cell cycle arrest and cell death in non-small cell lung cancer (Siebring-van Olst et al., 2017), and the SRSF2 gene, which encodes another component of the splicing machinery, could be used as a reliable prognostic factor in patients with hepatocellular carcinoma (Luo et al., 2017).
The role of the endosomal system in heart function and cardiovascular disease is also described as critical, as endosomes contribute to the control of plasma LDL cholesterol levels (Wijers et al., 2018), Ca2+ homeostasis and protein trafficking (Curran et al., 2015) and play an important part in atherosclerosis risk (Cai et al., 2018). However, surprisingly little is known about the regulation of endosome-based protein trafficking in the heart, so our results could serve as useful background for further research. Protein quality control and the ubiquitin–proteasome system (UPS) (Gilda et al., 2016; Barac et al., 2017; Gilda, Gomes, 2017; Dorsch et al., 2019; Shukla, Rafiq, 2019) also play an essential role in the initiation and progression of CVDs. In short, the findings suggest that the UPS contributes to structural remodeling of the myocardium, ischemia-reperfusion injury and myocardial cell loss, which are important components of progressive heart failure. There is evidence for a non-degradative role as well, as ubiquitination was shown to affect important regulators of signaling pathways, including those for cell growth and apoptosis, the DNA damage response, the innate immune response, endocytosis, and protein activity (partly reviewed in Gupta et al., 2018). Given that proteostasis is a dynamic, multistep process involving complex molecular machinery, deregulation at any stage could be implicated in a wide variety of outcomes.
Turning directly to the list of targets for our 18 rSNPs that are likely relevant to cardiovascular function, VAV1, a member of the VAV gene family, is the target of the rs8106212 variant associated with platelet distribution. In general, VAV proteins interact with cell-surface receptors to activate various downstream biological pathways, leading to changes in transcription. VAV1 is important in hematopoiesis, playing a role in T-cell and B-cell development and activation (Rodríguez-Fdez, Bustelo, 2019). Another candidate, PIEZO1, targeted by rs10445033, is widely associated with hereditary human diseases (Alper, 2017) and tissue homeostasis (Zhong et al., 2018). The PIEZO1 homotetramer functions as the pore-forming subunit of a mechanically activated cation channel and contributes substantially to vascular biology and development (Li J. et al., 2014), including mechanosignaling in the endothelium (Albarrán-Juárez et al., 2018). Some studies suggest that the molecular events involved in the development of acute myocardial infarction may include miR-103a microRNA expression in plasma and the subsequent regulation of PIEZO1 protein expression (Huang L. et al., 2013). Up-regulation of Piezo1 has also been demonstrated in a rat model of heart failure (Liang et al., 2017).
Because rSNPs induce variation in gene expression, one possible cause is their effect on the binding affinity of a genomic locus for transcription factors (Deplancke et al., 2016). There is evidence that GWAS loci for complex diseases are associated not only with changes in gene expression (Gallagher, Chen-Plotkin, 2018) but also with the activity of various TFs (Harley et al., 2018). Here, we used the online classifier of variant pathogenicity DeFine to explore the functional effects of 18 rSNPs relevant to cardiovascular traits on TF binding sites. The DeFine classification approach uses sequence-based deep learning models to compare the reference sequence with the altered sequence centered at the variant. Its authors have shown that the tool is capable of identifying causal non-coding variants within the reported GWAS loci for complex human diseases.
In this study, a significant functional impact on the binding sites of thirteen TFs, including five genomic positions with potentially weakened and eight with potentially enhanced TF binding, was found in K562 cells. Among the ten TFs with maximum DeFine functional scores (see Suppl. Table 4), four (NFYB, PBX2, SP2 and TAL1) have well-known roles that make them functionally plausible contributors to CVDs. TAL1 is an erythroid differentiation factor that cooperates with various TFs to regulate hematopoiesis and the normal differentiation of myeloid cells, and it may also contribute to malignant transformation (Vagapova et al., 2018). PBX genes encode homeodomain transcription factors that were shown to determine the allele-specific phenotypic presentation of heart defects in mice; their loss resulted in insufficient expression of genes controlling both the widening and the narrowing of blood vessels and ultimately led to persistent vasoconstriction through multiple pathways (McCulley et al., 2017). NFYB (nuclear transcription factor Y subunit beta) is a subunit of a highly conserved trimeric TF that binds with high specificity to CCAAT motifs in the promoter regions of a variety of genes. Interestingly, there is evidence that NF-Y cooperates with SP family factors (which bind to GC boxes) and PBX1 (Suske, 2017; Völkel et al., 2018).
Importantly, the regulatory elements of the genome, the distribution of TF binding sites, and the effects of rSNPs on gene expression can be highly specific to a tissue or cell line. Because tissue samples, in particular of the brain and heart, are very difficult to obtain from humans, it is not surprising that approaches to the genome-wide identification of functional variants were first trained and made available for cell lines and animal models. Genome-wide studies of the functional effects of non-coding variants on transcription very often rely on modeling the cell type-specific binding of transcription factors to regulatory elements of the genome. Interest in reliable in silico methods is increasing, but according to a PubMed analysis only a few have been more or less broadly implemented. Moreover, the evidence suggests that the performance of different functional prediction tools varies by disease phenotype (Anderson, Lassmann, 2018), so they may give contradictory predictions.
Overall, to date, using GWAS associations seems to be the most common way to explore the functionality of non-coding variants. Still, a survey of published studies showed that this approach helps to interpret only a minor part of the thousands of identified rSNPs (Cavalli et al., 2016, 2019). Our results suggest that the reported analysis pipeline, which integrates datasets from the 1000 Genomes Project, may serve as a general framework for future research and could eventually lead to the investigation of novel functional variants within significant GWAS loci that confer human disease risk. Considering our previously obtained data for colorectal cancer (Korbolina et al., 2018) and a number of cognitive disorders (Bryzgalov et al., 2018), we have further evidence that the precise regulation of splicing mechanisms and alternative splicing is among the key mechanisms underlying the effects of non-coding genetic variation on the phenotype, including various pathological conditions.

Conclusions
Overall, to date, using GWAS associations seems to be the most common way to explore the functionality of non-coding variants. Still, a survey of published studies showed that this approach helps to interpret only a minor part of the thousands of identified rSNPs (Cavalli et al., 2016, 2019). Our results suggest that the reported analysis pipeline, which integrates datasets from the 1000 Genomes Project, may serve as a general framework for future research and could eventually lead to the investigation of novel functional variants within significant GWAS loci that confer human disease risk. In consideration of our previously obtained data for colorectal cancer (Korbolina et al., 2018) and a number of cognitive disorders (Bryzgalov et al., 2018), we have obtained further evidence for the precise regulation of splicing mechanisms and alternative splicing to