The GWAS-MAP platform for aggregation of results of genome-wide association studies and the GWAS-MAP|homo database of 70 billion genetic associations of human traits
https://doi.org/10.18699/VJ20.686
Abstract
Hundreds of genome-wide association studies (GWAS) of human traits are performed each year. The results of GWAS are often published in the form of summary statistics. Information from summary statistics can be used for multiple purposes, from fundamental research in biology and genetics to the search for potential biomarkers and therapeutic targets. While the amount of GWAS summary statistics collected by the scientific community is rapidly increasing, the use of these data is limited by the lack of generally accepted standards. In particular, researchers who would like to use GWAS summary statistics in their studies find that the data are scattered across multiple websites, are presented in a variety of formats, and often have not been quality controlled. Moreover, each available summary statistics analysis tool requires the data to be presented in its own internal format. To address these issues, we developed GWAS-MAP, a high-throughput platform for aggregating, storing, analyzing, visualizing and providing access to a database of big data that result from region- and genome-wide association studies. The database currently contains information on more than 70 billion associations between genetic variants and human diseases, quantitative traits, and “omics” traits. The GWAS-MAP platform and database can be used for studying the etiology of human diseases, building predictive risk models and finding potential biomarkers and therapeutic interventions. To demonstrate a typical application of the platform as an approach for extracting new biological knowledge and establishing mechanistic hypotheses, we analyzed varicose veins, a disease affecting on average every third adult in Russia. The results of the analysis confirmed known epidemiologic associations for this disease and led us to propose a hypothesis that increased levels of MICB and CD209 proteins in human plasma may increase susceptibility to varicose veins.
About the Authors
T. I. Shashkova
Russian Federation
Novosibirsk
D. D. Gorev
Russian Federation
Novosibirsk
E. D. Pakhomov
Russian Federation
Novosibirsk,
’s-Hertogenbosch
A. S. Shadrina
Russian Federation
Novosibirsk
S. Zh. Sharapov
Russian Federation
Novosibirsk
Y. A. Tsepilov
Russian Federation
Novosibirsk
L. C. Karssen
Netherlands
’s-Hertogenbosch
Y. S. Aulchenko
Russian Federation
Novosibirsk