![Page 1: Jie Zheng at #ICG12: PhenoSpD: an atlas of phenotypic correlations and a multiple testing correction for the human phenome](https://reader031.vdocuments.us/reader031/viewer/2022030318/5a6cfc337f8b9af2418b487f/html5/thumbnails/1.jpg)
PhenoSpD: an integrated toolkit for phenotypic correlation estimation and multiple testing correction using GWAS summary statistics
Jie Zheng
The 12th International Conference on Genomics
27th Oct 2017
![Page 2: Jie Zheng at #ICG12: PhenoSpD: an atlas of phenotypic correlations and a multiple testing correction for the human phenome](https://reader031.vdocuments.us/reader031/viewer/2022030318/5a6cfc337f8b9af2418b487f/html5/thumbnails/2.jpg)
One belt one road
Bristol
![Page 3: Jie Zheng at #ICG12: PhenoSpD: an atlas of phenotypic correlations and a multiple testing correction for the human phenome](https://reader031.vdocuments.us/reader031/viewer/2022030318/5a6cfc337f8b9af2418b487f/html5/thumbnails/3.jpg)
An invitation from GigaScience and BGI
![Page 4: Jie Zheng at #ICG12: PhenoSpD: an atlas of phenotypic correlations and a multiple testing correction for the human phenome](https://reader031.vdocuments.us/reader031/viewer/2022030318/5a6cfc337f8b9af2418b487f/html5/thumbnails/4.jpg)
Phenome wide association study (PheWAS)
• PheWAS tests many phenotypes against a single genetic variant or a set of genetic variants.
• PheWAS is commonplace, e.g.
  • MR-PheWAS. Millard et al, Sci Rep, 2015
  • Haycock et al, JAMA Oncology, 2017
It is likely that longer telomeres increase risk for several cancers but reduce risk for some non-neoplastic diseases, including cardiovascular diseases.
![Page 5: Jie Zheng at #ICG12: PhenoSpD: an atlas of phenotypic correlations and a multiple testing correction for the human phenome](https://reader031.vdocuments.us/reader031/viewer/2022030318/5a6cfc337f8b9af2418b487f/html5/thumbnails/5.jpg)
Post GWAS era: a database of harmonized GWAS summary data at the MRC Integrative Epidemiology Unit in Bristol
![Page 6: Jie Zheng at #ICG12: PhenoSpD: an atlas of phenotypic correlations and a multiple testing correction for the human phenome](https://reader031.vdocuments.us/reader031/viewer/2022030318/5a6cfc337f8b9af2418b487f/html5/thumbnails/6.jpg)
The network of post GWAS analysis software
Centralized Database
PhenoSpD MR-Base
LD Hub
![Page 7: Jie Zheng at #ICG12: PhenoSpD: an atlas of phenotypic correlations and a multiple testing correction for the human phenome](https://reader031.vdocuments.us/reader031/viewer/2022030318/5a6cfc337f8b9af2418b487f/html5/thumbnails/7.jpg)
LD Hub for LD Score Regression
Univariate analysis: SNP heritability
[Figure: chi-square statistic (y-axis) plotted against LD score (x-axis)]
Bivariate analysis: genetic correlations
![Page 8: Jie Zheng at #ICG12: PhenoSpD: an atlas of phenotypic correlations and a multiple testing correction for the human phenome](https://reader031.vdocuments.us/reader031/viewer/2022030318/5a6cfc337f8b9af2418b487f/html5/thumbnails/8.jpg)
LD Hub web app
![Page 9: Jie Zheng at #ICG12: PhenoSpD: an atlas of phenotypic correlations and a multiple testing correction for the human phenome](https://reader031.vdocuments.us/reader031/viewer/2022030318/5a6cfc337f8b9af2418b487f/html5/thumbnails/9.jpg)
Scope of LD Hub
LD Hub
• Database: 233 publicly available GWAS traits
• Test Center: on-the-fly LD score regression analysis pipeline
• Lookup Center: lookup of existing LD score regression results
• GWAShare Center: summary data sharing & user contribution
![Page 10: Jie Zheng at #ICG12: PhenoSpD: an atlas of phenotypic correlations and a multiple testing correction for the human phenome](https://reader031.vdocuments.us/reader031/viewer/2022030318/5a6cfc337f8b9af2418b487f/html5/thumbnails/10.jpg)
MR-Base for Mendelian randomization
[Diagram: SNPs, Trait 1, Trait 2 and Confounders]
Trait 1 = risk factor (exposure)
Trait 2 = disease (outcome)
![Page 11: Jie Zheng at #ICG12: PhenoSpD: an atlas of phenotypic correlations and a multiple testing correction for the human phenome](https://reader031.vdocuments.us/reader031/viewer/2022030318/5a6cfc337f8b9af2418b487f/html5/thumbnails/11.jpg)
Two sample Mendelian randomization
![Page 12: Jie Zheng at #ICG12: PhenoSpD: an atlas of phenotypic correlations and a multiple testing correction for the human phenome](https://reader031.vdocuments.us/reader031/viewer/2022030318/5a6cfc337f8b9af2418b487f/html5/thumbnails/12.jpg)
MR-Base web interface
![Page 13: Jie Zheng at #ICG12: PhenoSpD: an atlas of phenotypic correlations and a multiple testing correction for the human phenome](https://reader031.vdocuments.us/reader031/viewer/2022030318/5a6cfc337f8b9af2418b487f/html5/thumbnails/13.jpg)
Scope of MR-Base
MR-Base
• SNP lookups
• 12 two-sample MR methodologies
• MR-Base R package
• Database: ~2000 GWAS (1100 with full data)
![Page 14: Jie Zheng at #ICG12: PhenoSpD: an atlas of phenotypic correlations and a multiple testing correction for the human phenome](https://reader031.vdocuments.us/reader031/viewer/2022030318/5a6cfc337f8b9af2418b487f/html5/thumbnails/14.jpg)
PhenoSpD: why do we need it?
• Molecular phenotypes such as metabolites are highly correlated.
• Multiple testing correction is a headache: Bonferroni correction is far too conservative when the traits are correlated.
• When individual-level phenotype data are available, the phenotypic correlation matrix can be calculated easily.
• However, in the real world, individual-level phenotype data are normally not available.
• In MR-Base / LD Hub, we only have GWAS summary statistics.
• We need a magic hand to correct for multiple testing!
Wurtz et al, J Am Coll Cardiol. 2013
![Page 15: Jie Zheng at #ICG12: PhenoSpD: an atlas of phenotypic correlations and a multiple testing correction for the human phenome](https://reader031.vdocuments.us/reader031/viewer/2022030318/5a6cfc337f8b9af2418b487f/html5/thumbnails/15.jpg)
PhenoSpD: how it works
1. Harmonize GWAS summary statistics
2. Estimate phenotypic correlation matrix using metaCCA / LD score regression
3. Apply Spectral decomposition (SpD) to estimate the equivalent number of independent variables in the phenotypic correlation matrix
![Page 16: Jie Zheng at #ICG12: PhenoSpD: an atlas of phenotypic correlations and a multiple testing correction for the human phenome](https://reader031.vdocuments.us/reader031/viewer/2022030318/5a6cfc337f8b9af2418b487f/html5/thumbnails/16.jpg)
MetaCCA
• Summary statistics-based multivariate association testing using canonical correlation analysis (Cichonska et al, Bioinformatics 2016)
• As a by-product, it provides a way to estimate the phenotypic correlation matrix Σ_YY: each entry is estimated as the Pearson correlation between the regression coefficients (betas) of two GWASs across SNPs
• The assumption is that both traits were measured in the same samples
• PS: 1000 Genomes is not the best option for estimating the LD matrix between SNPs. See Benner et al AJHG 2017, and LDstore
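The beta-correlation idea above can be sketched in a few lines. This is a minimal illustration, not the metaCCA implementation: the function name `phenotypic_correlation_from_betas` and the toy data are hypothetical, and any per-SNP normalisation or SNP filtering that a real analysis would need is omitted.

```python
import numpy as np


def phenotypic_correlation_from_betas(betas):
    """Estimate a trait-by-trait correlation matrix from GWAS betas.

    betas: array of shape (n_snps, n_traits), one column of regression
    coefficients per GWAS, with rows aligned to the same SNPs.
    (Hypothetical helper for illustration only.)
    """
    betas = np.asarray(betas, dtype=float)
    if betas.ndim != 2 or betas.shape[0] < 2:
        raise ValueError("betas must be 2-D with at least two SNPs")
    # Pearson correlation between columns (traits), computed across SNPs.
    return np.corrcoef(betas, rowvar=False)


# Toy data (an assumption, not real GWAS output): traits 1 and 2 share a
# common noise component with opposite sign, trait 3 is independent.
rng = np.random.default_rng(0)
shared = rng.normal(size=5000)
toy_betas = np.column_stack([
    shared + rng.normal(size=5000),
    -shared + rng.normal(size=5000),
    rng.normal(size=5000),
])
toy_corr = phenotypic_correlation_from_betas(toy_betas)
print(np.round(toy_corr, 2))
```

In the toy data the first two columns are negatively correlated by construction and the third is roughly uncorrelated with both, which is the pattern the estimated matrix should reflect.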
![Page 17: Jie Zheng at #ICG12: PhenoSpD: an atlas of phenotypic correlations and a multiple testing correction for the human phenome](https://reader031.vdocuments.us/reader031/viewer/2022030318/5a6cfc337f8b9af2418b487f/html5/thumbnails/17.jpg)
LD score regression
• Method to estimate SNP heritability and genetic correlations (Bulik-Sullivan et al, Nat Genet 2014, 2015)
• It also provides a way to estimate the phenotypic correlation between two traits, from the intercept term of the bivariate LD score regression.
• Compared with metaCCA, it accounts for sample overlap automatically
• Both the genetic and the phenotypic correlation matrices can be found in LD Hub
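For reference, the relation behind the intercept can be written out as follows. This is a sketch in the notation of Bulik-Sullivan et al 2015 rather than anything shown on the slide: $z_{1j}$ and $z_{2j}$ are the z-scores of SNP $j$ in the two GWASs, $\ell_j$ is its LD score, $N_1$ and $N_2$ are the two sample sizes, $N_s$ is the number of overlapping samples, $M$ is the number of SNPs, $\rho_g$ is the genetic covariance and $\rho$ is the phenotypic correlation among the overlapping samples.

$$
\mathrm{E}\left[z_{1j}\, z_{2j}\right] = \frac{\sqrt{N_1 N_2}\,\rho_g}{M}\,\ell_j + \frac{\rho\, N_s}{\sqrt{N_1 N_2}}
$$

The slope carries the genetic signal and the intercept is $\rho N_s / \sqrt{N_1 N_2}$. Under this relation the intercept equals $\rho$ itself only when the two GWASs use exactly the same samples ($N_s = N_1 = N_2$); with partial overlap it is scaled down by the overlap fraction.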
![Page 18: Jie Zheng at #ICG12: PhenoSpD: an atlas of phenotypic correlations and a multiple testing correction for the human phenome](https://reader031.vdocuments.us/reader031/viewer/2022030318/5a6cfc337f8b9af2418b487f/html5/thumbnails/18.jpg)
SNPSpD and MatSpD
• SNPSpD: A simple correction for multiple testing for SNPs in LD using spectral decomposition (SpD). Nyholt 2004 AJHG
• MatSpD (matrix SpD): estimates the equivalent number of independent variables in a correlation (r) matrix
• The same method can be used to estimate the number of independent variables in a phenotypic correlation matrix
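A minimal sketch of the spectral decomposition step, assuming the Nyholt 2004 formula Meff = 1 + (M - 1)(1 - Var(λ)/M), where λ are the eigenvalues of the M × M correlation matrix. The function name `effective_number_of_tests` is hypothetical, and taking absolute eigenvalues is an assumption made here to tolerate slightly non-positive-definite estimated matrices; the actual PhenoSpD R scripts may differ in detail.

```python
import numpy as np


def effective_number_of_tests(corr):
    """Effective number of independent variables in a correlation matrix.

    Uses Meff = 1 + (M - 1) * (1 - Var(eigenvalues) / M) (Nyholt 2004).
    Hypothetical helper for illustration only.
    """
    corr = np.asarray(corr, dtype=float)
    m = corr.shape[0]
    if corr.ndim != 2 or corr.shape[1] != m:
        raise ValueError("corr must be a square matrix")
    if m == 1:
        return 1.0
    # eigvalsh is appropriate because a correlation matrix is symmetric.
    # abs() is an assumption: it guards against small negative eigenvalues
    # in matrices estimated from summary statistics.
    eigenvalues = np.abs(np.linalg.eigvalsh(corr))
    variance = np.var(eigenvalues, ddof=1)  # sample variance
    return 1.0 + (m - 1) * (1.0 - variance / m)


# Two strongly correlated traits plus one independent trait.
example = np.array([
    [1.0, 0.9, 0.0],
    [0.9, 1.0, 0.0],
    [0.0, 0.0, 1.0],
])
print(round(effective_number_of_tests(example), 2))  # 2.46
```

Two sanity checks follow from the formula: an identity matrix (all traits independent) gives Meff = M, and a matrix of all ones (all traits perfectly correlated) gives Meff = 1.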
![Page 19: Jie Zheng at #ICG12: PhenoSpD: an atlas of phenotypic correlations and a multiple testing correction for the human phenome](https://reader031.vdocuments.us/reader031/viewer/2022030318/5a6cfc337f8b9af2418b487f/html5/thumbnails/19.jpg)
Simulation
• How accurate is the phenotypic correlation estimated from GWAS results?
• Are there any parameters that strongly affect the estimate?
![Page 20: Jie Zheng at #ICG12: PhenoSpD: an atlas of phenotypic correlations and a multiple testing correction for the human phenome](https://reader031.vdocuments.us/reader031/viewer/2022030318/5a6cfc337f8b9af2418b487f/html5/thumbnails/20.jpg)
| Model | N_ind_A | N_ind_B | N_overlap | Overlap_% | N_SNPs | N_EnvF | N_simu | y1_y2_A_obs | y1_y2_B_obs | Mean_y1_y2_est | SD_y1_y2_est | Deviation_obs_est (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| sample size 1 | 300 | 300 | 150 | 50% | 1000 | 100 | 100 | -0.70 | -0.70 | -0.46 | 0.56 | 34.1% |
| sample size 2 | 500 | 500 | 250 | 50% | 1000 | 100 | 100 | -0.71 | -0.70 | -0.47 | 0.56 | 33.0% |
| sample size 3 | 1000 | 1000 | 500 | 50% | 1000 | 100 | 100 | -0.70 | -0.70 | -0.47 | 0.54 | 33.3% |
| sample size 4 | 3000 | 3000 | 1500 | 50% | 1000 | 100 | 100 | -0.70 | -0.70 | -0.46 | 0.54 | 33.6% |
| sample size 5 | 5000 | 5000 | 2500 | 50% | 1000 | 100 | 100 | -0.70 | -0.70 | -0.47 | 0.54 | 33.2% |
| sample size 6 | 10000 | 10000 | 5000 | 50% | 1000 | 100 | 100 | -0.71 | -0.71 | -0.47 | 0.54 | 33.9% |
| sample overlap 1 | 5000 | 5000 | 1000 | 10% | 1000 | 100 | 100 | -0.70 | -0.70 | -0.13 | 0.39 | 82.1% |
| sample overlap 2 | 5000 | 5000 | 2000 | 20% | 1000 | 100 | 100 | -0.70 | -0.70 | -0.23 | 0.47 | 67.2% |
| sample overlap 3 | 5000 | 5000 | 3000 | 30% | 1000 | 100 | 100 | -0.71 | -0.71 | -0.33 | 0.47 | 54.0% |
| sample overlap 4 | 5000 | 5000 | 4000 | 40% | 1000 | 100 | 100 | -0.71 | -0.71 | -0.40 | 0.51 | 43.2% |
| sample overlap 5 | 5000 | 5000 | 5000 | 50% | 1000 | 100 | 100 | -0.71 | -0.71 | -0.47 | 0.53 | 33.3% |
| sample overlap 6 | 5000 | 5000 | 6000 | 60% | 1000 | 100 | 100 | -0.71 | -0.71 | -0.53 | 0.59 | 25.2% |
| sample overlap 7 | 5000 | 5000 | 7000 | 70% | 1000 | 100 | 100 | -0.70 | -0.70 | -0.58 | 0.57 | 17.5% |
| sample overlap 8 | 5000 | 5000 | 8000 | 80% | 1000 | 100 | 100 | -0.70 | -0.70 | -0.62 | 0.65 | 11.4% |
| sample overlap 9 | 5000 | 5000 | 9000 | 90% | 1000 | 100 | 100 | -0.71 | -0.71 | -0.67 | 0.67 | 5.8% |
| unbalance sample 1 | 5000 | 5000 | 9000 | 90% | 1000 | 100 | 100 | -0.71 | -0.71 | -0.67 | 0.67 | 5.8% |
| unbalance sample 2 | 5000 | 6000 | 9000 | 82% | 1000 | 100 | 100 | -0.71 | -0.71 | -0.64 | 0.66 | 9.8% |
| unbalance sample 3 | 5000 | 8000 | 9000 | 69% | 1000 | 100 | 100 | -0.70 | -0.70 | -0.58 | 0.65 | 17.3% |
| unbalance sample 4 | 5000 | 10000 | 9000 | 60% | 1000 | 100 | 100 | -0.70 | -0.70 | -0.54 | 0.62 | 22.9% |
| unbalance sample 5 | 5000 | 13000 | 9000 | 50% | 1000 | 100 | 100 | -0.70 | -0.70 | -0.48 | 0.54 | 30.9% |
| number of SNPs 1 | 5000 | 5000 | 2500 | 50% | 10 | 100 | 100 | -0.70 | -0.70 | -0.44 | 0.73 | 38.1% |
| number of SNPs 2 | 5000 | 5000 | 2500 | 50% | 100 | 100 | 100 | -0.70 | -0.70 | -0.48 | 0.53 | 34.1% |
| number of SNPs 3 | 5000 | 5000 | 2500 | 50% | 500 | 100 | 100 | -0.70 | -0.70 | -0.47 | 0.53 | 34.3% |
| number of SNPs 4 | 5000 | 5000 | 2500 | 50% | 1000 | 100 | 100 | -0.71 | -0.70 | -0.47 | 0.56 | 33.5% |
| number of SNPs 5 | 5000 | 5000 | 2500 | 50% | 5000 | 100 | 100 | -0.70 | -0.70 | -0.47 | 0.55 | 33.6% |
| number of SNPs 6 | 5000 | 5000 | 2500 | 50% | 10000 | 100 | 100 | -0.71 | -0.71 | -0.47 | 0.59 | 33.7% |
![Page 21: Jie Zheng at #ICG12: PhenoSpD: an atlas of phenotypic correlations and a multiple testing correction for the human phenome](https://reader031.vdocuments.us/reader031/viewer/2022030318/5a6cfc337f8b9af2418b487f/html5/thumbnails/21.jpg)
Accuracy tests using real data
The estimated phenotypic correlations agree well with the observed phenotypic correlations.
The exceptions are traits with limited sample size (and therefore limited sample overlap).
• Shin et al provided the observed phenotypic correlation matrix for 452 metabolites, which can be used as a test dataset.
• We therefore compared the observed phenotypic correlations with the phenotypic correlations estimated by PhenoSpD.
![Page 22: Jie Zheng at #ICG12: PhenoSpD: an atlas of phenotypic correlations and a multiple testing correction for the human phenome](https://reader031.vdocuments.us/reader031/viewer/2022030318/5a6cfc337f8b9af2418b487f/html5/thumbnails/22.jpg)
Growing importance of PhenoSpD
• PhenoSpD is particularly useful for multiple GWASs from the same samples, e.g. complex molecular traits such as metabolites and cytokines
• It can also be applied to all traits in MR-Base / LD Hub, where traits can be split into groups, e.g. the traits in the GIANT consortium are very likely to be correlated, and the majority of them come from the same samples
![Page 23: Jie Zheng at #ICG12: PhenoSpD: an atlas of phenotypic correlations and a multiple testing correction for the human phenome](https://reader031.vdocuments.us/reader031/viewer/2022030318/5a6cfc337f8b9af2418b487f/html5/thumbnails/23.jpg)
Real case application in MR-Base and LD Hub

Number of independent traits in MR-Base

| Consortium / First author | Category | N_traits | N_SNPs | N_correlations | N_independent_traits |
|---|---|---|---|---|---|
| Kettunen | Blood metabolites | 123 | 9826292 | 7503 | 44.9 |
| Shin | Metabolites | 451 | 2482345 | 101475 | 324.4 |
| Roederer | Immune system phenotypes | 151 | 1585187 | 11325 | 94.2 |
| CARDIOGRAM | | 2 | 335391 | 1 | 1 |
| TRICL | | 4 | 335391 | 6 | 3 |
| TAG | | 4 | 1449634 | 6 | 3.98 |
| SSGAC | | 7 | 1449634 | 21 | 6 |
| PGC | | 4 | 335391 | 6 | 3.644 |
| Leptin | | 2 | 1449634 | 1 | 1 |
| MAGIC | | 16 | 1449634 | 120 | 11.098 |
| IIBDGC | | 3 | 335391 | 3 | 2 |
| Hrgene | | 8 | 1449634 | 28 | 7 |
| HaemGen | | 6 | 1449634 | 15 | 5 |
| GPC | | 6 | 1449634 | 15 | 5 |
| GLGC | | 4 | 1449634 | 6 | 3 |
| GIANT | | 15 | 1449634 | 105 | 10.1097 |
| GEFOS | | 3 | 1449634 | 3 | 3 |
| CKDGen | | 9 | 335391 | 36 | 8 |
| EGG | | 4 | 1449634 | 6 | 4 |
| GIS | | 2 | 2029112 | 1 | 1 |
| GUGC | | 2 | 2449580 | 1 | 1 |
| ENIGMA | | 7 | 7237736 | 21 | 6 |
| UK Biobank | | 5 | 9440243 | 9 | 5 |
| Others | | 24 | / | / | 24 |
| All | | 862 | / | 120713 | 577.3317 |

Number of independent traits in LD Hub

| Consortium / First author | Category | N_traits | N_SNPs | N_correlations | N_independent_traits |
|---|---|---|---|---|---|
| All traits | All traits | 221 | / | 24310 | 134.1167 |
![Page 24: Jie Zheng at #ICG12: PhenoSpD: an atlas of phenotypic correlations and a multiple testing correction for the human phenome](https://reader031.vdocuments.us/reader031/viewer/2022030318/5a6cfc337f8b9af2418b487f/html5/thumbnails/24.jpg)
Growing importance of PhenoSpD
• There is great potential to apply PhenoSpD to multiple traits in large-scale biobanks and cohorts such as UK Biobank, China Kadoorie Biobank and the HUNT study (all traits in one sample)
![Page 25: Jie Zheng at #ICG12: PhenoSpD: an atlas of phenotypic correlations and a multiple testing correction for the human phenome](https://reader031.vdocuments.us/reader031/viewer/2022030318/5a6cfc337f8b9af2418b487f/html5/thumbnails/25.jpg)
UK Biobank release from Ben Neale’s group
• RAPID GWAS OF THOUSANDS OF PHENOTYPES FOR 337,000 SAMPLES IN THE UK BIOBANK (http://www.nealelab.is/blog/2017/7/19/rapid-gwas-of-thousands-of-phenotypes-for-337000-samples-in-the-uk-biobank)
• GWAS summary statistics from 337,000 European samples are available for over 2,400 human traits; anyone can access and download the results.
• ~600 of these traits are heritable; these are the most valuable data
![Page 26: Jie Zheng at #ICG12: PhenoSpD: an atlas of phenotypic correlations and a multiple testing correction for the human phenome](https://reader031.vdocuments.us/reader031/viewer/2022030318/5a6cfc337f8b9af2418b487f/html5/thumbnails/26.jpg)
PhenoSpD application
• Assess the potential causal relationship between genetic variation, DNA methylation and 139 complex traits.
• PhenoSpD:
139 outcomes → 62 independent outcomes
Hypothesis free MR of DNA methylation on 139 human traits
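To make the gain concrete, the arithmetic below compares a Bonferroni threshold over all 139 outcomes with one over the 62 independent outcomes. The family-wise error rate of 0.05 and the helper name `bonferroni_threshold` are assumptions for illustration; the slide does not state which alpha was used.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test p-value threshold for a given family-wise error rate."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


ALPHA = 0.05  # assumed family-wise error rate, not taken from the slide

naive = bonferroni_threshold(ALPHA, 139)    # all outcomes treated as independent
adjusted = bonferroni_threshold(ALPHA, 62)  # effective number from PhenoSpD

print(f"{naive:.2e}")     # 3.60e-04
print(f"{adjusted:.2e}")  # 8.06e-04
```

Under these assumptions the per-test threshold is roughly 2.2 times less stringent after accounting for the correlation between outcomes.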
![Page 27: Jie Zheng at #ICG12: PhenoSpD: an atlas of phenotypic correlations and a multiple testing correction for the human phenome](https://reader031.vdocuments.us/reader031/viewer/2022030318/5a6cfc337f8b9af2418b487f/html5/thumbnails/27.jpg)
Links for PhenoSpD
• PhenoSpD paper on bioRxiv: https://www.biorxiv.org/content/early/2017/07/25/148627
• R scripts of PhenoSpD on the MRC-IEU GitHub: https://github.com/MRCIEU/PhenoSpD
• LD Hub: http://ldsc.broadinstitute.org/ldhub/
• MR-Base: www.mrbase.org
![Page 28: Jie Zheng at #ICG12: PhenoSpD: an atlas of phenotypic correlations and a multiple testing correction for the human phenome](https://reader031.vdocuments.us/reader031/viewer/2022030318/5a6cfc337f8b9af2418b487f/html5/thumbnails/28.jpg)
Acknowledgements
• LD Hub team
  • Jie Zheng
  • David M Evans
  • Benjamin Neale
• MR-Base team
  • Gibran Hemani
  • Jie Zheng
  • George Davey Smith
  • Tom Gaunt
  • Philip Haycock
• PhenoSpD team
  • Jie Zheng
  • Tom Richardson
  • Louise Millard
  • Gibran Hemani
  • Chris Raistrick
  • Bjarni Vilhjalmsson
  • Philip Haycock
  • Tom Gaunt
![Page 29: Jie Zheng at #ICG12: PhenoSpD: an atlas of phenotypic correlations and a multiple testing correction for the human phenome](https://reader031.vdocuments.us/reader031/viewer/2022030318/5a6cfc337f8b9af2418b487f/html5/thumbnails/29.jpg)
Q & A
Thank you!
Questions welcomed