QRank: Quantile Regression for eQTL Analysis

Gen Li

Department of Biostatistics, Columbia University

March 27, 2017

Gen Li (Columbia Biostatistics) QRank March 27, 2017 1 / 11

Motivation

Expression quantitative trait loci (eQTL) analysis can elucidate genetic regulatory pathways of complex diseases

Most eQTL studies focus on identifying mean effects

Linear regression models are commonly used

However, genetic variants may affect the entire distribution of gene expression

This distributional heterogeneity is understudied


Examples of Distributional Heterogeneity

[Figure: four panels, one per gene-SNP pair. x-axis: τ (0 to 1); y-axis: "Quantile normalized gene expression" (-2 to 2); legend: SNP=0, SNP=1, SNP=2.]

Gene: ENSG00000196932.7 & SNP: 10_63342861_C_T_b37

Gene: ENSG00000119943.6 & SNP: 10_100122640_C_G_b37

P-values: LR: 6.11e-17, CLS: 0.593, QRBT: 6.25e-14

Gene: ENSG00000259917.1 & SNP: 15_34743556_G_A_b37

P-values: LR: 6.74e-08, CLS: 0.0155, QRBT: 4.46e-11

Gene: ENSG00000206503.7 & SNP: 6_29880050_T_C_b37

P-values: LR: 0.00198, CLS: 2.42e-05, QRBT: 6.43e-11


Quantile Regression for eQTL Discovery

In a particular tissue with $n$ samples, for $i \in \{1, 2, \ldots, n\}$:

$Y_{i,k}$ is the expression level of the $k$-th gene in the $i$-th sample

$x_{i,j}$ is the $j$-th SNP within ±1 Mb of the TSS of the gene in the $i$-th sample

$z_i$ is a covariate vector for the $i$-th sample

For each gene-SNP pair $(k, j)$, we assume the quantiles of $Y_{i,k}$ follow the model:

$$Q_{Y_{i,k}}(\tau \mid z_i, x_{i,j}) = z_i \alpha_{jk,\tau} + x_{i,j} \beta_{jk,\tau}$$

Goal: identify the pairs $(k, j)$ for which $\beta_{jk,\tau} \neq 0$ at any given $\tau \in (0, 1)$
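
As a rough illustration of what a nonzero quantile effect looks like, the sketch below compares empirical quantiles of expression across genotype groups. It assumes no covariates beyond an intercept, and the function names and toy data are hypothetical, not part of QRank. In the toy data the two genotype groups share the same median but differ in the tails, the kind of effect a mean-based test can miss.

```python
import math
from collections import defaultdict


def empirical_quantile(values, tau):
    """Smallest observed value whose empirical CDF is at least tau."""
    ordered = sorted(values)
    k = max(0, math.ceil(tau * len(ordered)) - 1)
    return ordered[k]


def quantiles_by_genotype(expression, genotype, taus):
    """Empirical quantiles of expression within each genotype group."""
    groups = defaultdict(list)
    for y, g in zip(expression, genotype):
        groups[g].append(y)
    return {
        g: [empirical_quantile(ys, tau) for tau in taus]
        for g, ys in sorted(groups.items())
    }


# Hypothetical toy data: same median in both groups, wider spread for genotype 1.
expression = [-1, 0, 1, -2, 0, 2]
genotype = [0, 0, 0, 1, 1, 1]
profile = quantiles_by_genotype(expression, genotype, [0.1, 0.5, 0.9])
for g, qs in profile.items():
    print(g, qs)
```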


QRank Test

Define the rank-score function for a fixed quantile $\tau$ as

$$S_{n,\tau} = n^{-1/2} \sum_{i=1}^{n} \rho_\tau\{y_{i,k} - z_i \hat{\alpha}_{jk,\tau}\}\, x^*_{i,j}$$

$\rho_\tau\{u\} = \tau - I(u < 0)$ is an asymmetric sign function

$\hat{\alpha}_{jk,\tau}$ is the estimated coefficient vector under the null $H_0: \beta_{jk,\tau} = 0$

$x^*_{i,j}$ is the genotype data adjusted by the covariates

$S_{n,\tau}$ is close to zero if and only if $\beta_{jk,\tau} = 0$
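
The rank score can be sketched in a few lines for the simplest case. The sketch assumes the covariate vector is an intercept only, so the null fit is just the sample τ-quantile of expression and the adjusted genotype is the centered genotype. With real covariates, the null fit would come from a quantile regression of expression on the covariates. The function names are hypothetical.

```python
import math


def empirical_quantile(values, tau):
    """Smallest observed value whose empirical CDF is at least tau."""
    ordered = sorted(values)
    k = max(0, math.ceil(tau * len(ordered)) - 1)
    return ordered[k]


def rank_score(y, x, tau):
    """Rank score S_{n,tau} under an intercept-only null model (assumption)."""
    n = len(y)
    q_hat = empirical_quantile(y, tau)   # null fit: tau-quantile of y
    x_bar = sum(x) / n
    x_star = [xi - x_bar for xi in x]    # genotype adjusted for the intercept
    total = 0.0
    for yi, xs in zip(y, x_star):
        sign = tau - (1.0 if yi - q_hat < 0 else 0.0)  # tau - I(u < 0)
        total += sign * xs
    return total / math.sqrt(n)


# Hypothetical toy data: higher genotype goes with higher expression.
s_up = rank_score([1, 2, 3, 4], [0, 0, 1, 1], tau=0.5)
s_down = rank_score([1, 2, 3, 4], [1, 1, 0, 0], tau=0.5)
print(s_up, s_down)
```

Reversing the genotype coding flips the sign of the score, which is why the test statistic is built from its square.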


QRank Test

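Test statistic at a fixed quantile $\tau$:

$$T_{n,\tau} = S_{n,\tau}^2 / V_n \longrightarrow \chi^2_1, \quad \text{as } n \to \infty$$

where $V_n$ is the variance of $S_{n,\tau}$:

$$V_n = n^{-1} \tau (1 - \tau)\, x_j^{*T} x_j^{*}$$

Composite test statistic over quantile levels $(\tau_1, \cdots, \tau_\ell)$:

$$T_{n,\ell} = S_{n,\ell}^T \Sigma_{n,\ell}^{-1} S_{n,\ell} \longrightarrow \chi^2_\ell, \quad \text{as } n \to \infty$$

where $S_{n,\ell} = (S_{n,\tau_1}, \cdots, S_{n,\tau_\ell})$ and $\Sigma_{n,\ell}$ has an explicit expression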
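
A minimal sketch of both statistics, taking the rank scores and the sum of squared adjusted genotypes as inputs. The slide does not give the form of Σ; the sketch assumes its (a, b) entry is (min(τ_a, τ_b) − τ_a τ_b) times the sum of squared adjusted genotypes, divided by n. This is the covariance of the sign terms under the null, and it reduces to V_n on the diagonal. All function names are hypothetical, and the small linear solver only keeps the example dependency-free.

```python
import math


def single_quantile_test(score, tau, xtx, n):
    """Return (T_{n,tau}, p-value) from a rank score and xtx = sum of x*_i^2."""
    v_n = tau * (1.0 - tau) * xtx / n
    t = score * score / v_n
    p = math.erfc(math.sqrt(t / 2.0))  # chi-square (1 df) upper tail
    return t, p


def solve(a, b):
    """Solve a @ w = b by Gauss-Jordan elimination with partial pivoting."""
    m = len(b)
    aug = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(m):
            if r != col:
                f = aug[r][col] / aug[col][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [aug[i][m] / aug[i][i] for i in range(m)]


def composite_statistic(scores, taus, xtx, n):
    """T_{n,l} = S^T Sigma^{-1} S with the ASSUMED covariance form below."""
    sigma = [[(min(a, b) - a * b) * xtx / n for b in taus] for a in taus]
    weights = solve(sigma, scores)
    return sum(s * w for s, w in zip(scores, weights))


# Hypothetical inputs: n = 4 samples, sum of squared adjusted genotypes = 1.
t_single, p_single = single_quantile_test(0.25, 0.5, xtx=1.0, n=4)
t_composite = composite_statistic([0.0, 0.5], [0.25, 0.75], xtx=1.0, n=4)
print(t_single, p_single, t_composite)
```

With a single quantile level the composite statistic equals the fixed-quantile statistic. The composite value would be compared against a chi-square distribution with as many degrees of freedom as there are quantile levels.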


Numerical Results

Compare QRank (with 5 quantile levels) with LR and CLS [1]

Type I error and power analysis in simulation

GTEx v6 data analysis (in 4 tissues)

Gene-level discoveries, tissue specificity, GWAS enrichment

[1] Brown et al., eLife (2014)

eQTL Discoveries (Unique Genes, FDR=0.05)


Tissue Specificity

eQTL sharing coefficient $\pi_{ij} = P(\text{eQTL in tissue } i \mid \text{eQTL in tissue } j)$
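
The slide does not say how the sharing coefficient was estimated. One simple plug-in estimate, shown here as an assumption, is the fraction of genes discovered in tissue j that are also discovered in tissue i. The gene labels and function name are hypothetical.

```python
def sharing_coefficient(eqtl_genes_i, eqtl_genes_j):
    """Plug-in estimate of pi_ij = P(eQTL in tissue i | eQTL in tissue j)."""
    genes_i, genes_j = set(eqtl_genes_i), set(eqtl_genes_j)
    if not genes_j:
        raise ValueError("tissue j has no eQTL discoveries to condition on")
    return len(genes_i & genes_j) / len(genes_j)


# Hypothetical discovery sets for two tissues.
tissue_a = {"geneA", "geneB", "geneC"}
tissue_b = {"geneB", "geneC", "geneD", "geneE"}
pi_ab = sharing_coefficient(tissue_a, tissue_b)
pi_ba = sharing_coefficient(tissue_b, tissue_a)
print(pi_ab, pi_ba)
```

The coefficient is not symmetric: conditioning on the tissue with fewer discoveries generally gives a larger value.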


GWAS Enrichment


Acknowledgement

Collaborators: Xiaoyu Song, Zhenwei Zhou, Xianling Wang, Iuliana Ionita-Laza, Ying Wei

Funding: NSF (DMS-120923); NIH (R01HG008980); Calderone Junior Faculty Award, MSPH, Columbia University

Song et al. "QRank: A novel quantile regression tool for eQTL discovery." Bioinformatics, 2017+ (http://biorxiv.org/content/early/2016/08/17/070052)

R package available at https://github.com/cran/QRank
