Post on 09-Mar-2018

In silico blood genotyping from exome sequencing data

Silvio Tosatto

BioComputing UP, Department of Biology, University of Padova, Italy

URL: http://protein.bio.unipd.it/

Today

• Personalized genetics has been upon us for some time

• How good are we at actually identifying phenotypes from a whole genome?

The CAGI Personal Genome Project (PGP) Challenge

• Few goals are more pure to genome interpretation than predicting traits from raw sequence (or genotype) data

• In this CAGI challenge, phenotypes/traits are predicted for real people from their genetic data

• Genetic information from 10 individuals of the Personal Genome Project is provided (PGP-10)

Dataset provided by George Church

Personal genome project (PGP) ‐ Predict individuals’ phenotype

Numerical traits
33. Birth weight (in g)
34. HDL level (in mg/dL) *
35. LDL level (in mg/dL) *
36. Triglyceride level (in mg/dL) *
37. Fasting blood glucose level (in mg/dL)
38. Warfarin dose (in mg)
39. Age at menarche
40. Annual income (in $)

Blood Groups

• Clear genetic cause of phenotypes

• Model system for phenotype prediction

• Good description in literature

• High relevance, especially for blood transfusions

(Blood. 2009;114: 248-256)

Example: ABO glycosyltransferase

Blood group  Genes  Antigens
ABO          ABO    A, B, O

Amino acid residues differing between blood group A- and B-active transferases, respectively (Arg176Gly; Gly235Ser; Leu266Met; Gly268Ala), are shown with the single-letter code and their positions indicated.
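
The four residues in the caption can be turned into a small lookup, shown here as a minimal sketch rather than the method used in the talk. The function name and the input format (a mapping from position to single-letter residue) are assumptions made for illustration. The sketch only separates A-like from B-like transferases and ignores O alleles and heterozygous genotypes.

```python
# Residues that differ between A- and B-active transferases, taken from the
# caption above: position -> (residue in A, residue in B).
DIAGNOSTIC_RESIDUES = {
    176: ("R", "G"),  # Arg176Gly
    235: ("G", "S"),  # Gly235Ser
    266: ("L", "M"),  # Leu266Met
    268: ("G", "A"),  # Gly268Ala
}


def classify_transferase(observed):
    """Return 'A', 'B', or 'ambiguous' for a dict of position -> residue.

    Hypothetical helper: a call is made only when all four diagnostic
    positions agree with one transferase type.
    """
    a_hits = sum(
        1 for pos, (a_res, _) in DIAGNOSTIC_RESIDUES.items()
        if observed.get(pos) == a_res
    )
    b_hits = sum(
        1 for pos, (_, b_res) in DIAGNOSTIC_RESIDUES.items()
        if observed.get(pos) == b_res
    )
    if a_hits == len(DIAGNOSTIC_RESIDUES):
        return "A"
    if b_hits == len(DIAGNOSTIC_RESIDUES):
        return "B"
    return "ambiguous"


if __name__ == "__main__":
    print(classify_transferase({176: "G", 235: "S", 266: "M", 268: "A"}))
```

Requiring all four positions to agree is a deliberately strict choice; mixed or missing positions are reported as ambiguous instead of being guessed.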

Relevant Blood Types

Blood group  Genes             Antigens
ABO          ABO               A, B, O
RH           RHCE, RHD         D, E, C plus 50 minor
Duffy        DARC              FY(a), FY(b)
Kell         KEL               K1, K2 plus 23 minor
Diego        SLC4A1            Dia, Dib, Wra, Wrb
Kidd         SLC14A1           Jk(a), Jk(b)
Lewis        FUT3              a, b
Lutheran     BCAM              Lu(a), Lu(b) plus 15 minor
MNS          GYPA, GYPB, GYPE  M, N, S plus 40 minor
Bombay       FUT1, FUT2        H, secretor

10 out of ca. 30 blood groups are relevant for transfusions

BOOGIE: BlOOd Group IdEntifier

• A knowledge-based system to predict blood groups from sequencing data

• All 10 groups relevant for blood transfusions are predicted

• A specialized genotype-phenotype knowledge base is required

BOOGIE: Knowledge representation

• Stored in tree-like structure

• Rules expressed in “if <mutation(s)> then <phenotype(s)>” form
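
One way to picture such a knowledge base is a nested mapping (blood group system, then gene, then rules), where each rule pairs a set of mutations with the phenotypes it implies. The sketch below is an assumption about how this could look, not BOOGIE's actual data model. The `Rule` class, the `predict` function and the `EXAMPLE_SYSTEM` entry are hypothetical; the single ABO rule reuses the four A-to-B substitutions from the ABO glycosyltransferase example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    mutations: frozenset  # the "if" part: all of these must be observed
    phenotypes: tuple     # the "then" part: phenotypes implied by the rule


# Tree-like layout: blood group system -> gene -> list of rules.
KNOWLEDGE_BASE = {
    "ABO": {
        "ABO": [
            # Illustrative rule built from the A/B transferase differences.
            Rule(frozenset({"R176G", "G235S", "L266M", "G268A"}), ("B",)),
        ],
    },
    # Placeholder entry with made-up names, only to show the structure.
    "EXAMPLE_SYSTEM": {
        "EXAMPLE_GENE": [
            Rule(frozenset({"X1Y"}), ("example antigen +",)),
        ],
    },
}


def predict(observed_mutations):
    """Return {system: [phenotypes]} for every rule whose mutations are all observed."""
    observed = set(observed_mutations)
    calls = {}
    for system, genes in KNOWLEDGE_BASE.items():
        for rules in genes.values():
            for rule in rules:
                if rule.mutations <= observed:  # subset test: the "if" part holds
                    calls.setdefault(system, []).extend(rule.phenotypes)
    return calls


if __name__ == "__main__":
    print(predict(["R176G", "G235S", "L266M", "G268A"]))
```

Because rules are plain data, adding a newly described variant amounts to appending another `Rule` under the right system and gene.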

BOOGIE: Knowledge collection

– Manually curated

– 580 rules derived

Relevant variants

• Input: millions of SNVs

• ANNOVAR (Wang et al., Nucleic Acids Research 2010) is used to reduce the SNVs to a manageable number

– Gene-based annotation of variants

– Select conserved positions

– Remove unrelated genes

• Output: few relevant SNVs
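
The reduction from millions of SNVs to a few relevant ones can be sketched as a simple filter over already annotated variants. This is an illustration under stated assumptions, not the ANNOVAR interface: the record fields `gene` and `conserved` and the function `select_relevant` are hypothetical, and the gene set is taken from the blood group table in this talk.

```python
# Genes behind the ten transfusion-relevant blood group systems.
BLOOD_GROUP_GENES = {
    "ABO", "RHCE", "RHD", "DARC", "KEL", "SLC4A1", "SLC14A1",
    "FUT3", "BCAM", "GYPA", "GYPB", "GYPE", "FUT1", "FUT2",
}


def select_relevant(variants):
    """Keep variants that fall in a blood group gene at a conserved position.

    Each variant is assumed to be a dict with a 'gene' name and a boolean
    'conserved' flag produced by an earlier annotation step.
    """
    return [
        v for v in variants
        if v["gene"] in BLOOD_GROUP_GENES and v["conserved"]
    ]


if __name__ == "__main__":
    annotated = [
        {"gene": "ABO", "conserved": True},
        {"gene": "TP53", "conserved": True},   # unrelated gene: removed
        {"gene": "KEL", "conserved": False},   # non-conserved position: removed
    ]
    print(select_relevant(annotated))
```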

BOOGIE Pipeline

Benchmarking

• BOOGIE covers all known blood group variants

• Difficulty in finding genome sequences with known blood phenotypes

• Personal Genome Project (PGP) as annotated benchmark set

Personal Genome Project (PGP)

The mission of the PGP is to encourage the development of personal genomics

• Genetic information from 10 individuals of the Personal Genome Project is provided (PGP-10)

• A larger dataset (PGP-1K) aims to cover at least 1,000 genomes

Unfortunately, only ABO and Rh blood group information is available

PGP-10 Data

Back row (left to right): James Sherley, Misha Angrist, John Halamka, Keith Batchelder, Rosalynn Gill.

Front row (left to right): Esther Dyson, George Church, Kirk Maxey.

Not shown: Stan Lapidus and Steven Pinker.

PGP-10 Results

PGP1 (known: O +)
  ABO: O
  Rh: c; e; weak D
  Duffy: FY(a+); FY(b-)
  Kell: K2; K21+; K4-; K3-; K11; K17; K14; K24; K6+; K7-
  Diego: Dib; Memph neg
  Kidd: Jk(a-); Jk(b+)
  Lewis: negative
  Lutheran: Lu(a-); Lu(b+); Lu6+; Lu9-; Lu4; Lu8+; Aua+; Aub-
  MNS: M; S
  Bombay: H+; secretor

PGP4 (known: A -)
  ABO: A
  Rh: c; e; weak D
  Duffy: FY(a-); FY(b+)
  Kell: K2; K21+; K4-; K3-; K11; K17; K14; K24; K6+; K7-
  Diego: Dib; Memph neg
  Kidd: Jk(a-); Jk(b+)
  Lewis: negative
  Lutheran: Lu(a-); Lu(b+); Lu6-; Lu9+; Lu4-; Lu8+; Aua-; Aub+
  MNS: M; s
  Bombay: H+; secretor

PGP8 (known: B +)
  ABO: B
  Rh: c; e; weak D
  Duffy: FY(a-); FY(b+)
  Kell: K2; K21+; K4-; K3-; K11; K17; K14; K24; K6+; K7-
  Diego: Dib; Memph neg
  Kidd: Jk(a+); Jk(b-)
  Lewis: negative
  Lutheran: Lu(a-); Lu(b+); Lu6+; Lu9-; Lu4-; Lu8+; Aua+; Aub-
  MNS: M; s
  Bombay: H+; secretor

BOOGIE correctly predicts all ABO types and all Rh groups except one (PGP-4)

PGP-1K Results

• A second dataset was built from all PGP-1K participants with available blood group information, for a total of 22 individuals

• This dataset contains microarray data (23andMe SNPs)

P = predicted; R = real; * = blood-group-relevant SNPs missing from the dataset

Conclusions

• We developed a method, called BOOGIE, to predict the ten blood groups relevant for transfusions from sequencing data

– Specialized knowledge base with 580 genotype-to-phenotype rules

– Novel variants can be easily considered

• Benchmarking was (so far) only possible on PGP data for the ABO and Rh blood groups

– The ABO and Rh systems are correctly predicted in 85-100% of cases

– The Rh- type presents some additional difficulties

Acknowledgements

Manuel Giollo

Giovanni Minervini

Marta Scalzotto (not shown)

Emanuela Leonardi

Carlo Ferrari

URL: http://protein.bio.unipd.it/

Funding

FIRB Futuro in Ricerca

Università di Padova

CARIPLO

AIRC
