J. B. Cole

Animal Improvement Programs Laboratory
Agricultural Research Service, USDA
Beltsville, MD 20705-2350, USA

john.cole@ars.usda.gov

Use of NGS to identify the causal variant associated with a complex phenotype

Animal Sciences Group, Wageningen UR Livestock Research, The Netherlands, 29 May 2013 (2) Cole

Overview

Why are we sequencing?

How did we select the animals to sequence?

What are the steps involved in the process?

What do you do with the reads once you have them?

Where are we now?

Introduction

Several studies (Kuhn et al., 2003; Cole et al., 2007; Seidenspinner et al., 2009) have reported QTL on BTA 18 associated with dystocia

Bioinformatic analysis using SNP data has not identified the causal variant

Next generation sequencing (NGS) has recently been used to find causal variants for novel recessive disorders

Chromosome 18 is different

Markers on chromosome 18 have large effects on several traits:

Dystocia and stillbirth: sire and daughter calving ease and sire stillbirth

Conformation: rump width, stature, strength, and body depth

Efficiency: longevity and net merit

Large calves contribute to reduced lifetimes and decreased profitability

Marker effects for dystocia complex

[Figure: marker effects, with ARS-BFGL-NGS-109285 labeled]

Cole et al., 2009 (J. Dairy Sci. 92:2931–2946)

Correlations in dystocia complex

The QTL also affects gestation length

Maltecca et al. 2011. Animal Genetics, 42:6, 585-591.

Overview of the dystocia complex

The key marker is ARS-BFGL-NGS-109285 (rs109478645) at 57,585,121 bp on BTA18

Intronic to SIGLEC12 (sialic acid binding Ig-like lectin 12)

Recent results indicate effects on gestation length (Maltecca et al., 2011) and calf birth weight (Cole et al., unpublished data)

This is a gene-rich region

http://useast.ensembl.org/Bos_taurus/Location/View?r=18%3A57583000-57587000

http://www.ncbi.nlm.nih.gov/gene?cmd=Retrieve&dopt=Graphics&list_uids=618463

Copy number variants are present

ARS-BFGL-NGS-109285 is flanked by CNV

There’s a loss and a gain to the left (8 SNP region)

There’s a gain to the right (10 SNP region)

This can result in assembly problems

Hou et al. 2011. Genomic characteristics of cattle copy number variations. BMC Genomics. 12:127. http://www.biomedcentral.com/1471-2164/12/127

Where did this problem come from?

http://aipl.arsusda.gov/CF-queries/Bull_Chromosomal_EBV/bull_chromosomal_ebv.cfm?

40,803 daughters

What if we look at a different trait?

Cole et al. (2007) proposed the following mechanism:

SIGLEC12 may sequester circulating leptin

This increases gestation length

Calf birth weight (BW) is higher because of increased gestation length

Higher BW is associated with dystocia

We don’t have birth weight data

Birth weights are not routinely recorded in the US

Collaborated with Hermann Swalve’s group to develop a selection index prediction of BW PTA

Performed GWAS and gene set enrichment analysis to search for interesting associations
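
The slides do not say which GWAS model was used, so the following is only a minimal sketch of the simplest form of the idea: regress the phenotype (here a birth weight PTA) on allele count at each SNP, one SNP at a time, and rank SNP by the size of the estimated effect. The SNP names, genotypes, and PTA values are made up for illustration.

```python
from statistics import mean


def snp_effect(genotypes, phenotypes):
    """Least-squares slope of phenotype on allele count (0, 1, or 2)."""
    g_bar = mean(genotypes)
    y_bar = mean(phenotypes)
    sxx = sum((g - g_bar) ** 2 for g in genotypes)
    if sxx == 0:
        return 0.0  # monomorphic SNP: no effect can be estimated
    sxy = sum((g - g_bar) * (y - y_bar) for g, y in zip(genotypes, phenotypes))
    return sxy / sxx


# Hypothetical data: allele counts for six bulls at two SNP, and invented PTA values.
genotypes_by_snp = {
    "snp_a": [0, 1, 2, 0, 1, 2],
    "snp_b": [1, 1, 0, 2, 1, 1],
}
bw_pta = [1.0, 2.0, 3.0, 1.2, 1.8, 3.0]

# Estimate one effect per SNP, then rank SNP by absolute effect size.
effects = {name: snp_effect(g, bw_pta) for name, g in genotypes_by_snp.items()}
ranked = sorted(effects, key=lambda name: abs(effects[name]), reverse=True)
```

A production analysis would fit all markers jointly or at least account for relationships among animals; this sketch ignores both.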

GWAS for birth weight PTA

Cole et al. (2013), unpublished data

Are we measuring anything new?

Identified a SNP intronic to LHX4, which is associated with cow body weight and length (Ren et al., 2010, Mol. Bio. Reprod., 37:417-422).

4 SNP in the QTL region on BTA 18 had large effects

Several other SNP with large effects are intronic or adjacent to genes with unknown functions

KEGG pathways for birth weight

What does regulation of the actin cytoskeleton have to do with birth weight in cattle?

That is, do these results make sense?

Maybe… these pathways may be involved in establishment and maintenance of pregnancy, as well as coordination of growth and development.

Cole et al. (2013), unpublished data
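
One common way to ask whether a pathway shows up more often than chance is a hypergeometric over-representation test. This is a simpler technique than rank-based gene set enrichment analysis, and it is not necessarily what produced the results on this slide. The sketch below assumes a hypothetical set of genes near significant SNP and a hypothetical pathway membership count.

```python
from math import comb


def enrichment_p_value(n_genes, n_in_pathway, n_selected, n_overlap):
    """P(X >= n_overlap) when n_selected genes are drawn without replacement
    from n_genes, of which n_in_pathway belong to the pathway."""
    upper = min(n_in_pathway, n_selected)
    total = comb(n_genes, n_selected)
    tail = sum(
        comb(n_in_pathway, k) * comb(n_genes - n_in_pathway, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Hypothetical numbers: 20 annotated genes, 5 in the pathway,
# 4 genes selected from the GWAS, 3 of which are in the pathway.
p_value = enrichment_p_value(n_genes=20, n_in_pathway=5, n_selected=4, n_overlap=3)
```

A small p-value only says the overlap is unlikely by chance; whether the pathway makes biological sense is the separate question the slide raises.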

Sequencing is becoming very affordable

Sequencing successes at AIPL/BFGL

Simple loss-of-function mutations

APAF1 – Spontaneous abortions in Holstein cattle (Adams et al., 2012)

CWC15 – Early embryonic death in Jersey cattle (Sonstegard et al., 2013)

Weaver syndrome – Neurological degeneration and death in Brown Swiss cattle (McClure et al., 2013)

Original pedigree-based design

[Pedigree diagram: some links between bulls are labeled MGS; the diagram also carries the annotation δ = 10.]

Bull A (1968): AA, SCE: 8

Bull B (1962): AA, SCE: 7

Bull H (1989): Aa, SCE: 14

Bull I (1994): Aa, SCE: 18

Bull E (1982): Aa, SCE: 8

Bull F (1987): Aa, SCE: 15

Bull C (1975): AA, SCE: 8

Bull D (1968): ??, SCE: 7

Bull E (1974): Aa, SCE: 10

Modified pedigree & haplotype design

[Pedigree diagram: some links between bulls are labeled MGS; the diagram also carries the annotation δ = 10.]

Bull A (1968): AA, SCE: 8

Bull B (1962): AA, SCE: 7

Bull H (1989): Aa, SCE: 14

Bull I (1994): Aa, SCE: 18

Bull E (1982): Aa, SCE: 8

Bull F (1987): Aa, SCE: 15

Bull C (1975): AA, SCE: 8

Bull E (1974): Aa, SCE: 10

These bulls carry the haplotype with the largest, negative effect on SCE:

Bull J (2002): Aa, SCE: 6

Bull K (2002): Aa, SCE: 15

Bull J (2002): aa, SCE: 15

Couldn’t obtain DNA:

Bull D (1968): ??, SCE: 7

Molecular prep

Sample Collection

DNA Extraction

DNA Quality Control

Library Construction

Library Quality Control

Sample preparation time is substantial

DNA Extraction: ~12 hours (30 mins)

DNA QC: ~1-2 hours (1-2 hours)

Library Construction: 48 hours (12 hours)

Library QC: ~2-4 hours (1 hour)

Total: 3-4 days (15.5 hours)
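
The totals above can be checked with a few lines of arithmetic. This sketch assumes that the first figure on each line is elapsed time and that the figure in parentheses is hands-on time; the slide does not label them, so that reading is an assumption.

```python
# Each entry: (elapsed_low, elapsed_high, hands_on_low, hands_on_high), in hours.
# Values are taken from the slide; the elapsed vs. hands-on reading is an assumption.
steps = {
    "DNA extraction": (12, 12, 0.5, 0.5),
    "DNA QC": (1, 2, 1, 2),
    "Library construction": (48, 48, 12, 12),
    "Library QC": (2, 4, 1, 1),
}

elapsed_low = sum(s[0] for s in steps.values())
elapsed_high = sum(s[1] for s in steps.values())
hands_on_low = sum(s[2] for s in steps.values())
hands_on_high = sum(s[3] for s in steps.values())

print(f"Elapsed: {elapsed_low}-{elapsed_high} hours")
print(f"Hands-on: {hands_on_low}-{hands_on_high} hours")
```

Under that reading, the upper hands-on total matches the 15.5 hours on the slide, and 63 to 66 elapsed hours is a little under 3 days of continuous time, which is in line with 3-4 days once working hours are taken into account.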

DNA quality

Library quality

Sequencing stage

Illumina cBot:

• Preps DNA for sequencing
• Takes 4-5 hours
• Must be done 48 hours before

Illumina HiSeq 2000:

• Does the sequencing
• Takes ~10-14 days for 100 x 100
• Minimal hands-on time

Anatomy of a flow cell

8 lanes per flow cell

3 columns per lane

96 tiles per column

Each tile imaged 8 times

1 from upper surface, 1 from lower

Approximately 300Gb of sequence per flow cell

http://www.qbi.uq.edu.au/images/genomics/genomics1.jpg
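
To get a feel for what 300 Gb per flow cell means per animal, the yield can be turned into a rough depth of coverage. This is a back-of-the-envelope sketch: it reads "Gb" as gigabases, takes the 100 x 100 paired-end read length from the sequencing-stage slide, and assumes a bovine genome of roughly 2.7 Gb, a figure that does not appear in the slides.

```python
FLOW_CELL_YIELD_GB = 300   # gigabases per flow cell (from the slide)
LANES_PER_FLOW_CELL = 8    # from the slide
READ_LENGTH = 100          # bases per read, 100 x 100 paired-end
GENOME_SIZE_GB = 2.7       # assumption: approximate bovine genome size

# Yield and read pairs for a single lane.
yield_per_lane_gb = FLOW_CELL_YIELD_GB / LANES_PER_FLOW_CELL
read_pairs_per_lane = yield_per_lane_gb * 1e9 / (2 * READ_LENGTH)

# Average depth if one animal is sequenced on one lane.
coverage_per_lane = yield_per_lane_gb / GENOME_SIZE_GB

print(f"{yield_per_lane_gb} Gb per lane, about {coverage_per_lane:.1f}x coverage")
```

Real coverage will be lower once duplicate, low-quality, and unmapped reads are removed.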

Sequencing by synthesis

https://www.broadinstitute.org/files/shared/illuminavids/sequencingSlides.pdf

How many scientists does it take…

Flowcell 1: Cluster densities

Cluster densities from current HiSeq run finished 30 April 2013 (unpublished data):

Flowcell 2: Cluster densities

Cluster densities from current HiSeq run started 22 May 2013 (unpublished data):

The Aftermath

Total Time (sample to sequence):

3 weeks

That’s assuming nothing went wrong!

More realistic: months

Resulting Data

Large text files

~300 gigabytes compressed

Analysis

Often underestimated

Can take months as well

Variant detection

• Alignment against a reference genome

• Analysis is very disk I/O-intensive.

Raw Sequencer Output

Alignment to the Genome

Variant Detection
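
The last step of that flow can be illustrated with a toy example. The sketch below assumes the reads have already been aligned without gaps, piles them up against a short reference string, and reports positions where most reads disagree with the reference. The function name, thresholds, and sequences are hypothetical, and real variant callers also model base quality, mapping quality, and heterozygous genotypes.

```python
from collections import Counter


def call_variants(reference, aligned_reads, min_depth=3, min_alt_fraction=0.8):
    """Call simple substitutions from ungapped alignments.

    aligned_reads is a list of (start, sequence) pairs, where start is the
    0-based reference position of the first base of the read.
    """
    # Count the bases observed at every reference position.
    pileup = [Counter() for _ in reference]
    for start, sequence in aligned_reads:
        for offset, base in enumerate(sequence):
            position = start + offset
            if 0 <= position < len(reference):
                pileup[position][base] += 1

    variants = []
    for position, counts in enumerate(pileup):
        depth = sum(counts.values())
        if depth < min_depth:
            continue  # too few reads to make a call
        base, count = counts.most_common(1)[0]
        if base != reference[position] and count / depth >= min_alt_fraction:
            variants.append((position, reference[position], base, depth))
    return variants


# Toy data: four reads that all carry a T where the reference has an A at position 4.
reference = "ACGTACGTAC"
reads = [(0, "ACGTTC"), (2, "GTTCGT"), (3, "TTCGTAC"), (4, "TCGTAC")]
variants = call_variants(reference, reads)
```

Even in this toy form, every read touches the pileup once per base, which hints at why the real analysis is so I/O-intensive at whole-genome scale.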

Computational Logistics

Desktop computers

Viable for single lanes

Long computation time

Servers are better

>100 GB RAM and >16 processor cores

Cloud

Amazon Web Services

iAnimal/iPlant

Storage considerations

What to save?

Raw data?

Processed results?

How much workspace?

Suggestions:

Workspace: 10x the size of the compressed files

Save alignments

Backup REGULARLY!!!
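
As a rough illustration of the workspace suggestion, the sketch below applies the 10x rule to the ~300 gigabytes of compressed data mentioned earlier. It assumes each sequencing run produces about that much data; the function name is made up for the example.

```python
COMPRESSED_GB_PER_RUN = 300   # ~300 gigabytes compressed (from the earlier slide)
WORKSPACE_MULTIPLIER = 10     # suggested workspace: 10x the compressed files


def workspace_tb(n_runs):
    """Suggested scratch space in terabytes for n_runs sequencing runs."""
    return n_runs * COMPRESSED_GB_PER_RUN * WORKSPACE_MULTIPLIER / 1000


one_run = workspace_tb(1)
two_runs = workspace_tb(2)
print(f"One run: {one_run} TB of workspace; two runs: {two_runs} TB")
```

These figures cover working space only; saved alignments and backups need additional room.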

Why should you use a pipeline?

• Automates analysis
• Maximizes resource consumption
• Because post-docs aren’t cheap

Many options for analysis pipelines

Galaxy server

NextGene

Custom pipeline

Scripting languages

Open-source tools
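
For the custom-pipeline option, a scripting language can chain steps and skip any step whose output already exists, so an interrupted run can be restarted without repeating finished work. The following is a minimal sketch of that idea, not the pipeline used for this project. The step names and functions are placeholders; real steps would call alignment and variant-calling tools.

```python
import tempfile
from pathlib import Path


def run_pipeline(steps, workdir):
    """Run (name, function) steps in order, skipping steps already done.

    Each function receives the text output of the previous step and returns
    the text to store for its own step.
    """
    workdir = Path(workdir)
    executed = []
    previous = None
    for name, func in steps:
        output = workdir / f"{name}.txt"
        if not output.exists():
            upstream = previous.read_text() if previous else ""
            output.write_text(func(upstream))
            executed.append(name)
        previous = output
    return executed


# Placeholder steps for illustration only.
steps = [
    ("align", lambda _: "aligned reads"),
    ("call_variants", lambda text: text + " -> variants"),
]

with tempfile.TemporaryDirectory() as tmp:
    first_run = run_pipeline(steps, tmp)    # runs both steps
    second_run = run_pipeline(steps, tmp)   # nothing left to do
    final_output = (Path(tmp) / "call_variants.txt").read_text()
```

Checking for existing output files is a deliberately simple form of checkpointing; workflow systems such as Galaxy handle this bookkeeping for you.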

Challenges

Annotation

This is a mess in the cow

The reference assembly may not be representative of all taurine cows

Validation

Doing functional genomics with large mammals is expensive – who pays?

When have we proven something?

Conclusions

Sequencing is powerful, but presents many challenges

Computational requirements are substantial

We’re learning how much we don’t know about functional genomics in the cow

Validation remains a problem

Acknowledgments

AIPL: Derek Bickhart, Dan Null, Paul VanRaden

BFGL: Reuben Anderson, Steve Schroeder, Tad Sonstegard, Curt Van Tassell

Questions?

http://gigaom.com/2012/05/31/t-mobile-pits-its-math-against-verizons-the-loser-common-sense/shutterstock_76826245/