post doctoral associate cornell university

Fei Lu Postdoctoral Associate Cornell University http://www.maizegenetics.net

Post on 05-Dec-2021

Page 1: Post doctoral Associate Cornell University

Fei Lu
Post‐doctoral Associate
Cornell University
http://www.maizegenetics.net

Page 2: Post doctoral Associate Cornell University

Genotyping by sequencing (GBS) is simple and cost effective

1. Digest DNA
2. Ligate adapters with barcodes
3. Pool DNAs
4. PCR
5. Illumina sequencing

(Elshire et al. 2011. PLoS ONE)

Reduced representation library approach (Altshuler et al. 2000. Nature)

500,000 reads/sample (384 plex)
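
Because step 2 attaches a sample-specific barcode before the DNAs are pooled, the sequenced reads have to be assigned back to samples afterwards. The following is a minimal Python sketch of that demultiplexing step, assuming the barcode sits at the very start of each read. The barcode sequences, the sample names and the `demultiplex` helper are hypothetical and are not taken from the GBS protocol.

```python
from collections import defaultdict

# Hypothetical barcode-to-sample table; real GBS barcode sets differ.
BARCODES = {"ACGT": "sample_1", "TTGCA": "sample_2", "GGATCC": "sample_3"}


def demultiplex(reads, barcodes=BARCODES):
    """Group reads by the barcode found at the start of each read.

    The barcode is stripped from the read. Longer barcodes are tried first
    so that a short barcode cannot shadow a longer one sharing its prefix.
    Reads matching no barcode are collected under the key None.
    """
    by_sample = defaultdict(list)
    ordered = sorted(barcodes, key=len, reverse=True)
    for read in reads:
        for bc in ordered:
            if read.startswith(bc):
                by_sample[barcodes[bc]].append(read[len(bc):])
                break
        else:
            by_sample[None].append(read)
    return dict(by_sample)


if __name__ == "__main__":
    reads = ["ACGTCAGCAAAT", "TTGCACTGCGGGA", "NNNNCAGCTTTT"]
    for sample, seqs in demultiplex(reads).items():
        print(sample, seqs)
```

Exact prefix matching is the simplest possible choice; a production demultiplexer would probably also tolerate sequencing errors in the barcode.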

Page 3: Post doctoral Associate Cornell University

Universal Network Enabled Analysis Kit (UNEAK) 

A reference‐free SNP calling pipeline

Designed for species that…
• lack a reference genome
• are diploid or polyploid
• are inbreeders or outcrossers
• have limited genetic or genomic resources

Page 4: Post doctoral Associate Cornell University

Overview of UNEAK

Genome is digested, sequenced using GBS

Reads are trimmed to 64 bp

Identical reads = tag

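
The two rules on this slide (trim reads to 64 bp, treat identical reads as one tag) can be sketched in a few lines of Python. This is an illustration only: the `count_tags` name is hypothetical, and skipping reads shorter than 64 bp is an assumption, since the slide does not say how such reads are handled.

```python
from collections import Counter

TAG_LENGTH = 64  # read length after trimming, as stated on the slide


def count_tags(reads, tag_length=TAG_LENGTH):
    """Trim reads to tag_length and count identical trimmed reads as one tag.

    Reads shorter than tag_length are skipped here; how the real pipeline
    treats them is not covered by the slide, so this is an assumption.
    """
    trimmed = (r[:tag_length] for r in reads if len(r) >= tag_length)
    return Counter(trimmed)


if __name__ == "__main__":
    a = "ACGT" * 16                      # a 64 bp sequence
    b = "ACGT" * 15 + "ACGA"             # differs from a at the last base
    tags = count_tags([a + "TTTT", a, b, "ACGT"])
    for tag, count in tags.items():
        print(count, tag)
```

The resulting mapping from tag sequence to read count is the natural input for the pairwise comparison on the next slide.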

Page 5: Post doctoral Associate Cornell University

Overview of UNEAK – Network filter

Pairwise alignment to find tag pairs with 1 bp mismatch

Build tag networks

Topology of tag networks

Keep common reciprocal tags

[Figure panels C–F; labels: error, real tags, count]
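
The network filter can be sketched as follows, under stated assumptions. Tags differing at exactly one base are linked, linked tags are grouped into networks (connected components), and only networks consisting of exactly two tags are kept as candidate SNPs. The `min_count` threshold is a stand-in for "common" and for the handling of error tags; the actual criteria used by UNEAK are not given on the slide, and all function names here are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def one_mismatch_pairs(tags):
    """Return all pairs of equal-length tags that differ at exactly one base."""
    buckets = defaultdict(list)
    for tag in set(tags):
        for i in range(len(tag)):
            # Tags that collide on this key are identical except at position i.
            buckets[(i, tag[:i], tag[i + 1:])].append(tag)
    pairs = set()
    for group in buckets.values():
        for a, b in combinations(sorted(group), 2):
            pairs.add((a, b))
    return pairs


def build_networks(tags, pairs):
    """Connected components of the graph whose edges are 1-mismatch pairs."""
    parent = {t: t for t in tags}

    def find(t):
        while parent[t] != t:
            parent[t] = parent[parent[t]]  # path halving
            t = parent[t]
        return t

    for a, b in pairs:
        parent[find(a)] = find(b)
    networks = defaultdict(set)
    for t in tags:
        networks[find(t)].add(t)
    return list(networks.values())


def reciprocal_tag_pairs(tag_counts, min_count=2):
    """Keep networks of exactly two tags, each seen at least min_count times.

    Dropping rare tags before building networks is an assumption of this
    sketch, not a documented UNEAK rule.
    """
    common = {t for t, c in tag_counts.items() if c >= min_count}
    pairs = one_mismatch_pairs(common)
    kept = [tuple(sorted(net)) for net in build_networks(common, pairs)
            if len(net) == 2]
    return sorted(kept)


if __name__ == "__main__":
    counts = {"AAAA": 10, "AAAT": 8, "CCCC": 9, "CCCG": 7,
              "CCGG": 6, "GGGG": 5, "AATT": 1}
    print(reciprocal_tag_pairs(counts))
```

In the toy data the C-rich tags form a three-tag network and are discarded, while the rare tag `AATT` would pull the `AAAA`/`AAAT` pair into a larger network if it were not removed first. Bucketing tags by "sequence with one position masked" avoids comparing every tag against every other tag.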

Page 6: Post doctoral Associate Cornell University

Topology of tag networks

[Figure labels: Tag; Error; Plastid & highly repetitive tags; Moderately repetitive tags, paralogs & SNPs]

Networks of 2496 tags

Page 7: Post doctoral Associate Cornell University

Details about network filter

[Figure labels: SNP; Error tolerance]

Page 8: Post doctoral Associate Cornell University

Program flowchart of UNEAK

Fastq/Qseq
TagCount
Network filter
TagPair
TBT (Byte/Bit)
MapInfo
Optional filters
HapMap

TagPair (Long, Long, Integer): Seq, Seq, Order

MapInfo includes:
• SNP
• Seq
• Count
• Count distribution
• Heterozygote code
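
The flowchart lists the sequence fields of a TagPair record as Long values. One plausible reading, given that a 64 bp tag at 2 bits per base occupies exactly 128 bits, is that each tag sequence is packed into two 64-bit integers. The sketch below illustrates that packing in Python; the base-to-bit mapping, the function names and the interpretation itself are assumptions, not details confirmed by the slide.

```python
BASE_TO_BITS = {"A": 0, "C": 1, "G": 2, "T": 3}  # assumed mapping
BITS_TO_BASE = "ACGT"
BASES_PER_LONG = 32  # 32 bases x 2 bits = 64 bits


def pack_tag(seq):
    """Pack a 64 bp tag into two unsigned 64-bit integers (2 bits per base)."""
    if len(seq) != 2 * BASES_PER_LONG:
        raise ValueError("expected a 64 bp tag")
    words = []
    for start in (0, BASES_PER_LONG):
        word = 0
        for base in seq[start:start + BASES_PER_LONG]:
            word = (word << 2) | BASE_TO_BITS[base]
        words.append(word)
    return tuple(words)


def unpack_tag(words):
    """Inverse of pack_tag: rebuild the 64 bp sequence from two integers."""
    bases = []
    for word in words:
        # Read the 32 two-bit fields from most to least significant.
        for shift in range(2 * (BASES_PER_LONG - 1), -1, -2):
            bases.append(BITS_TO_BASE[(word >> shift) & 3])
    return "".join(bases)


if __name__ == "__main__":
    tag = "ACGT" * 16
    packed = pack_tag(tag)
    print(packed, unpack_tag(packed) == tag)
```

Python integers are unbounded, so the values here are unsigned; a Java `long` holding the same bits would be signed. A fixed-width encoding like this makes tag comparison and sorting cheap, which matters when millions of tags are compared pairwise.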

Page 9: Post doctoral Associate Cornell University

Pipeline validated with maize inbred linkage population – network filter

Evaluation criteria                 Step 1: Pairwise alignment of tags    Step 2: Network filter
Allele frequency distribution       [histogram]                           [histogram]
Single‐site rate (BLAST to maize)   23.3%                                 85.0%

[Histograms: proportion of SNPs (y-axis) vs. allele frequency (x-axis)]

Page 10: Post doctoral Associate Cornell University

Pipeline validated with maize inbred linkage population – SNP validation

LD distribution (SNPs against 1106 markers)

[Histogram: proportion of SNPs (y-axis) vs. LD (r²) (x-axis); annotations: 0.2, 92.2%]

Alignment (B97 tags against B97 shotgun genome from Maize HapMap2 data): 93.2% SNPs are polymorphic
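
For reference, r² between two biallelic markers scored on the same individuals is the squared Pearson correlation of their genotype codes. The following Python sketch computes it for 0/1 coded genotypes, as would be typical for inbred lines. The `ld_r2` name and the use of `None` for missing calls are assumptions, and how each SNP was matched against the 1106 markers is not stated on the slide.

```python
def ld_r2(x, y):
    """Squared Pearson correlation between two genotype vectors.

    x and y hold allele codes (0 or 1) for the same individuals, in the same
    order; entries that are None in either vector are treated as missing.
    Returns None when either marker is monomorphic among the shared calls.
    """
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    if n == 0:
        return None
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in pairs) / n
    var_x = sum((a - mean_x) ** 2 for a, _ in pairs) / n
    var_y = sum((b - mean_y) ** 2 for _, b in pairs) / n
    if var_x == 0 or var_y == 0:
        return None
    return cov * cov / (var_x * var_y)


if __name__ == "__main__":
    snp = [0, 0, 1, 1, 0, 1]
    marker = [0, 0, 1, 1, 1, 1]
    print(ld_r2(snp, marker))
```

A SNP that is a sequencing artifact should show little LD with any real marker in a linkage population, which is why the LD distribution serves as a validation criterion.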

Page 11: Post doctoral Associate Cornell University

Characterization of the Genetic Diversity of Switchgrass Using Genotyping by Sequencing (GBS)

Page 12: Post doctoral Associate Cornell University

GWAS and GS require high‐density markers to accelerate breeding

SNP discovery

Genome Wide Association Study (GWAS)

Genomic Selection (GS)

Accelerate switchgrass breeding

Page 13: Post doctoral Associate Cornell University

Challenges and goals

Challenges
• No reference genome
• Multiple ploidy levels (4X, 6X and 8X)
• Highly heterozygous

Goals
• Discover high‐density SNPs
• Construct linkage disequilibrium (LD) map
• Evaluate population structure
• Reconstruct phylogeny

Page 14: Post doctoral Associate Cornell University

Switchgrass data set

Linkage populations
• Full‐sib population, n = 130 individuals
• Half‐sib population, n = 168 individuals

Association populations
• 66 diverse populations
• Mostly northern‐adapted, Upland populations and cultivars
• n = 540 individuals

350 GB sequence; 1.2 million SNPs generated!

Page 15: Post doctoral Associate Cornell University

Tetraploid switchgrass behaves like a diploid

[Histogram: allele frequency in full-sib population; proportion of SNPs (y-axis) vs. allele frequency (x-axis)]

F1 segregation ratios and parental crosses: 1:3 (AA×Aa), 1:1 (AA×aa, Aa×Aa), 3:1 (aa×Aa)

Most informative markers to construct linkage map

50,000 SNPs
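
The ratios attached to each cross follow from ordinary diploid inheritance if they are read as the expected a:A allele ratio among the F1 progeny, which is the interpretation assumed here. The Python sketch below enumerates the gamete combinations for a cross and reports that ratio; the function names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def f1_allele_frequency(parent1, parent2, allele="a"):
    """Expected frequency of `allele` among F1 progeny under diploid inheritance.

    Each parent is a two-character genotype such as "Aa"; every gamete
    combination is taken as equally likely.
    """
    offspring = [g1 + g2 for g1, g2 in product(parent1, parent2)]
    total = sum(len(g) for g in offspring)
    hits = sum(g.count(allele) for g in offspring)
    return Fraction(hits, total)


def allele_ratio(parent1, parent2):
    """Expected a:A allele ratio in the F1, reduced to lowest terms."""
    p = f1_allele_frequency(parent1, parent2)
    return f"{p.numerator}:{(1 - p).numerator}"


if __name__ == "__main__":
    for cross in [("AA", "Aa"), ("AA", "aa"), ("Aa", "Aa"), ("aa", "Aa")]:
        print("x".join(cross), allele_ratio(*cross),
              float(f1_allele_frequency(*cross)))
```

Under this reading the four crosses give allele frequencies of 0.25, 0.5, 0.5 and 0.75, so a full-sib population with diploid-like inheritance would be expected to show SNPs clustering around those three values.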

Page 16: Post doctoral Associate Cornell University

18 Linkage groups perfectly match the chromosome number of switchgrass

Can we order the SNPs?

Yes, use synteny

[Figure: correlation of linkage groups (R); 3,000 high coverage SNPs]

Page 17: Post doctoral Associate Cornell University

Linkage groups perfectly match the syntenic chromosomes of foxtail millet (Setaria italica)

• Small (490 Mb) genome, diploid, n=9
• 13 million years divergent from switchgrass
• 10% switchgrass SNPs map to foxtail millet genome
• Constructed two framework linkage maps of 18 groups (3,224 and 4,001 markers)
• 42K paternal map and 47K maternal map

[Figure: linkage groups of switchgrass vs. chromosomes of foxtail millet]

Page 18: Post doctoral Associate Cornell University

Upland and lowland ecotypes clearly separate in phylogeny

[Figure: phylogeny; clade labels: Upland, Lowland; detail panel labels: Jackson, MI; Hansens Island, MI; Tipton, IN; Fillmore, MN; Genesee, MN; Ipswich prairie, WI; Ipswich prairie, WI; WS4U]

Page 19: Post doctoral Associate Cornell University

Ploidy level resolves into distinct groups

[Figure: phylogeny; group labels: Upland 8X, Upland 4X, Upland 8X, Lowland 4X, Lowland 4X, Upland 8X]

Ploidy level identified by flow cytometry (Costich et al. 2010. Plant Genome)

Page 20: Post doctoral Associate Cornell University

Geography shows isolation by distance

Upland 4X North

Upland 8X East

Lowland 4X Northeast

Upland 8X South

Upland 8X West

Lowland 4X South

Page 21: Post doctoral Associate Cornell University

Upland 4X arose from Upland 8X

[Figure, panels a and b: NJ tree using 3,000 markers; NJ tree using 29,221 markers]

Tree labels: Foxtail millet (outgroup); Upland; Lowland

Groups: Upland 8X West; Upland 4X North; Upland 8X East; Lowland 4X Southeast; Lowland 4X South; Upland 8X South

Node values: 96, 58, 100, 100, 100, 87, 16, 15, 66, 61

Page 22: Post doctoral Associate Cornell University

Reduced diversity in Upland 4X compared with Upland 8X

[Figure: MDS plot, Coordinate 1 (x-axis) vs. Coordinate 2 (y-axis); groups: Upland 8X West, Upland 4X North, Upland 8X East, Upland 8X South]

Page 23: Post doctoral Associate Cornell University

Migration paths of switchgrass

Upland 4X North

Upland 8X East

Lowland 4X Northeast

Upland 8X South

Upland 8X West

Lowland 4X South

Page 24: Post doctoral Associate Cornell University

Summary
• Effective SNP calling pipeline is developed
• It works well for non‐reference, heterozygous, and polyploid species
• 1.2 million high density SNPs discovered for GWAS
• Tetraploid switchgrass behaves like a diploid
• Synteny based SNP maps constructed
• Robust phylogeny concurs well with ecotype, ploidy level and geographic distribution of switchgrass
• Data suggest that Upland 4X arose from Upland 8X

Page 25: Post doctoral Associate Cornell University

Future Direction

Putting it all together: GWAS and GS

• Flowering time
• Plant height
• Leaf length and width
• Standability
• Biomass quality traits

Linkage populations Association populations

Caldwell Field, Cornell U, Ithaca, NY

Page 26: Post doctoral Associate Cornell University

Acknowledgements

Project Manager: Denise Costich (USDA‐ARS, Cornell)

PIs: Edward Buckler (USDA‐ARS, Cornell), Michael Casler (USDA‐ARS, UW‐Madison), Jerome Cherney (Cornell)

Bioinformatics: Dallas Kroon

Sequencing: Rob Elshire, Jeff Glaubitz, Wenyan Zhu, Moira Sheehan

Statistics: Alex Lipka

Field: Ken Paddock, Nick Lepak, Nick Kaczmar

Institute for Genomic Diversity (Cornell)

Supported by DOE (including JGI), USDA, and NSF