Fei Lu, Post-doctoral Associate
Cornell University, http://www.maizegenetics.net
Genotyping by sequencing (GBS) is simple and cost-effective
1. Digest DNA
2. Ligate adapters with barcodes
3. Pool DNAs
4. PCR
5. Illumina sequencing
(Elshire et al. 2011. PLoS ONE)
(Altshuler et al. 2000. Nature)
Reduced representation library approach
500,000 reads/sample (384-plex)
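Because barcoded samples are pooled before PCR and sequencing, each read has to be assigned back to its sample afterwards. The sketch below assumes every read starts with its barcode followed directly by an ApeKI-style cut-site remnant (CAGC or CTGC); the enzyme, remnants, barcodes and sample names are illustrative assumptions, not values from the slides.

```python
# Hypothetical barcode -> sample key; real GBS key files are larger and differ.
BARCODES = {"ACGT": "sample_1", "TTAGC": "sample_2", "GGAACT": "sample_3"}
# Assumed cut-site remnants expected right after the barcode.
CUT_REMNANTS = ("CAGC", "CTGC")


def demultiplex(reads, barcodes=BARCODES, remnants=CUT_REMNANTS):
    """Assign pooled reads to samples and strip the barcode.

    A read is kept only if it begins with a known barcode immediately
    followed by an expected cut-site remnant; other reads are dropped.
    """
    by_sample = {sample: [] for sample in barcodes.values()}
    for read in reads:
        for barcode, sample in barcodes.items():
            rest = read[len(barcode):]
            if read.startswith(barcode) and rest.startswith(remnants):
                by_sample[sample].append(rest)
                break
    return by_sample


pooled = ["ACGTCAGCTTGA", "TTAGCCTGCAAAT", "GGGGCAGC"]
print(demultiplex(pooled))
# {'sample_1': ['CAGCTTGA'], 'sample_2': ['CTGCAAAT'], 'sample_3': []}
```

At the slide's figure of 500,000 reads/sample, a full 384-plex pool corresponds to about 500,000 × 384 = 192 million reads.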
Universal Network Enabled Analysis Kit (UNEAK)
A reference free SNP calling pipeline
Designed for species that:
• lack a reference genome
• are diploid or polyploid
• are inbreeders or outcrossers
• have limited genetic or genomic resources
Overview of UNEAK
• Genome is digested and sequenced using GBS
• Reads are trimmed to 64 bp
• Identical reads = tag
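Collapsing reads into tags is a counting step: every read is cut to its first 64 bp and identical sequences are tallied. A minimal sketch; discarding reads shorter than 64 bp or containing N is an assumption made here, not a documented UNEAK rule.

```python
from collections import Counter

TAG_LENGTH = 64  # reads are trimmed to 64 bp, per the slide


def reads_to_tags(reads, tag_length=TAG_LENGTH):
    """Trim reads to tag_length and count identical trimmed sequences (tags).

    Dropping reads shorter than tag_length or containing N is an assumption.
    """
    counts = Counter()
    for read in reads:
        tag = read[:tag_length]
        if len(tag) == tag_length and "N" not in tag:
            counts[tag] += 1
    return counts


reads = ["A" * 64 + "CC", "A" * 64 + "GG", "C" * 70, "A" * 10]
tags = reads_to_tags(reads)
print(len(tags), tags["A" * 64])  # 2 2
```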
Overview of UNEAK – Network filter
• Pairwise alignment to find tag pairs with 1 bp mismatch
• Build tag networks
• Topology of tag networks, separating error tags from real tags by count
• Keep common reciprocal tags
Topology of tag networks (legend: tag, error)
• Plastid & highly repetitive tags
• Moderately repetitive tags, paralogs & SNPs
[Figure: networks of 2,496 tags]
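The network idea can be sketched in plain Python: link every pair of tags that differ at exactly one base, take connected components as networks, and keep only two-tag networks as candidate SNP allele pairs. Reading "reciprocal" as "each tag's only 1-bp partner is the other" is an interpretation of the slide, and the brute-force pairwise loop is for illustration only; it would be far too slow for millions of tags.

```python
from itertools import combinations


def one_mismatch(a, b):
    """True if equal-length tags a and b differ at exactly one base."""
    return len(a) == len(b) and sum(x != y for x, y in zip(a, b)) == 1


def tag_networks(tags):
    """Link tags that differ by 1 bp and return the connected components."""
    neighbors = {t: set() for t in tags}
    for a, b in combinations(tags, 2):  # brute-force pairwise comparison
        if one_mismatch(a, b):
            neighbors[a].add(b)
            neighbors[b].add(a)
    seen, networks = set(), []
    for start in tags:
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:  # depth-first traversal of one network
            t = stack.pop()
            if t not in component:
                component.add(t)
                stack.extend(neighbors[t] - component)
        seen |= component
        networks.append(component)
    return networks


def reciprocal_pairs(tags):
    """Keep two-tag networks: each tag's only 1-bp partner is the other."""
    return [tuple(sorted(n)) for n in tag_networks(tags) if len(n) == 2]


tags = ["AAAA", "AAAT", "CCCC", "GGGG", "GGGT", "GGGC"]
print(reciprocal_pairs(tags))  # [('AAAA', 'AAAT')]
```

The three-tag network (GGGG, GGGT, GGGC) is dropped, in the spirit of the slide's point that larger networks reflect repeats, paralogs or errors rather than clean SNPs.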
Details about network filter
[Figure panels: SNP, error tolerance]
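An error-tolerance check can be expressed as a threshold on the minor tag's share of reads within a 1-bp tag pair: a real SNP allele should be observed reasonably often, whereas a sequencing error usually shows up as a rare variant of an abundant tag. The rule and the 0.03 threshold below are hypothetical, chosen only to illustrate the idea.

```python
ERROR_TOLERANCE = 0.03  # hypothetical threshold, not taken from the slides


def classify_pair(count_a, count_b, tolerance=ERROR_TOLERANCE):
    """Call a 1-bp tag pair 'SNP' or 'error' from the minor tag's read share."""
    total = count_a + count_b
    if total == 0:
        raise ValueError("tag pair has no reads")
    minor_share = min(count_a, count_b) / total
    return "SNP" if minor_share >= tolerance else "error"


print(classify_pair(500, 480), classify_pair(1000, 5))  # SNP error
```

The trade-off of any such threshold is that genuinely rare alleles can be discarded along with errors.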
Program flowchart of UNEAK
Fastq/Qseq → TagCount → Network filter → TagPair → TBT (Byte/Bit) → MapInfo → Optional filters → HapMap
TagPair: (Long, Long, Integer) = Seq, Seq, Order
MapInfo includes:
• SNP
• Seq
• Count
• Count distribution
• Heterozygote code
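The TagPair record stores sequence in Long fields. At 2 bits per base, 32 bases fill one 64-bit integer, so a 64-bp tag fits in exactly two. A sketch of such packing; the A=0, C=1, G=2, T=3 code and the most-significant-first order are assumptions, not TASSEL's documented layout.

```python
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}  # assumed 2-bit code per base
BASES = "ACGT"


def encode_tag(tag):
    """Pack a 64-bp tag into two 64-bit integers, 32 bases per integer."""
    if len(tag) != 64:
        raise ValueError("expected a 64-bp tag")
    words = []
    for half in (tag[:32], tag[32:]):
        value = 0
        for base in half:  # first base ends up in the most significant bits
            value = (value << 2) | CODE[base]
        words.append(value)
    return tuple(words)


def decode_tag(words):
    """Unpack two 64-bit integers back into a 64-bp tag."""
    bases = []
    for value in words:
        for shift in range(62, -2, -2):
            bases.append(BASES[(value >> shift) & 0b11])
    return "".join(bases)


tag = "ACGT" * 16
print(encode_tag(tag))
print(decode_tag(encode_tag(tag)) == tag)  # True
```

Python integers are unbounded, so the words print as non-negative numbers; in a language with signed 64-bit longs, such as Java, the same bit patterns are stored but any word whose top bit is set reads as negative.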
Pipeline validated with maize inbred linkage population – network filter
[Figure: allele frequency distributions (proportion of SNPs vs. allele frequency) after each step]
Evaluation criteria, Step 1 (pairwise alignment of tags) vs. Step 2 (network filter):
• Single-site rate (BLAST to maize): 23.3% vs. 85.0%
• Allele frequency distribution: shown in the figure
Pipeline validated with maize inbred linkage population – SNP validation
• LD distribution (SNPs against 1,106 markers) [Figure: proportion of SNPs vs. LD (r²), annotated 0.2 and 92.2%]
• Alignment (B97 tags against the B97 shotgun genome from Maize HapMap2 data): 93.2% of SNPs are polymorphic
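LD between a new SNP and an existing marker is commonly summarized as r², which for biallelic genotype codes equals the squared Pearson correlation across individuals. A minimal sketch; the 0/1 coding for inbred lines, pairwise skipping of missing calls, and returning 0.0 for a monomorphic SNP are choices made here for illustration. How the slide summarized each SNP against the 1,106 markers is not stated.

```python
from statistics import mean


def ld_r2(x, y):
    """r^2 between two biallelic SNPs as the squared Pearson correlation
    of their genotype codes (e.g. 0/1 in inbred lines). Individuals with a
    missing call (None) at either SNP are skipped."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    if len(pairs) < 2:
        raise ValueError("need at least two individuals genotyped at both SNPs")
    mx = mean(a for a, _ in pairs)
    my = mean(b for _, b in pairs)
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    syy = sum((b - my) ** 2 for _, b in pairs)
    if sxx == 0 or syy == 0:
        return 0.0  # monomorphic SNP: r^2 is undefined, reported as 0 here
    return sxy * sxy / (sxx * syy)


print(ld_r2([0, 0, 1, 1, None], [0, 0, 1, 1, 1]))  # 1.0
print(ld_r2([0, 1, 0, 1], [0, 0, 1, 1]))  # 0.0
```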
Characterization of the Genetic Diversity of Switchgrass Using Genotyping by Sequencing (GBS)
GWAS and GS require high-density markers to accelerate breeding
SNP discovery → Genome-Wide Association Study (GWAS) and Genomic Selection (GS) → accelerate switchgrass breeding
Challenges and goals
Challenges:
• No reference genome
• Multiple ploidy levels (4X, 6X and 8X)
• Highly heterozygous
Goals:
• Discover high-density SNPs
• Construct linkage disequilibrium (LD) map
• Evaluate population structure
• Reconstruct phylogeny
Switchgrass data set
Linkage populations:
• Full-sib population, n = 130 individuals
• Half-sib population, n = 168 individuals
Association populations:
• 66 diverse populations
• Mostly northern-adapted upland populations and cultivars
• n = 540 individuals
350 GB of sequence; 1.2 million SNPs generated!
[Figure: allele frequency in full-sib population (proportion of SNPs vs. allele frequency)]
Tetraploid switchgrass behaves like a diploid
Allele ratios and the parental crosses that produce them:
• 1:3 from AA × Aa
• 1:1 from AA × aa and Aa × Aa
• 3:1 from aa × Aa
F1: most informative markers to construct the linkage map
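The ratios above follow from ordinary diploid segregation: each parent passes on one of its two alleles, so the expected frequency of allele a among F1 progeny is the average of the two parents' frequencies. A sketch using exact fractions; the genotype strings are illustrative.

```python
from fractions import Fraction


def parent_freq(genotype, allele="a"):
    """Frequency of `allele` in a diploid genotype string such as 'Aa'."""
    return Fraction(genotype.count(allele), len(genotype))


def expected_f1_freq(p1, p2, allele="a"):
    """Expected frequency of `allele` among F1 progeny of p1 x p2.

    Assumes diploid (disomic) segregation: each parent transmits one allele,
    so the progeny frequency is the average of the parents' frequencies.
    """
    return (parent_freq(p1, allele) + parent_freq(p2, allele)) / 2


for p1, p2 in [("AA", "Aa"), ("AA", "aa"), ("Aa", "Aa"), ("aa", "Aa")]:
    print(f"{p1} x {p2}: {expected_f1_freq(p1, p2)}")
# AA x Aa: 1/4
# AA x aa: 1/2
# Aa x Aa: 1/2
# aa x Aa: 3/4
```

Frequencies of 1/4, 1/2 and 3/4 correspond to a:A ratios of 1:3, 1:1 and 3:1.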
50,000 SNPs
18 linkage groups perfectly match the chromosome number of switchgrass
[Figure: correlation of linkage groups]
Can we order the SNPs? Yes, use synteny
3,000 high-coverage SNPs
Linkage groups perfectly match syntenic chromosomes of foxtail millet (Setaria italica):
• Small (490 Mb) genome, diploid, n = 9
• 13 million years divergent from switchgrass
• 10% of switchgrass SNPs map to the foxtail millet genome
• Constructed two framework linkage maps of 18 groups (3,224 and 4,001 markers)
• 42K paternal map and 47K maternal map
[Figure: linkage groups of switchgrass vs. chromosomes of foxtail millet]
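Ordering by synteny amounts to a sort: SNPs assigned to a switchgrass linkage group are placed in the order of their tags' alignment positions on the syntenic foxtail millet chromosome. In this sketch the SNP IDs, group names and millet coordinates are hypothetical, and only SNPs whose tags align to millet (about 10%, per the slide) can be placed.

```python
def order_by_synteny(snps, millet_hits):
    """Order SNPs within each linkage group by foxtail millet position.

    snps: dict snp_id -> linkage group.
    millet_hits: dict snp_id -> (chromosome, position) for the subset of
    SNPs whose tags align to the millet genome; other SNPs are left out.
    """
    ordered = {}
    for snp, group in snps.items():
        if snp in millet_hits:
            ordered.setdefault(group, []).append(snp)
    for members in ordered.values():
        members.sort(key=lambda s: millet_hits[s])
    return ordered


snps = {"s1": "LG1", "s2": "LG1", "s3": "LG1", "s4": "LG2"}
hits = {"s1": ("Si1", 500), "s2": ("Si1", 100), "s4": ("Si5", 42)}
print(order_by_synteny(snps, hits))  # {'LG1': ['s2', 's1'], 'LG2': ['s4']}
```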
Upland and lowland ecotypes clearly separate in phylogeny
[Figure: phylogeny with Upland and Lowland clades; detail panel labels: Jackson, MI; Hansens Island, MI; Tipton, IN; Fillmore, MN; Genesee, MN; Ipswich prairie, WI (two entries); WS4U]
Ploidy level resolves into distinct groups
[Figure: phylogeny with clusters labeled Upland 8X (three clusters), Upland 4X and Lowland 4X (two clusters)]
Ploidy level identified by flow cytometry (Costich et al. 2010. Plant Genome)
Geography shows isolation by distance
[Figure: map of groups: Upland 4X North, Upland 8X East, Lowland 4X Northeast, Upland 8X South, Upland 8X West, Lowland 4X South]
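Isolation by distance is usually tested by correlating pairwise genetic distance with pairwise geographic distance; a Mantel test obtains a p-value by permuting the population labels of one matrix. The slides do not say which test was used, so this is a generic sketch, and the distance matrices in the example are made up.

```python
import random
from itertools import combinations


def _pearson(xs, ys):
    """Pearson correlation of two equal-length number lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def mantel(gen, geo, permutations=999, seed=0):
    """Mantel test between genetic and geographic distance matrices.

    Returns (r, p): r correlates the upper-triangle entries; the one-sided
    p-value comes from permuting the population order of `geo`.
    """
    n = len(gen)
    pairs = list(combinations(range(n), 2))
    g = [gen[i][j] for i, j in pairs]
    r_obs = _pearson(g, [geo[i][j] for i, j in pairs])
    rng = random.Random(seed)
    order = list(range(n))
    hits = 0
    for _ in range(permutations):
        rng.shuffle(order)
        r_perm = _pearson(g, [geo[order[i]][order[j]] for i, j in pairs])
        if r_perm >= r_obs:
            hits += 1
    return r_obs, (hits + 1) / (permutations + 1)


xs = [0, 1, 2, 3, 4, 5]  # made-up population positions along a transect
geo = [[abs(a - b) for b in xs] for a in xs]
gen = [[2 * d for d in row] for row in geo]  # genetic distance tracks geography
r, p = mantel(gen, geo)
print(round(r, 3), p)
```

Permuting whole rows and columns, rather than individual matrix cells, keeps the non-independence of pairwise distances intact, which is the point of the Mantel approach.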
Upland 4X arose from Upland 8X
[Figure: NJ trees using 3,000 markers and using 29,221 markers; foxtail millet as outgroup; Upland and Lowland clades; groups labeled Upland 8X West, Upland 4X North, Upland 8X East, Lowland 4X Southeast, Lowland 4X South, Upland 8X South; node values 96, 58, 100, 100, 100, 87, 16, 15, 66, 61]
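Neighbor-joining (Saitou and Nei 1987) builds a tree from a matrix of pairwise distances by repeatedly joining the pair of clusters that minimizes the Q criterion. A compact sketch; computing the distance matrix from the 3,000 or 29,221 markers, and any resampling behind the node values, are not shown, and the five-taxon matrix is a textbook toy example rather than switchgrass data.

```python
def neighbor_joining(names, dist):
    """Neighbor-joining on a symmetric distance matrix; returns unrooted Newick."""
    if len(names) < 3:
        raise ValueError("need at least three taxa")
    nodes = list(names)  # Newick strings of the current clusters
    D = [[float(v) for v in row] for row in dist]
    while len(nodes) > 3:
        n = len(nodes)
        r = [sum(row) for row in D]
        # Choose the pair minimizing Q(i, j) = (n - 2) * D[i][j] - r[i] - r[j].
        best = None
        for i in range(n):
            for j in range(i + 1, n):
                q = (n - 2) * D[i][j] - r[i] - r[j]
                if best is None or q < best[0]:
                    best = (q, i, j)
        _, i, j = best
        # Branch lengths from i and j to their new parent node.
        li = 0.5 * D[i][j] + (r[i] - r[j]) / (2 * (n - 2))
        lj = D[i][j] - li
        keep = [k for k in range(n) if k not in (i, j)]
        # Distance from the new node to every remaining cluster.
        new_row = [0.5 * (D[i][k] + D[j][k] - D[i][j]) for k in keep]
        D = [[D[x][y] for y in keep] + [new_row[pos]] for pos, x in enumerate(keep)]
        D.append(new_row + [0.0])
        joined = f"({nodes[i]}:{li:g},{nodes[j]}:{lj:g})"
        nodes = [nodes[k] for k in keep] + [joined]
    # Three clusters remain: attach them to one central node.
    d01, d02, d12 = D[0][1], D[0][2], D[1][2]
    lengths = [(d01 + d02 - d12) / 2, (d01 + d12 - d02) / 2, (d02 + d12 - d01) / 2]
    parts = [f"{node}:{length:g}" for node, length in zip(nodes, lengths)]
    return "(" + ",".join(parts) + ");"


toy = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
print(neighbor_joining(list("abcde"), toy))  # (d:2,e:1,(c:4,(a:2,b:3):3):2);
```

The tree is unrooted; rooting it, as the slide does with foxtail millet, means including the outgroup as an extra taxon and placing the root on its branch.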
Reduced diversity in Upland 4X compared with Upland 8X
[Figure: MDS plot (Coordinate 1 vs. Coordinate 2) of Upland 8X West, Upland 4X North, Upland 8X East and Upland 8X South]
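The MDS plot shows reduced diversity visually; one simple number for comparing groups is mean expected heterozygosity, 1 - p² - q², averaged over SNPs. The slides do not say which statistic, if any, was computed, and coding calls as diploid dosages (0, 1, 2) is an assumption; the two groups below are made up.

```python
def expected_heterozygosity(genotypes):
    """Mean expected heterozygosity (1 - p^2 - q^2) across biallelic SNPs.

    genotypes: one list per SNP of per-individual allele dosages (0, 1 or 2);
    None marks a missing call. SNPs with no calls are skipped.
    """
    values = []
    for snp in genotypes:
        calls = [g for g in snp if g is not None]
        if not calls:
            continue
        p = sum(calls) / (2 * len(calls))
        values.append(1 - p * p - (1 - p) * (1 - p))
    return sum(values) / len(values) if values else 0.0


group_a = [[0, 1, 2, 1], [0, 2, 1, 1]]  # hypothetical, more diverse group
group_b = [[0, 0, 0, 1], [2, 2, 2, 2]]  # hypothetical, less diverse group
print(expected_heterozygosity(group_a), expected_heterozygosity(group_b))  # 0.5 0.109375
```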
Migration paths of switchgrass
[Figure: migration paths among Upland 4X North, Upland 8X East, Lowland 4X Northeast, Upland 8X South, Upland 8X West and Lowland 4X South]
Summary
• An effective SNP calling pipeline was developed; it works well for non-reference, heterozygous, and polyploid species
• 1.2 million high-density SNPs discovered for GWAS
• Tetraploid switchgrass behaves like a diploid
• Synteny-based SNP maps constructed
• Robust phylogeny concurs well with the ecotype, ploidy level and geographic distribution of switchgrass
• Data suggest that Upland 4X arose from Upland 8X
Future direction: putting it all together, GWAS and GS
• Flowering time
• Plant height
• Leaf length and width
• Standability
• Biomass quality traits
Linkage populations and association populations
Caldwell Field, Cornell U, Ithaca, NY
Acknowledgements
Project Manager: Denise Costich (USDA-ARS, Cornell)
PIs: Edward Buckler (USDA-ARS, Cornell), Michael Casler (USDA-ARS, UW-Madison), Jerome Cherney (Cornell)
Bioinformatics: Dallas Kroon
Supported by DOE (including JGI), USDA, and NSF
Institute for Genomic Diversity (Cornell)
Sequencing: Rob Elshire, Jeff Glaubitz, Wenyan Zhu, Moira Sheehan
Statistics: Alex Lipka
Field: Ken Paddock, Nick Lepak, Nick Kaczmar