Genotype-by-Sequence Yung-Fen Huang Collaborative Oat Research Meeting, March 7, Ottawa

Post on 04-Oct-2020

Page 1: Genotype-by-Sequence · Jesse’s GS presentation cf. “Diversity and structure” presentation cf. Allele mining presentations Genetic structure of the sample Genetic structure

Genotype-by-Sequence

Yung-Fen Huang

Collaborative Oat Research Meeting, March 7, Ottawa

Page 2

Outline

• Principle of Genotype-by-Sequence (GBS)

• Oat GBS markers

• SNP assay vs. GBS

• Possible applications

• Ongoing oat GBS analysis and expected outcomes

Page 3

Genotype-by-Sequence

1. Complexity reduction: Digest DNA with restriction enzyme(s)

Use a methylation-sensitive enzyme to filter out repetitive genomic regions

2. Ligate adapters

[Figure: genomic DNA from Sample 1, Sample 2 and Sample 3, each ligated to adapters carrying a sample-specific barcode]

3. Pool and amplify samples

4. Sequencing

Case of oat: 1.5 M reads/sample ~ 0.7% of genome (96-plex)
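
Step 1 can be illustrated with an in-silico digest. The sketch below is a toy, not the protocol used for oat: the recognition site `CTGCAG`, the cut offset, the function name `digest` and the example sequence are all assumptions made for illustration (the slides do not name the enzyme), and methylation sensitivity is not modelled.

```python
import re


def digest(sequence: str, site: str = "CTGCAG", cut_offset: int = 5) -> list[str]:
    """Cut `sequence` at each occurrence of `site`.

    `cut_offset` is the number of bases of the site that stay on the
    upstream fragment. Both defaults are illustrative assumptions.
    """
    # A lookahead finds every occurrence of the site, including overlapping ones.
    cuts = [m.start() + cut_offset
            for m in re.finditer(f"(?={re.escape(site)})", sequence)]
    bounds = [0] + cuts + [len(sequence)]
    return [sequence[a:b] for a, b in zip(bounds, bounds[1:]) if a < b]


toy_genome = "AAACTGCAGTTTTTCTGCAGGG"  # made-up sequence containing two sites
fragments = digest(toy_genome)
print(fragments)  # ['AAACTGCA', 'GTTTTTCTGCA', 'GGG']
```

In GBS the reads start at the cut sites, so only the sequence next to each site is sampled, which is consistent with the small genome fraction quoted for oat.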

Page 4

Genotype-by-Sequence

5. SNP calling

Use bioinformatic pipeline(s) to process the raw data for SNP identification

TGCAGAAAAAAAAAAGATATCGTGTCAGTGGARCTGATCATTACTTGGGGGAGGAAGGAGATAA

TGCAGAAAAAAAAATTATATTATCAAAAGCTTTATCATTTTCTAYAGAGCCATAGCGATCATAT

TGCAGAAAAARAACACGGCAAAAAAATAAACACGACAATGGACAAACGAAGGCACACGGCAAAA

TGCAGAAAAAAAAGAATAAAAGAAAACAGCYAGCCCACGTGGGCGGCAAATAGTTGCCGCCGCA

TGCAGAARAAAAAGGAACACAACAGAAAAGAGACATCTAACTAGATACGGGTCAACCGAGATCG

TGCAGAAAAAAAWTGCCATGAGAAGAATTCACTGCCAAGAACATCATCTCAGATCAGCACTATG

Ex. 1,373 oat samples = 237 Gb (compressed file) = 1.5 billion reads

Marker  S1  S2  S3  S4  S5  S6  S7  S8  S9
M1      A   C   C   A   A   A   A   A   A
M2      A   H   G   A   A   A   N   A   A
M3      A   T   T   N   T   T   N   N   T
M4      C   A   N   A   N   N   N   N   A
M5      A   A   C   C   C   A   A   A   A
M6      G   C   G   C   G   G   G   C   G
M7      N   G   N   G   G   G   G   G   N
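
The marker-by-sample table above lends itself to a quick completeness check. This is a minimal sketch, assuming that `N` denotes a missing call and that `H` (taken here to mean a heterozygous call) counts as observed; neither code is defined on the slide, and the function name `completeness` is made up for this example.

```python
# Genotype calls copied from the table on this slide (samples S1..S9).
GENOTYPES = {
    "M1": "A C C A A A A A A",
    "M2": "A H G A A A N A A",
    "M3": "A T T N T T N N T",
    "M4": "C A N A N N N N A",
    "M5": "A A C C C A A A A",
    "M6": "G C G C G G G C G",
    "M7": "N G N G G G G G N",
}
MISSING = "N"  # assumption: N marks a missing call


def completeness(calls: str) -> float:
    """Percentage of samples with a non-missing call for one marker."""
    values = calls.split()
    called = sum(1 for v in values if v != MISSING)
    return 100.0 * called / len(values)


completeness_by_marker = {m: round(completeness(c), 1) for m, c in GENOTYPES.items()}
for marker, pct in completeness_by_marker.items():
    print(f"{marker}: {pct:.1f}% complete")
```

Under these assumptions M1, M5 and M6 are fully called, while M4 is called in only 4 of 9 samples.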

Advantages: fast, large-scale and cheap

- Marker discovery and genotyping at the same time

- Multiple samples at the same time

Challenges: bioinformatics? Others?

Page 5

Oat molecular data from CORE

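[Diagram: oat germplasm sets and the marker platforms used to genotype them]

Marker platforms: Genotype-by-Sequence (120K); GoldenGate (2K); Infinium (6K)

Bi-parental mapping populations: KxO, OxT, TxM, DxE, SxH, OxP, HxZ, PxG

Other germplasm sets: IOI (350 lines); Breeders’ selection (580 lines); subset (108 lines)

Status label: In progress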

Page 6

How many markers does GBS produce?

[Figure: Cumulative marker numbers. x-axis: Missing data (%), 10 to 80; y-axis: No. of markers, 0 to 120,000]

Series: All (1373 lines); Bi-parental (7 populations); Europe (32 lines); Diversity panel (152 IOI lines); North America (12 breeding programs); OxT

Annotations: ~ 10,000 SNPs; ~ 4,000 SNPs

More sequences = more SNPs
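
The curve on this slide counts how many markers remain as the tolerated share of missing data increases. The sketch below shows one way such a cumulative count could be computed; it is not the pipeline used for the oat data. The input is the number of missing calls per marker in the nine-sample example table from the SNP-calling slide (reading `N` as missing, which is an assumption), the thresholds follow the x-axis ticks, and the function name is hypothetical.

```python
from bisect import bisect_right

N_SAMPLES = 9
# Missing calls per marker, counted from the example table (N = missing, assumed).
MISSING_CALLS = {"M1": 0, "M2": 1, "M3": 3, "M4": 5, "M5": 0, "M6": 0, "M7": 3}


def cumulative_marker_counts(missing_pct, thresholds):
    """For each threshold, count markers whose missing data (%) is <= threshold."""
    ordered = sorted(missing_pct)
    # bisect_right returns how many sorted values are <= the threshold.
    return {t: bisect_right(ordered, t) for t in thresholds}


missing_pct = [100.0 * n / N_SAMPLES for n in MISSING_CALLS.values()]
counts = cumulative_marker_counts(missing_pct, range(10, 90, 10))
for threshold, n_markers in counts.items():
    print(f"<= {threshold}% missing: {n_markers} markers")
```

Relaxing the threshold can only keep or add markers, which is why a curve of this kind never decreases.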

Page 7

Oat GBS markers

[Figure: No. of markers at different levels of completeness. x-axis: Completeness (%), 10 to 90; y-axis: No. of markers, 0 to 30,000]

Series: All (1373 lines); Bi-parental (7 populations); Europe (32 lines); Diversity panel (152 IOI lines); North America (12 breeding programs); OxT

Most SNPs are 25-50% complete

More sequences = more markers at high completeness

More sequences are expected with technology update

Page 8

SNP assay vs. GBS

For 96 samples (in the case of oat; based on former data with future projection):

                             SNP assay          GBS
SNP discovery                Required           Not required
No. of markers               6K-100K            2K-100K (1)
Time of experiment (2)       3 days             2-4 weeks
Time for SNP calling (3)     Weeks to months    < 1 day
IT demand for SNP calling    Simple             High informatic effort
Data completeness (%)        > 90%              Between 0 and 100 (4)
Reproducibility              High               High
Cost/sample (5)              ~ $60              $10-20

1: Variable according to sample diversity and data completeness
2: From library preparation to raw data collection
3: Including data curation
4: Completeness varies according to end-use
5: Library and beadchip/sequencing consumables
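
The cost row can be turned into a per-batch estimate. This is a small worked example, assuming the per-sample figures in the table (about $60 for the SNP assay, $10-20 for GBS) apply uniformly to all 96 samples and cover consumables only, as footnote 5 states. The function name `batch_cost` is made up for this example.

```python
N_SAMPLES = 96
SNP_ASSAY_PER_SAMPLE = 60        # "~ $60" in the table
GBS_PER_SAMPLE_RANGE = (10, 20)  # "$10-20" in the table


def batch_cost(per_sample: int, n_samples: int = N_SAMPLES) -> int:
    """Consumable cost in dollars for one batch of samples."""
    return per_sample * n_samples


snp_assay_total = batch_cost(SNP_ASSAY_PER_SAMPLE)
gbs_low, gbs_high = (batch_cost(c) for c in GBS_PER_SAMPLE_RANGE)

print(f"SNP assay: ${snp_assay_total:,}")
print(f"GBS: ${gbs_low:,} to ${gbs_high:,}")
print(f"Ratio: {snp_assay_total / gbs_high:.0f}x to {snp_assay_total / gbs_low:.0f}x")
```

On these figures a 96-sample batch costs about $5,760 with the SNP assay and $960 to $1,920 with GBS, a three- to six-fold difference in consumables.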

Page 9

What can the GBS data tell us?

[Diagram: how high-throughput genomic data feed the breeding cycle]

High-throughput genomic data

- Genome organisation: Genetic map

- Genetic structure of the sample: Structure and relatedness analysis

- Trait genetic architecture: QTL mapping, Association mapping

Breeding cycle

- Major gene introgression

- Genomic prediction

- New cultivars

Genomic contribution: selection precision (+ cycle acceleration)

Other labels on the diagram: Good phenotypes; Missing data don’t matter (ex. wheat, barley, cassava)

Page 10

Ongoing analysis – genetic map update

[Figure: marker positions on 16A in two maps: oc_plos_16A (positions 0 to 86) and gbs2_pg95_with_dist_16A (positions 0 to 184)]

- 2nd generation framework map with high-quality SNP and GBS markers (1.5K + 40.6K) from 7 bi-parental populations (quite challenging!)

Larger regions are covered (ex. 16A: consensus vs. updated PxG map)

- Place historical markers and 19.6K medium-quality GBS markers on updated framework map

High-density consensus map of more than 50K ordered markers

Page 11

Expected outcome – upcoming analyses

[Diagram: the breeding-cycle scheme with upcoming analyses and expected outcomes highlighted]

High-throughput genomic data

- Genome organisation: Genetic map

- Genetic structure of the sample: Structure and relatedness analysis

- Trait genetic architecture: QTL mapping, Association mapping

Breeding cycle: Major gene introgression, Genomic prediction, New cultivars

Legend: Expected outcome; Upcoming analysis

Related presentations:

- cf. Jesse’s GS presentation

- cf. “Diversity and structure” presentation

- cf. Allele mining presentations

Page 12

Page 13

Genotype-by-Sequence

Main steps of SNP calling

i. Group sequences (“reads”) of the same sample according to barcode

[Figure: reads carrying A or G alleles sorted into Sample 1, Sample 2 and Sample 3]

ii. Group identical reads (groups of reads = “tags”)

[Figure: identical reads from S1, S2 and S3 collapsed into tags]

iii. Identify SNPs (group tags with few base mismatches, ex. 1 base)

[Figure: tags from S1, S2 and S3 that differ by one base grouped as SNP 1 and SNP 2]
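
The three steps on this slide can be sketched in a few lines. This is a toy illustration under stated assumptions, not the pipeline used for the oat data: the barcodes, reads and function names are invented, barcodes are taken to be fixed-length prefixes, and a candidate SNP is simply a pair of equal-length tags that differ at exactly one base.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical barcodes and reads, invented for this example.
BARCODES = {"ACGT": "S1", "TTAG": "S2", "GGCA": "S3"}
RAW_READS = [
    "ACGTTGCAGAACT",
    "ACGTTGCAGAACT",
    "TTAGTGCAGGACT",
    "GGCATGCAGGACT",
    "GGCATGCAGAACT",
    "CCCCTGCAGAACT",  # unknown barcode, discarded in step i
]


def demultiplex(raw_reads, barcodes):
    """Step i: assign each read to a sample by its barcode prefix."""
    by_sample = defaultdict(list)
    for read in raw_reads:
        for barcode, sample in barcodes.items():
            if read.startswith(barcode):
                by_sample[sample].append(read[len(barcode):])
                break
    return dict(by_sample)


def collapse_to_tags(by_sample):
    """Step ii: group identical reads into tags with per-sample counts."""
    tags = defaultdict(Counter)
    for sample, reads in by_sample.items():
        for read in reads:
            tags[read][sample] += 1
    return dict(tags)


def candidate_snps(tags, max_mismatch=1):
    """Step iii: pair tags that differ at no more than `max_mismatch` bases."""
    pairs = []
    for a, b in combinations(sorted(tags), 2):
        if len(a) != len(b):
            continue
        diff = [i for i, (x, y) in enumerate(zip(a, b)) if x != y]
        if 0 < len(diff) <= max_mismatch:
            pairs.append((a, b, diff))
    return pairs


by_sample = demultiplex(RAW_READS, BARCODES)
tags = collapse_to_tags(by_sample)
snps = candidate_snps(tags)
for tag_a, tag_b, positions in snps:
    print(tag_a, dict(tags[tag_a]), "vs", tag_b, dict(tags[tag_b]), "at", positions)
```

Comparing every pair of tags is quadratic in the number of tags, which is fine for a toy but would not scale to the read counts quoted earlier; production pipelines rely on indexing or alignment instead.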