course list- dtu health tech - alignment post-processing and ......44 g t t quality score: 10...

97

Upload: others

Post on 16-Apr-2021

5 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT

1

Page 2: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT
Page 3: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT

DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling

A brief reminder

Page 4: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT

DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling

A brief reminder

sister chromatids

homologous chromosomes

Paternal Maternal

Page 5: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT

DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling

M:

P:

HeterozygosityTACAAATATTACAGATAT

7

Heterozygous sites

Page 6: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT

DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling

M:

P:

Heterozygosity

segregating sites

8

Page 7: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT

DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling

M:

P:

HeterozygosityTACAAATATTACAGATAT

9

Homozygous sites

Page 8: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT

DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling

M:P:

TACAAATATTACAGATAT

Heterozygosity

ind#A

M:P:

TACAGATCTTACAGATCT

ind#B

12

Page 9: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT

DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling

M:P:

TACAAATATTACAGATAT

Heterozygosity

M:P:

TACAGATCTTACAGATCT

ind#A

ind#B

13

Page 10: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT

DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling

M:P:

TACAAATATTACAGATAT

M:P:

TACAGATCTTACAGATCT

Homozygous variantind#A

ind#B

14

Page 11: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT

DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling

M:P:

TACAAATATTACAGATAT

M:P:

TACAGATCTTACAGATCT

Homozygous invariantind#A

ind#B

15

Page 12: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT

DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling

AAACAGATCCCGCTGGGTTT

reference

ind#X TACAAATATTACAGATAT

Genotyping

16

Which of the 10 possible genotypes is the most likely?

Page 13: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT

DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling

reference

ind#X

reference

ind#Y

TACAAATATTACAGATAT

TACAGATCTTACAGATCT

Joint Genotyping

17

Page 14: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT

DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling

Menu

• Introduction

• Alignment post-processing

• From aligned reads to genomic variation

• Variant effect

18

Page 15: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT

DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling

Generalized NGS analysis

19

Page 16: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT

DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling

Brief probability reminder

20

Events:

= I pick a random human and that person is Danish

= pop. of Denmark / pop. world

?

Page 17: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT

DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling

Brief probability reminder

21

Events:

= I pick a random human and that person is Danish

= I pick a random human and that person’s name is Rasmus

= pop. of Denmark / pop. world

= # of Rasmuses / pop. world

= # of Rasmuses in Denmark / # of Rasmuses in the world

?

My name is Rasmus!

Page 18: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT

DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling

What is genotyping?

Genotyping is determining which genotype maximizes:

22

genotype data

Page 19: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT

DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling

What is genotyping?

23

genotype data

Genotyping is determining which genotype maximizes:

Page 20: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT

DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling

What is genotyping?

24

genotype data

TA

Genotyping is determining which genotype maximizes:

Page 21: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT

DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling 25

What is genotyping?

Page 22: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT

DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling 26

What is genotyping? likelihood: What is the probability of seeing the data given the genotype?prior: what is the probability of

the genotype to begin with?

Page 23: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT

DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling 27

What is genotyping? likelihood: What is the probability of seeing the data given the genotype?prior: what is the probability of

the genotype to begin with?

evidence: What is the probability of generating that data to begin with?

Page 24: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT

DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling

The likelihood

29

i.e. each reads is an independent observation

Page 25: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT

DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling

The likelihood

30

G

T

T

Toy example:

Page 26: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT

DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling

The likelihood

31

G

T

T

Toy example:

quality score: 10

quality score: 20

quality score: 20

Page 27: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT

DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling

The likelihood

32

G

T

T

Toy example:

quality score: 10 P(error) = 0.1

quality score: 20 P(error) = 0.01

quality score: 20 P(error) = 0.01

Page 28: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT

DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling

The likelihood

33

G

T

T

Toy example:

quality score: 10 P(error) = 0.1

quality score: 20 P(error) = 0.01

quality score: 20 P(error) = 0.01

The 2 Ts are sequencing errors!The genotype is GG

Nope! They are all correct and the genotype is GT

Liar! The G is a sequencing error! TT is the genotype

Page 29: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT

DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling

The likelihood

34

G

T

T

Toy example:

quality score: 10 P(error) = 0.1

quality score: 20 P(error) = 0.01

quality score: 20 P(error) = 0.01

Error model

P(G|A)=

P(G|C)=

P(G|G)=

P(G|T)=

What I think is the baseprobability of the data given the base

Page 30: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT

DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling

The likelihood

36

G

T

T

Toy example:

quality score: 10 P(error) = 0.1

quality score: 20 P(error) = 0.01

quality score: 20 P(error) = 0.01

Let’s evaluate 3 possible genotypes:

GG

GT

TT

Page 31: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT

DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling

The likelihood

37

G

T

T

quality score: 10 P(error) = 0.1

quality score: 20 P(error) = 0.01

quality score: 20 P(error) = 0.01

GG

Page 32: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT

DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling

The likelihood

38

G

T

T

quality score: 10 P(error) = 0.1

quality score: 20 P(error) = 0.01

quality score: 20 P(error) = 0.01

GG

G G

Page 33: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT

DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling

The likelihood

39

G

T

T

quality score: 10 P(error) = 0.1

quality score: 20 P(error) = 0.01

quality score: 20 P(error) = 0.01

GG

G G

Page 34: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT

DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling

The likelihood

40

G

T

T

quality score: 10 P(error) = 0.1

quality score: 20 P(error) = 0.01

quality score: 20 P(error) = 0.01

GG

G G

Page 35: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT

DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling

The likelihood

41

G

T

T

quality score: 10 P(error) = 0.1

quality score: 20 P(error) = 0.01

quality score: 20 P(error) = 0.01

GG

G G

Page 36: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT

DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling

The likelihood

42

G

T

T

quality score: 10 P(error) = 0.1

quality score: 20 P(error) = 0.01

quality score: 20 P(error) = 0.01

GG

G G

Page 37: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT

DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling

The likelihood

43

G

T

T

quality score: 10 P(error) = 0.1

quality score: 20 P(error) = 0.01

quality score: 20 P(error) = 0.01

GT

G T

Page 38: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT

DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling

The likelihood

44

G

T

T

quality score: 10 P(error) = 0.1

quality score: 20 P(error) = 0.01

quality score: 20 P(error) = 0.01

TT

T T

Page 39: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT

DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling

The likelihood

45

G

T

T

quality score: 10 P(error) = 0.1

quality score: 20 P(error) = 0.01

quality score: 20 P(error) = 0.01 TT = 0.0327

GT = 0.11511

GG = 0.00001

Page 40: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT

DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling

The likelihood

46

TT = 0.0327

GT = 0.11511

GG = 0.00001

GG GG GT GT TT TT

A likelihood in itself is not meaningful,

you need to compare it to other

models

Page 41: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT

DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling

The likelihood

47

TT = 0.0327

GT = 0.11511

GG = 0.00001

We will neglect the genotype prior this time

Page 42: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT

DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling

The likelihood

48

GG = 6.7e-05

GT = 0.77888

TT = 0.22104

Page 43: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT

DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling

The likelihood

49

GG = 6.7e-05

GT = 0.77888

TT = 0.22104

Important point: More coverage →More multiplications →The relative difference between models become larger

Page 44: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT

DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling

The likelihood

50

GG = 6.7e-05 41.70 40.60

GT = 0.77888 1.09 0.00

TT = 0.22104 6.56 5.47

PHRED PHRED-scaled

What is the

difference between the most likely and second

most likely

Page 45: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT

DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling

Details I did not cover

- Error model- Most genotypers do not simply use raw quality scores

51

Page 46: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT

DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling

Most common genotypers

• GATK

• SAMtools/BCFtools

• graphtyper

• FreeBayes

52

Page 47: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT

DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling 53

https://gatk.broadinstitute.org/hc/en-us/articles/360035535932-Germline-short-variant-discovery-SNPs-Indels-

GATK’s recommended workflow

Page 48: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT

DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling

The PCR amplification step included in the majority of NGS library construction techniques can introduce duplicates in the data.

Page 49: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT

DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling

We want: remove or mark them to avoid false callsex: the site below is probably heterozygous (i.e. the is the second allele)

Page 50: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT

DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling

Genotypers will ignore reads marked as duplicatesex: the site below is probably homozygous (i.e. the is a seq. error)

Page 51: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT

DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling

Duplicate/marking removal

Basic concepts of duplicate marking algorithm:

• Identify genomic position and strand for 5’-most bases.

• Mark reads that are duplicates of each other.

• Within a group of duplicate reads, the read with the highest sum of base

quality scores is retained.

http://picard.sourceforge.net/

Page 52: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT

DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling

Duplicate/marking removal

Problems:

• Does not account for sequencing errors.

• Does not account for natural duplicates.

• Does not account for duplicate reads with different mapping locations.

Page 53: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT

DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling 59

https://gatk.broadinstitute.org/hc/en-us/articles/360035535932-Germline-short-variant-discovery-SNPs-Indels-

GATK’s recommended workflow

Page 54: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT

DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling

• remember those?

• There are supposedto reflect P(error)

• They are not always accurate: problem for genotyping

Base quality score recalibration?

Page 55: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT

DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling

Page 56: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT

DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling

Page 57: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT

DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling

Base quality score recalibration

To work we need:• East Asian or European (as in mostly West European) samples• WGS• Sufficient coverage

My biased opinion:• Just don’t bother

Page 58: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT

DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling 64

https://gatk.broadinstitute.org/hc/en-us/articles/360035535932-Germline-short-variant-discovery-SNPs-Indels-

GATK’s recommended workflow

We covered this before

Page 59: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT

DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling

Variant call format (VCF)

• Details which variants have been called

• Can be bgzip (block gzip) and indexed using tabix

• Using tabix, queries can be made like:

– return all variants in the region chr22:323,340-361,152

65

Page 60: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT

DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling

Variant call format (VCF)

20 51391523 . A G 173.96 . AC=2;DP=5;MQ=52.03 GT:AD:DP:GQ:PL 1/1:0,5:5:15:188,15,0

20 51392469 . C T 146.14 . AC=2;DP=4;MQ=60.00 GT:AD:DP:GQ:PL 1/1:0,4:4:12:160,12,0

20 51394015 . T C 97.64 . AC=1;DP=6;MQ=60.00 GT:AD:DP:GQ:PL 0/1:3,3:6:66:105,0,66

20 51395647 . A C 89.64 . AC=1;DP=7;MQ=57.28 GT:AD:DP:GQ:PL 0/1:4,3:7:97:97,0,100

20 51397399 . C T 93.64 . AC=1;DP=7;MQ=60.00 GT:AD:DP:GQ:PL 0/1:4,3:7:99:101,0,120

20 51402308 . C T 161.64 . AC=1;DP=9;MQ=60.00 GT:AD:DP:GQ:PL 0/1:3,6:9:63:169,0,63

66

Page 61: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT

DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling

Variant call format (VCF)

20 51391523 . A G 173.96 . AC=2;DP=5;MQ=52.03 GT:AD:DP:GQ:PL 1/1:0,5:5:15:188,15,0

20 51392469 . C T 146.14 . AC=2;DP=4;MQ=60.00 GT:AD:DP:GQ:PL 1/1:0,4:4:12:160,12,0

20 51394015 . T C 97.64 . AC=1;DP=6;MQ=60.00 GT:AD:DP:GQ:PL 0/1:3,3:6:66:105,0,66

20 51395647 . A C 89.64 . AC=1;DP=7;MQ=57.28 GT:AD:DP:GQ:PL 0/1:4,3:7:97:97,0,100

20 51397399 . C T 93.64 . AC=1;DP=7;MQ=60.00 GT:AD:DP:GQ:PL 0/1:4,3:7:99:101,0,120

20 51402308 . C T 161.64 . AC=1;DP=9;MQ=60.00 GT:AD:DP:GQ:PL 0/1:3,6:9:63:169,0,63

67

name of chromosome (ex: chr1, chr2 ...

Page 62: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT

DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling

Variant call format (VCF)

20 51391523 . A G 173.96 . AC=2;DP=5;MQ=52.03 GT:AD:DP:GQ:PL 1/1:0,5:5:15:188,15,0

20 51392469 . C T 146.14 . AC=2;DP=4;MQ=60.00 GT:AD:DP:GQ:PL 1/1:0,4:4:12:160,12,0

20 51394015 . T C 97.64 . AC=1;DP=6;MQ=60.00 GT:AD:DP:GQ:PL 0/1:3,3:6:66:105,0,66

20 51395647 . A C 89.64 . AC=1;DP=7;MQ=57.28 GT:AD:DP:GQ:PL 0/1:4,3:7:97:97,0,100

20 51397399 . C T 93.64 . AC=1;DP=7;MQ=60.00 GT:AD:DP:GQ:PL 0/1:4,3:7:99:101,0,120

20 51402308 . C T 161.64 . AC=1;DP=9;MQ=60.00 GT:AD:DP:GQ:PL 0/1:3,6:9:63:169,0,63

68

coordinate on chromosome

Page 63: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT

DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling

Variant call format (VCF)

20 51391523 . A G 173.96 . AC=2;DP=5;MQ=52.03 GT:AD:DP:GQ:PL 1/1:0,5:5:15:188,15,0

20 51392469 . C T 146.14 . AC=2;DP=4;MQ=60.00 GT:AD:DP:GQ:PL 1/1:0,4:4:12:160,12,0

20 51394015 . T C 97.64 . AC=1;DP=6;MQ=60.00 GT:AD:DP:GQ:PL 0/1:3,3:6:66:105,0,66

20 51395647 . A C 89.64 . AC=1;DP=7;MQ=57.28 GT:AD:DP:GQ:PL 0/1:4,3:7:97:97,0,100

20 51397399 . C T 93.64 . AC=1;DP=7;MQ=60.00 GT:AD:DP:GQ:PL 0/1:4,3:7:99:101,0,120

20 51402308 . C T 161.64 . AC=1;DP=9;MQ=60.00 GT:AD:DP:GQ:PL 0/1:3,6:9:63:169,0,63

69

ID (ex: rs23534)

Page 64: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT

DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling

Variant call format (VCF)

20 51391523 . A G 173.96 . AC=2;DP=5;MQ=52.03 GT:AD:DP:GQ:PL 1/1:0,5:5:15:188,15,0

20 51392469 . C T 146.14 . AC=2;DP=4;MQ=60.00 GT:AD:DP:GQ:PL 1/1:0,4:4:12:160,12,0

20 51394015 . T C 97.64 . AC=1;DP=6;MQ=60.00 GT:AD:DP:GQ:PL 0/1:3,3:6:66:105,0,66

20 51395647 . A C 89.64 . AC=1;DP=7;MQ=57.28 GT:AD:DP:GQ:PL 0/1:4,3:7:97:97,0,100

20 51397399 . C T 93.64 . AC=1;DP=7;MQ=60.00 GT:AD:DP:GQ:PL 0/1:4,3:7:99:101,0,120

20 51402308 . C T 161.64 . AC=1;DP=9;MQ=60.00 GT:AD:DP:GQ:PL 0/1:3,6:9:63:169,0,63

70

reference base

Page 65: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT

DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling

Variant call format (VCF)

20 51391523 . A G 173.96 . AC=2;DP=5;MQ=52.03 GT:AD:DP:GQ:PL 1/1:0,5:5:15:188,15,0

20 51392469 . C T 146.14 . AC=2;DP=4;MQ=60.00 GT:AD:DP:GQ:PL 1/1:0,4:4:12:160,12,0

20 51394015 . T C 97.64 . AC=1;DP=6;MQ=60.00 GT:AD:DP:GQ:PL 0/1:3,3:6:66:105,0,66

20 51395647 . A C 89.64 . AC=1;DP=7;MQ=57.28 GT:AD:DP:GQ:PL 0/1:4,3:7:97:97,0,100

20 51397399 . C T 93.64 . AC=1;DP=7;MQ=60.00 GT:AD:DP:GQ:PL 0/1:4,3:7:99:101,0,120

20 51402308 . C T 161.64 . AC=1;DP=9;MQ=60.00 GT:AD:DP:GQ:PL 0/1:3,6:9:63:169,0,63

71

alternative base

Page 66: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT

DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling

Variant call format (VCF)

20 51391523 . A G 173.96 . AC=2;DP=5;MQ=52.03 GT:AD:DP:GQ:PL 1/1:0,5:5:15:188,15,0

20 51392469 . C T 146.14 . AC=2;DP=4;MQ=60.00 GT:AD:DP:GQ:PL 1/1:0,4:4:12:160,12,0

20 51394015 . T C 97.64 . AC=1;DP=6;MQ=60.00 GT:AD:DP:GQ:PL 0/1:3,3:6:66:105,0,66

20 51395647 . A C 89.64 . AC=1;DP=7;MQ=57.28 GT:AD:DP:GQ:PL 0/1:4,3:7:97:97,0,100

20 51397399 . C T 93.64 . AC=1;DP=7;MQ=60.00 GT:AD:DP:GQ:PL 0/1:4,3:7:99:101,0,120

20 51402308 . C T 161.64 . AC=1;DP=9;MQ=60.00 GT:AD:DP:GQ:PL 0/1:3,6:9:63:169,0,63

72

quality field

Page 67: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT

DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling

Variant call format (VCF)

20 51391523 . A G 173.96 . AC=2;DP=5;MQ=52.03 GT:AD:DP:GQ:PL 1/1:0,5:5:15:188,15,0

20 51392469 . C T 146.14 . AC=2;DP=4;MQ=60.00 GT:AD:DP:GQ:PL 1/1:0,4:4:12:160,12,0

20 51394015 . T C 97.64 . AC=1;DP=6;MQ=60.00 GT:AD:DP:GQ:PL 0/1:3,3:6:66:105,0,66

20 51395647 . A C 89.64 . AC=1;DP=7;MQ=57.28 GT:AD:DP:GQ:PL 0/1:4,3:7:97:97,0,100

20 51397399 . C T 93.64 . AC=1;DP=7;MQ=60.00 GT:AD:DP:GQ:PL 0/1:4,3:7:99:101,0,120

20 51402308 . C T 161.64 . AC=1;DP=9;MQ=60.00 GT:AD:DP:GQ:PL 0/1:3,6:9:63:169,0,63

73

Filter (ex: ‘LowQual’)

Page 68: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT

DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling

Variant call format (VCF)

20 51391523 . A G 173.96 . AC=2;DP=5;MQ=52.03 GT:AD:DP:GQ:PL 1/1:0,5:5:15:188,15,0

20 51392469 . C T 146.14 . AC=2;DP=4;MQ=60.00 GT:AD:DP:GQ:PL 1/1:0,4:4:12:160,12,0

20 51394015 . T C 97.64 . AC=1;DP=6;MQ=60.00 GT:AD:DP:GQ:PL 0/1:3,3:6:66:105,0,66

20 51395647 . A C 89.64 . AC=1;DP=7;MQ=57.28 GT:AD:DP:GQ:PL 0/1:4,3:7:97:97,0,100

20 51397399 . C T 93.64 . AC=1;DP=7;MQ=60.00 GT:AD:DP:GQ:PL 0/1:4,3:7:99:101,0,120

20 51402308 . C T 161.64 . AC=1;DP=9;MQ=60.00 GT:AD:DP:GQ:PL 0/1:3,6:9:63:169,0,63

74

Info field ex:AC= allele countDP = depthMQ = root mean square of the mapping quality

Page 69: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT

DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling

Variant call format (VCF)

20 51391523 . A G 173.96 . AC=2;DP=5;MQ=52.03 GT:AD:DP:GQ:PL 1/1:0,5:5:15:188,15,0

20 51392469 . C T 146.14 . AC=2;DP=4;MQ=60.00 GT:AD:DP:GQ:PL 1/1:0,4:4:12:160,12,0

20 51394015 . T C 97.64 . AC=1;DP=6;MQ=60.00 GT:AD:DP:GQ:PL 0/1:3,3:6:66:105,0,66

20 51395647 . A C 89.64 . AC=1;DP=7;MQ=57.28 GT:AD:DP:GQ:PL 0/1:4,3:7:97:97,0,100

20 51397399 . C T 93.64 . AC=1;DP=7;MQ=60.00 GT:AD:DP:GQ:PL 0/1:4,3:7:99:101,0,120

20 51402308 . C T 161.64 . AC=1;DP=9;MQ=60.00 GT:AD:DP:GQ:PL 0/1:3,6:9:63:169,0,63

75

Format field, what do the next fields mean?

Page 70: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT

DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling

Variant call format (VCF)

20 51391523 . A G 173.96 . AC=2;DP=5;MQ=52.03 GT:AD:DP:GQ:PL 1/1:0,5:5:15:188,15,0

20 51392469 . C T 146.14 . AC=2;DP=4;MQ=60.00 GT:AD:DP:GQ:PL 1/1:0,4:4:12:160,12,0

20 51394015 . T C 97.64 . AC=1;DP=6;MQ=60.00 GT:AD:DP:GQ:PL 0/1:3,3:6:66:105,0,66

20 51395647 . A C 89.64 . AC=1;DP=7;MQ=57.28 GT:AD:DP:GQ:PL 0/1:4,3:7:97:97,0,100

20 51397399 . C T 93.64 . AC=1;DP=7;MQ=60.00 GT:AD:DP:GQ:PL 0/1:4,3:7:99:101,0,120

20 51402308 . C T 161.64 . AC=1;DP=9;MQ=60.00 GT:AD:DP:GQ:PL 0/1:3,6:9:63:169,0,63

76

Most likely genotype

Page 71: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT

DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling

Variant call format (VCF)

20 51391523 . A G 173.96 . AC=2;DP=5;MQ=52.03 GT:AD:DP:GQ:PL 1/1:0,5:5:15:188,15,0

20 51392469 . C T 146.14 . AC=2;DP=4;MQ=60.00 GT:AD:DP:GQ:PL 1/1:0,4:4:12:160,12,0

20 51394015 . T C 97.64 . AC=1;DP=6;MQ=60.00 GT:AD:DP:GQ:PL 0/1:3,3:6:66:105,0,66

20 51395647 . A C 89.64 . AC=1;DP=7;MQ=57.28 GT:AD:DP:GQ:PL 0/1:4,3:7:97:97,0,100

20 51397399 . C T 93.64 . AC=1;DP=7;MQ=60.00 GT:AD:DP:GQ:PL 0/1:4,3:7:99:101,0,120

20 51402308 . C T 161.64 . AC=1;DP=9;MQ=60.00 GT:AD:DP:GQ:PL 0/1:3,6:9:63:169,0,63

77

Allele distribution

Page 72: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT

DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling

Variant call format (VCF)

20 51391523 . A G 173.96 . AC=2;DP=5;MQ=52.03 GT:AD:DP:GQ:PL 1/1:0,5:5:15:188,15,0

20 51392469 . C T 146.14 . AC=2;DP=4;MQ=60.00 GT:AD:DP:GQ:PL 1/1:0,4:4:12:160,12,0

20 51394015 . T C 97.64 . AC=1;DP=6;MQ=60.00 GT:AD:DP:GQ:PL 0/1:3,3:6:66:105,0,66

20 51395647 . A C 89.64 . AC=1;DP=7;MQ=57.28 GT:AD:DP:GQ:PL 0/1:4,3:7:97:97,0,100

20 51397399 . C T 93.64 . AC=1;DP=7;MQ=60.00 GT:AD:DP:GQ:PL 0/1:4,3:7:99:101,0,120

20 51402308 . C T 161.64 . AC=1;DP=9;MQ=60.00 GT:AD:DP:GQ:PL 0/1:3,6:9:63:169,0,63

78

Depth

Page 73: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT

DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling

Variant call format (VCF)

20 51391523 . A G 173.96 . AC=2;DP=5;MQ=52.03 GT:AD:DP:GQ:PL 1/1:0,5:5:15:188,15,0

20 51392469 . C T 146.14 . AC=2;DP=4;MQ=60.00 GT:AD:DP:GQ:PL 1/1:0,4:4:12:160,12,0

20 51394015 . T C 97.64 . AC=1;DP=6;MQ=60.00 GT:AD:DP:GQ:PL 0/1:3,3:6:66:105,0,66

20 51395647 . A C 89.64 . AC=1;DP=7;MQ=57.28 GT:AD:DP:GQ:PL 0/1:4,3:7:97:97,0,100

20 51397399 . C T 93.64 . AC=1;DP=7;MQ=60.00 GT:AD:DP:GQ:PL 0/1:4,3:7:99:101,0,120

20 51402308 . C T 161.64 . AC=1;DP=9;MQ=60.00 GT:AD:DP:GQ:PL 0/1:3,6:9:63:169,0,63

79

Genotype quality

Page 74: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT

DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling

Variant call format (VCF)

20 51391523 . A G 173.96 . AC=2;DP=5;MQ=52.03 GT:AD:DP:GQ:PL 1/1:0,5:5:15:188,15,0

20 51392469 . C T 146.14 . AC=2;DP=4;MQ=60.00 GT:AD:DP:GQ:PL 1/1:0,4:4:12:160,12,0

20 51394015 . T C 97.64 . AC=1;DP=6;MQ=60.00 GT:AD:DP:GQ:PL 0/1:3,3:6:66:105,0,66

20 51395647 . A C 89.64 . AC=1;DP=7;MQ=57.28 GT:AD:DP:GQ:PL 0/1:4,3:7:97:97,0,100

20 51397399 . C T 93.64 . AC=1;DP=7;MQ=60.00 GT:AD:DP:GQ:PL 0/1:4,3:7:99:101,0,120

20 51402308 . C T 161.64 . AC=1;DP=9;MQ=60.00 GT:AD:DP:GQ:PL 0/1:3,6:9:63:169,0,63

80

PHRED-scaled likelihood

Page 75: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT

DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling

The likelihood

81

GG = 6.7e-05 41.70 40.60

GT = 0.77888 1.09 0.00

TT = 0.22104 6.56 5.47

PHRED PHRED-scaled

PL field

GQ field

Page 76: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT

DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling 82

https://gatk.broadinstitute.org/hc/en-us/articles/360035535932-Germline-short-variant-discovery-SNPs-Indels-

GATK’s recommended workflow

Page 77: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT

DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling

Variant filtration (soft)

• How do we remove false positive calls?

• Use known polymorphic sites to estimate what a real variant and a false variant “looks like”

• Learn how does the known sites (=truth set) look like in our data

• Evaluate on all our data, filter sites that look different!

83

Page 78: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT

DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling

Page 79: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT

DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling

Variant filtration (hard)• Hard filtering:

– Variant quality score /depth – Mapping quality– Mappability– Strand bias (the variant being seen only on the forward

strand or only on the reverse strand)– Depth

• BCFtools can perform this• Depends on the project at hand• Be careful of introducing a bias in favor of certain types of

variants

Page 80: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT

DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling 86

https://gatk.broadinstitute.org/hc/en-us/articles/360035535932-Germline-short-variant-discovery-SNPs-Indels-

GATK’s recommended workflow

Page 81: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT

DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling

Variant annotation

87

What does the SNP do?

Page 82: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT

DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling

Variant annotation

• Some example of tools:

– Annovar

– Ensembl Variant Effect Predictor (VEP)

– SnpEff

• As good as annotations

• Beware of gene expression

88

Page 83: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT

DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling

we did not cover (in detail)...

Page 84: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT

DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling

Germline vs somatic

Page 85: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT

DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling

de novo

Page 86: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT

DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling

Polyploid

Page 87: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT

DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling

Phasing

TACAAATAT

TAGAAACAT

TACAAACAT

TAGAAATATvs

TA AAA AT C T G C

Page 88: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT

DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling

INDELs

Insertions Deletion

TACAAATAT

TACAAAGCTAT

TACAAATAT

TACAAAT

Page 89: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT

DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling

TACAAA--TAT TACAAAGCTAT

INDELs

Caution:

GC was inserted

Page 90: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT

DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling

TACAAA--TAT TACAAAGCTAT

INDELs

Caution:

GC was deleted

Page 91: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT

DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling

TACAAA--TAT TACAAAGCTAT TACAAAGCTAT

GC was deletedmore likely, not guaranteed!

Page 92: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT

DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling

Structural variants

Page 93: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT

DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling

Structural variants

Translocation:

Page 94: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT

DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling

Structural variants

Estivill, Xavier, and Lluís Armengol. "Copy Number Variants and Common Disorders: Filling the Gaps and Exploring Complexity in Genome-Wide Association Studies." PLoS Genet 3.10 (2007): e190.

Copy number variations (CNV)

Page 95: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT

DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling

Structural variants

Copy number variations (CNV)effect on coverage

Weetman, David, Luc S. Djogbenou, and Eric Lucas. "Copy number variation (CNV) and insecticide resistance in mosquitoes: evolving knowledge or an evolving problem?." Current Opinion in Insect Science 27 (2018): 82-88.

Page 96: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT

DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling

Ethical concerns

privacy, justice, fairness etc..

Page 97: Course list- DTU Health Tech - Alignment post-processing and ......44 G T T quality score: 10 P(error) = 0.1 quality score: 20 P(error) = 0.01 quality score: 20 P(error) = 0.01 TT

DTU Sundhedsteknologi5. juni 2019 Alignment post-processing and variant calling

Exercise time!

http://teaching.healthtech.dtu.dk/22126/index.php/Postprocess_exercise