church iowa2013

Post on 24-Jun-2015

Category:

Technology

DESCRIPTION

Talk at Iowa State University 6 Nov 2013

TRANSCRIPT

Deanna M. Church Staff Scientist, NCBI

@deannachurch

Analyzing Individual Genomes 

http://genomereference.org

Valerie Schneider, NCBI

Acknowledgements

GeT-RM: Lisa Kalman (CDC), Birgit Funke (Harvard), Madhuri Hegde (Emory), Maryam Halavi, Chao Chen, Jon Trow, Douglas Slotta, Peter Meric, Daniel Frishberg, Victor Ananiev

ClinVar: Alex Astashyn, Shanmuga Chitipiralla, Douglas Hoffman, Wonhee Jang, Brandi Kattman, Melissa Landrum, Jennifer Lee, Adriana Malheiro, Wendy Rubinstein, George Riley, Amanjeev Sethi, Ricardo Villamarin, Donna Maglott

ISCA: Christa Lese Martin (Geisinger), Erin Riggs (Geisinger), Jose Mena, Mike Feolo, Tim Hefferon, John Garner, John Lopez

GRC: Valerie Schneider (NCBI), The Genome Institute at Washington University, The Wellcome Trust Sanger Institute, The European Bioinformatics Institute

Variation

Phenotypes

Why should you care about the Reference Assembly?

Genes, NCBI Homo sapiens Annotation Release 105

Transcript

CDS

dbSNP Build 138 using annotation release 104

http://www.ncbi.nlm.nih.gov/variation/tools/1000genomes

http://www.bioplanet.com/gcat

What is the Reference Assembly?

An assembly is a MODEL of the genome

BAC insert

BAC vector

Shotgun sequence

Assemble

GAPS

“finishers” go in to manually fill the gaps, often by PCR

http://www.ncbi.nlm.nih.gov/projects/genome/assembly/grc/issue_detail.cgi?id=HG-1012

http://www.ncbi.nlm.nih.gov/projects/genome/assembly/grc/issue_detail.cgi?id=HG-1321

RP11-34P13 64E8 RP4-669L17 RP5-857K21 RP11-206L10 RP11-54O7

Gaps

NCBI36 (hg18)

GRCh37 (hg19)

NCBI35 (hg17)

GRCh37 (hg19)

AL139246.20

AL139246.21

Build sequence contigs based on contigs defined in TPF (Tiling Path File).

Check for orientation consistencies

Select switch points

Instantiate sequence for further analysis

Switch point

Consensus sequence
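
The tiling-path steps above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the GRC pipeline: the `Component` record, its `start`/`end` fields (the 1-based span used from each clone, with `end` acting as the switch point to the next clone), and the function names are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


@dataclass
class Component:
    """One clone in a tiling path (hypothetical record layout)."""
    accession: str    # placeholder identifier, not a real accession
    sequence: str     # full component sequence as submitted
    orientation: str  # "+" or "-" relative to the contig
    start: int        # 1-based first base used, after orienting
    end: int          # 1-based last base used: the switch point


def instantiate_contig(tiling_path):
    """Join the chosen span of each component, in tiling-path order."""
    pieces = []
    for comp in tiling_path:
        # Orientation consistency check.
        if comp.orientation not in ("+", "-"):
            raise ValueError(f"bad orientation for {comp.accession}")
        seq = comp.sequence
        if comp.orientation == "-":
            seq = reverse_complement(seq)
        # The switch points must fall inside the component.
        if not (1 <= comp.start <= comp.end <= len(seq)):
            raise ValueError(f"switch point outside {comp.accession}")
        pieces.append(seq[comp.start - 1:comp.end])
    return "".join(pieces)
```

Each component contributes only the bases between its switch points, so overlapping clone ends are counted once in the resulting sequence.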

NCBI36

nsv832911 (nstd68) Submitted on NCBI35 (hg17)

NCBI35 (hg17) Tiling Path

GRCh37 (hg19) Tiling Path

Gap Inserted

Moved approximately 2 Mb distal on chr15

NC_000015.8 (chr15)

NC_000015.9 (chr15)

Removed from assembly

Added to assembly

HG-24

Sequences from haplotype 1

Sequences from haplotype 2

Old Assembly model: compress into a consensus

New Assembly model: represent both haplotypes

AC074378.4 AC079749.5

AC134921.2 AC147055.2

AC140484.1 AC019173.4

AC093720.2 AC021146.7

NCBI36 NC_000004.10 (chr4) Tiling Path

Xue Y et al, 2008

TMPRSS11E TMPRSS11E2

GRCh37 NC_000004.11 (chr4) Tiling Path

AC074378.4 AC079749.5

AC134921.1 AC147055.2

AC093720.2 AC021146.7

TMPRSS11E

GRCh37: NT_167250.1 (UGT2B17 alternate locus)

AC074378.4 AC140484.1

AC019173.4 AC226496.2

AC021146.7

TMPRSS11E2

nsv532126 (nstd37)

GRCh37 (hg19)

7 alternate haplotypes at the MHC

Alternate loci released as:

FASTA

AGP

Alignment to chromosome
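
Because AGP is a plain tab-delimited format, the release files can be inspected with very little code. The sketch below assumes the nine-column AGP layout, in which component types `N` and `U` mark gap rows; the `AgpRow` tuple and the function names are illustrative, not part of any NCBI tool.

```python
from typing import NamedTuple, Optional

GAP_TYPES = {"N", "U"}  # N: gap of known length, U: gap of unknown length


class AgpRow(NamedTuple):
    """A simplified view of one AGP row (hypothetical layout)."""
    obj: str
    obj_beg: int
    obj_end: int
    part_number: int
    component_type: str
    component_id: Optional[str]  # None for gap rows
    gap_length: Optional[int]    # None for sequence rows
    orientation: Optional[str]   # None for gap rows


def parse_agp_line(line: str) -> Optional[AgpRow]:
    """Parse one AGP line; return None for comments and blank lines."""
    if line.startswith("#") or not line.strip():
        return None
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 columns, got {len(fields)}")
    obj, ctype = fields[0], fields[4]
    beg, end, part = int(fields[1]), int(fields[2]), int(fields[3])
    if ctype in GAP_TYPES:
        # For gap rows, column 6 holds the gap length.
        return AgpRow(obj, beg, end, part, ctype, None, int(fields[5]), None)
    # For sequence rows, column 6 is the component ID, column 9 the orientation.
    return AgpRow(obj, beg, end, part, ctype, fields[5], None, fields[8])


def total_gap_length(lines) -> int:
    """Sum the lengths of all gap rows in an iterable of AGP lines."""
    total = 0
    for line in lines:
        row = parse_agp_line(line)
        if row is not None and row.gap_length is not None:
            total += row.gap_length
    return total
```

Summing the gap rows for an alternate locus and for the matching chromosome region is one quick way to see how much sequence a representation still leaves unresolved.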

UGT2B17 MHC MAPT

MHC (chr6)

Chr 6 representation (PGF)

Alt_Ref_Locus_2 (COX)

Variant Calling and the Reference Assembly

Kidd et al, 2007 APOBEC cluster

Part of chr22 assembly

Alternate locus for chr22

White: Insertion

Black: Deletion

O'Rawe et al, 2013

Mouse Ren1 chr1 (CM000994.2/NC_000067.6): 133350674-133360320

NM_031192.3: transcript from C57BL/6J

NM_031193.2: transcript from FVB/N

129S6/SvEvTac Alt Locus Alignment Ren1 (allelic)

FVB/N Transcript Alignment Ren2 (paralog)

129S6/SvEvTac Ren1

FVB Ren2 Tx

Paralogous diff

SNP + Paralogous diff

Hydin: chr16 (16q22.2)

Hydin2: chr1 (1q21.1)

Missing in NCBI35/NCBI36

Unlocalized in GRCh37

Finished in GRCh38

Alignment to Hydin2 Genomic, 300 Kb, 99.4% ID (Paralogous)

Alignment to Hydin1 CHM1_1.0, >99.9% ID (Allelic)

Doggett et al., 2006

http://www.ncbi.nlm.nih.gov/variation/tools/1000genomes

CDC27

1KG Phase 1 Strict accessibility mask

SNP (all)

SNP (not 1KG)

http://www.ncbi.nlm.nih.gov/variation/tools/1000genomes

Sudmant et al., 2010

GRCh38 is coming (September, 2013)

Adding Novel Sequence

Karen Miga and Jim Kent arXiv:1307.0035

Dennis et al., 2012

1q32 1q21 1p21

1p21 patch alignment to chromosome 1

Fixing Rare/Incorrect Bases

Preview of GRCh38 (scheduled Fall 2013)

TEX28 TKTL1

LOC101060233 (opsin related)

LOC101060234 (TEX28 related)

GRCh37 (current reference assembly)

NC_000023.10 (chrX)

NW_003871103.3

FAM23_MRC1 Region, chr10

Segmental Duplications

1KG accessibility Mask

Novel Patch 250 kb of artificial duplication

Adding Novel Sequence

GRCh37.p13

120 Fix Patches

60 Novel

Human Resolved for GRCh38

http://www.ncbi.nlm.nih.gov/genome/tools/remap

From Assembly 1 <-> Assembly 2

Assembly <-> RefSeqGene/LRG

Primary Assembly <-> Alternate loci
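
The idea behind moving a coordinate between two sequences can be illustrated with a toy lookup over ungapped alignment blocks. This is only a sketch of the concept and does not use or reproduce the NCBI Remap service: the `Block` record, the function name, and any coordinates passed to it are assumptions for illustration.

```python
from typing import NamedTuple, Optional, Sequence


class Block(NamedTuple):
    """One ungapped aligned block between two sequences (hypothetical)."""
    src_start: int            # 1-based start on the source sequence
    dst_start: int            # 1-based start on the target sequence
    length: int               # number of aligned bases
    same_strand: bool = True  # False if the block aligns in reverse


def remap_position(pos: int, blocks: Sequence[Block]) -> Optional[int]:
    """Map a 1-based source position to the target, or None if unaligned."""
    for block in blocks:
        offset = pos - block.src_start
        if 0 <= offset < block.length:
            if block.same_strand:
                return block.dst_start + offset
            # Reverse-strand block: count back from the block's last base.
            return block.dst_start + block.length - 1 - offset
    return None
```

A position that falls outside every block returns `None`, which is the toy equivalent of a feature that cannot be carried over, for example because it lies in sequence that was removed from or added to the newer assembly.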