Large aCGH Datasets

Collecting quantitative CNV information using aCGH in a Cytogenetic laboratory December 2009 Anton Petrov, Ph.D. infoQuant


Page 1: Large aCGH Datasets


Page 2: Overview

Copy number analysis of array CGH data
  Overview of an array-based Cytogenetic test
  Looking for clinically relevant aberrations
  Samples from healthy individuals and building CNV tracks
Aberration frequency data
  HapMap project, release 3: Affymetrix SNP 6.0 data
Population-specific CNV profiles
  HapMap data: 11 populations
Probe coverage issue
  CNV profile variation from one array platform to another

Page 3: Array-based copy number tests: data analysis

Data pre-processing
  Raw measurements extracted from a slide
  Data normalized between the two channels (experiment/reference)
  Log-ratio measurements computed and arranged along the genome
Detection of copy number changes
  Detection of regions where log-ratios deviate significantly from zero (gains and losses)
  A robust binary segmentation approach for high-resolution data in infoQuant software
Reporting of detected anomalies
  In a Cytogenetic test, the clinical relevance of detected aberrations must be determined
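
The pre-processing and detection steps above can be illustrated with a minimal Python sketch. It is not infoQuant's algorithm: channel normalization is assumed to be a single global median shift of the log2 ratios, and the split statistic is a simplified size-weighted difference of segment means with no noise estimate. The function names, the split threshold, the minimum segment size, and the ±0.3 gain/loss cut-offs are hypothetical choices for illustration.

```python
import math
import statistics


def log2_ratios(experiment, reference):
    """Per-probe log2(experiment/reference), median-centred so the typical probe sits at 0.

    Assumption: a single global median shift stands in for channel normalization.
    """
    raw = [math.log2(e / r) for e, r in zip(experiment, reference)]
    centre = statistics.median(raw)
    return [x - centre for x in raw]


def best_split(values, min_size):
    """Return (index, score) of the split point that best separates two segment means."""
    n = len(values)
    total = sum(values)
    best_i, best_score = None, 0.0
    left = 0.0
    for i in range(1, n):
        left += values[i - 1]
        if i < min_size or n - i < min_size:
            continue  # both resulting segments must contain at least min_size probes
        mean_left = left / i
        mean_right = (total - left) / (n - i)
        # Simplified statistic: mean difference weighted by segment sizes (no noise estimate).
        score = abs(mean_left - mean_right) * math.sqrt(i * (n - i) / n)
        if score > best_score:
            best_i, best_score = i, score
    return best_i, best_score


def binary_segmentation(values, threshold=1.0, min_size=3, start=0):
    """Recursively split log-ratios; return (start, end, mean) per segment, end exclusive."""
    i, score = best_split(values, min_size)
    if i is not None and score >= threshold:
        return (binary_segmentation(values[:i], threshold, min_size, start)
                + binary_segmentation(values[i:], threshold, min_size, start + i))
    return [(start, start + len(values), sum(values) / len(values))]


def call_segments(segments, gain_cutoff=0.3, loss_cutoff=-0.3):
    """Label segments whose mean log-ratio deviates from zero as gains or losses."""
    calls = []
    for start, end, mean in segments:
        if mean >= gain_cutoff:
            calls.append((start, end, "gain"))
        elif mean <= loss_cutoff:
            calls.append((start, end, "loss"))
    return calls


# Toy data: probes 10-19 have twice the reference signal (a single-copy-style gain).
ratios = log2_ratios([1.0] * 10 + [2.0] * 10 + [1.0] * 10, [1.0] * 30)
print(call_segments(binary_segmentation(ratios)))  # [(10, 20, 'gain')]
```

Production segmentation methods usually add a noise model and a significance test for each split; the recursion shown here only conveys the "split, then split again" idea of binary segmentation.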

Page 4: Looking for clinically relevant anomalies

Known genes
  Look for genes overlapping with detected anomalies
  Find genes associated with a specific class of disorders
Known CNVs
  Publicly available CNV databases: Database of Genomic Variants (Toronto)
    Lack of control over source
    Different array platforms used
  In-house CNV tracks using the chosen platform
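
Both gene checks on this slide reduce to interval overlap. A minimal Python sketch, assuming half-open, 0-based coordinates; the `Region` type, the gene names and coordinates, and the disorder gene set are hypothetical placeholders, not data from a real annotation source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive
    name: str = ""


def overlaps(a, b):
    """True if two half-open intervals on the same chromosome share at least one base."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def genes_in_anomaly(anomaly, genes, disorder_genes=None):
    """Names of genes overlapping the anomaly, optionally restricted to a disorder-specific set."""
    hits = [g.name for g in genes if overlaps(anomaly, g)]
    if disorder_genes is not None:
        hits = [name for name in hits if name in disorder_genes]
    return hits


# Hypothetical annotation; real coordinates would come from a gene database.
genes = [
    Region("chr1", 100, 200, "GENE_A"),
    Region("chr1", 500, 800, "GENE_B"),
    Region("chr2", 100, 200, "GENE_C"),
]
anomaly = Region("chr1", 150, 600)
print(genes_in_anomaly(anomaly, genes))              # ['GENE_A', 'GENE_B']
print(genes_in_anomaly(anomaly, genes, {"GENE_B"}))  # ['GENE_B']
```

The same `overlaps` test applies when comparing an anomaly against known CNV regions, whether from a public database or an in-house track. The linear scan is only for clarity; genome-wide annotation sets are usually queried through sorted or indexed intervals.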

 

Page 5: Building CNV tracks in copy number software

In-house CNV track
  Perform copy number analysis on aCGH data from individual samples (control group)
  Collect CNV regions for a cohort of control samples and visualize them in routine Cytogenetic tests
  Keep updating CNV tracks as new samples are analyzed
CNV frequency profile
  Compute the frequency of CNVs along the genome based on accumulated samples
  Use the quantitative information that CNV frequencies provide to interpret the relevance of chromosomal anomalies detected in newly acquired samples
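
A CNV frequency profile that keeps updating as control samples are analyzed can be sketched as per-bin counters. This assumes fixed-size genomic bins and counts a sample in a bin if any of its CNV calls touches that bin, which is a coarse approximation; the `CnvTrack` class, its methods, and the default 100 kb bin size are hypothetical.

```python
from collections import defaultdict


class CnvTrack:
    """Incrementally updated CNV frequency profile over fixed-size genomic bins."""

    def __init__(self, bin_size=100_000):
        self.bin_size = bin_size
        self.n_samples = 0
        self.counts = defaultdict(int)  # (chrom, bin index) -> samples with a CNV in that bin

    def add_sample(self, cnvs):
        """Add one analyzed control sample; cnvs is a list of (chrom, start, end), end exclusive."""
        bins = set()  # a sample counts at most once per bin, even with overlapping calls
        for chrom, start, end in cnvs:
            for b in range(start // self.bin_size, (end - 1) // self.bin_size + 1):
                bins.add((chrom, b))
        for key in bins:
            self.counts[key] += 1
        self.n_samples += 1

    def frequency(self, chrom, pos):
        """Fraction of accumulated samples with a CNV in the bin containing pos."""
        if self.n_samples == 0:
            return 0.0
        return self.counts.get((chrom, pos // self.bin_size), 0) / self.n_samples


track = CnvTrack(bin_size=1_000)
track.add_sample([("chr1", 0, 1_500)])
track.add_sample([("chr1", 500, 900)])
track.add_sample([])  # a control sample with no CNV calls still counts in the denominator
print(track.frequency("chr1", 100))    # 0.6666666666666666
print(track.frequency("chr1", 1_200))  # 0.3333333333333333
```

Storing counts rather than frequencies is what makes the update cheap: adding a sample touches only the bins it covers plus the sample total.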

Page 6: Using CNV tracks in Cytogenetic tests

Visualize accumulated CNV information during routine tests
  When reviewing data for a new sample, visualize regions of frequent CNVs to help determine the clinical relevance of detected anomalies
Further increase insight into the possible clinical relevance of a detected anomaly using the quantitative information provided by CNV frequencies
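
One way to use the frequencies quantitatively is to summarize the control frequency over the bins a newly detected anomaly spans. The sketch assumes a precomputed `{(chrom, bin): frequency}` table of the kind a binned CNV track would produce; the function names, the toy frequencies, and the 1% cut-off are hypothetical and illustrative only, not a clinical rule.

```python
def anomaly_frequency(freq_by_bin, chrom, start, end, bin_size):
    """Return (max, mean) control CNV frequency over the bins spanned by [start, end)."""
    bins = range(start // bin_size, (end - 1) // bin_size + 1)
    values = [freq_by_bin.get((chrom, b), 0.0) for b in bins]
    return max(values), sum(values) / len(values)


def describe(max_freq, common_cutoff=0.01):
    """Hypothetical rule of thumb: flag anomalies overlapping CNVs seen in >= 1% of controls."""
    if max_freq >= common_cutoff:
        return "frequent in controls"
    return "rare or absent in controls"


# Hypothetical control-track frequencies per (chrom, bin) with 1 kb bins.
freq_by_bin = {("chr1", 0): 0.2, ("chr1", 1): 0.05}
peak, mean = anomaly_frequency(freq_by_bin, "chr1", 500, 2_500, 1_000)
print(peak, describe(peak))  # 0.2 frequent in controls
```

Reporting both the peak and the mean helps distinguish an anomaly that merely clips a common CNV hotspot from one lying mostly inside it.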

Page 7: Cohorts of healthy individuals: the HapMap project

The HapMap project
  The International HapMap Project is a partnership of scientists and funding agencies from various countries (www.hapmap.org)
  The goal of the International HapMap Project is to compare the genetic sequences of different individuals to identify chromosomal regions where genetic variants are shared
  Genetic data are being gathered from different human populations: more than 1,000 samples in the latest release
  aCGH and SNP data were obtained over the years using different array platforms, from medium to high to ultra-high resolution

Page 8: Building CNV frequencies across HapMap samples

A large, powerful pool of data
  High-density information provided by Affymetrix SNP 6.0 arrays
CNV frequencies across HapMap samples provide useful insight into how frequently a certain anomaly may be observed in healthy individuals
  Separate gain frequencies and loss frequencies
  A useful addition to the Database of Genomic Variants
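
Keeping gains and losses apart matters because the same region can show gains in some individuals and losses in others, which a pooled frequency would hide. A minimal sketch, assuming each control sample's CNV calls are already labeled "gain" or "loss"; the input layout, the function name, the bin size, and the toy data are hypothetical.

```python
from collections import defaultdict


def gain_loss_frequencies(samples, bin_size):
    """samples: per-sample lists of (chrom, start, end, kind), kind in {"gain", "loss"}.

    Returns {"gain": {(chrom, bin): freq}, "loss": {...}}; each sample counts once per bin and kind.
    """
    counts = {"gain": defaultdict(int), "loss": defaultdict(int)}
    for cnvs in samples:
        seen = {"gain": set(), "loss": set()}
        for chrom, start, end, kind in cnvs:
            for b in range(start // bin_size, (end - 1) // bin_size + 1):
                seen[kind].add((chrom, b))
        for kind, bins in seen.items():
            for key in bins:
                counts[kind][key] += 1
    n = len(samples)
    return {kind: {key: c / n for key, c in table.items()} for kind, table in counts.items()}


samples = [
    [("chr1", 0, 1_000, "gain")],
    [("chr1", 0, 1_000, "loss")],
    [("chr1", 0, 1_000, "gain")],
    [],
]
freqs = gain_loss_frequencies(samples, bin_size=1_000)
print(freqs["gain"], freqs["loss"])  # {('chr1', 0): 0.5} {('chr1', 0): 0.25}
```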

Page 9: Filtering HapMap CNV data

The control group may produce CNVs that are clinically relevant
  The HapMap sample below demonstrates a large copy number loss confirmed by both sets of measurements
  The region includes the cancer-related gene NRAS
  Such anomalies need to be isolated using copy number analysis software
Filter CNVs by size
  When computing CNV frequency profiles, configure the software in advance to disregard CNVs larger than, for instance, 2 Mbp
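
The size filter can be applied before frequencies are computed. This sketch assumes (chrom, start, end, ...) tuples with exclusive ends. It returns the oversized calls separately rather than discarding them, so large and possibly clinically relevant anomalies in the control group can still be isolated and reviewed. The function name is hypothetical; the 2 Mbp default is the slide's example value.

```python
MAX_CNV_SIZE = 2_000_000  # 2 Mbp, the example cut-off mentioned on the slide


def split_by_size(cnvs, max_size=MAX_CNV_SIZE):
    """Separate CNVs into those used for frequency profiles and those set aside for review.

    cnvs: list of (chrom, start, end, ...) tuples with end exclusive.
    """
    kept, excluded = [], []
    for cnv in cnvs:
        _, start, end = cnv[:3]
        if end - start > max_size:
            excluded.append(cnv)  # too large for the frequency profile; review separately
        else:
            kept.append(cnv)
    return kept, excluded


kept, excluded = split_by_size([("chr1", 0, 500_000), ("chr1", 1_000_000, 4_000_000)])
print(len(kept), len(excluded))  # 1 1
```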

Page 10: CNV frequency profiles specific to a sample attribute

Demographic attributes may be important
  For example: 11 different populations in HapMap
Individual CNV frequency plots may be built for each population by aCGH software and used in Cytogenetic tests as a more targeted reference
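
Population-specific profiles amount to grouping samples by an attribute and computing one frequency table per group. A minimal sketch, assuming each sample carries a population label and a list of (chrom, start, end) CNV calls; the labels `POP_A` and `POP_B`, the function name, and the toy data are hypothetical.

```python
from collections import defaultdict


def population_frequencies(samples, bin_size):
    """samples: list of (population, cnvs) with cnvs = [(chrom, start, end), ...], end exclusive.

    Returns {population: {(chrom, bin): fraction of that population's samples with a CNV}}.
    """
    groups = defaultdict(list)
    for population, cnvs in samples:
        groups[population].append(cnvs)

    profiles = {}
    for population, group in groups.items():
        counts = defaultdict(int)
        for cnvs in group:
            # Bins touched by this sample; a sample counts once per bin.
            bins = {
                (chrom, b)
                for chrom, start, end in cnvs
                for b in range(start // bin_size, (end - 1) // bin_size + 1)
            }
            for key in bins:
                counts[key] += 1
        profiles[population] = {key: c / len(group) for key, c in counts.items()}
    return profiles


samples = [
    ("POP_A", [("chr1", 0, 1_000)]),
    ("POP_A", []),
    ("POP_B", [("chr1", 0, 1_000)]),
]
print(population_frequencies(samples, bin_size=1_000))
# {'POP_A': {('chr1', 0): 0.5}, 'POP_B': {('chr1', 0): 1.0}}
```

Each population's denominator is its own sample count, so small groups yield noisier frequencies than the pooled profile.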

Page 11: Probe coverage issue

Different array platforms may produce slightly different CNV profiles
  -scale and complex architecture of human copy-number
  Affymetrix SNP 6.0 arrays (HapMap release 3)
Platforms may concentrate their probe coverage on different areas, hence different CNV profiles. This is typical when performing copy number analysis across array platforms.
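
A small illustration of why coverage changes the profile: segmentation-based callers often need several probes inside a region before they can report it, so the same CNV may be called on one platform and missed on another. The sketch assumes sorted probe positions on a single chromosome; the probe layouts, the 10 kb CNV, and the five-probe minimum are hypothetical.

```python
import bisect


def probes_in_region(probe_positions, start, end):
    """Number of probes with start <= position < end; probe_positions must be sorted."""
    return bisect.bisect_left(probe_positions, end) - bisect.bisect_left(probe_positions, start)


def detectable(probe_positions, start, end, min_probes=5):
    """Hypothetical rule: a caller needs at least min_probes probes inside a CNV to report it."""
    return probes_in_region(probe_positions, start, end) >= min_probes


# Hypothetical probe layouts on one chromosome: uniform 1 kb spacing vs 20 kb spacing.
dense = list(range(0, 100_000, 1_000))
sparse = list(range(0, 100_000, 20_000))
cnv_start, cnv_end = 30_000, 40_000
print(detectable(dense, cnv_start, cnv_end), detectable(sparse, cnv_start, cnv_end))  # True False
```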

Page 12: Other sources of high-quality CNV data

Other studies
  CHOP: High-resolution mapping of copy number variations in 2,026 healthy individuals. http://cnv.chop.edu/
  Wellcome Trust: Ultra-high resolution CNV study. 42 M probe coverage, custom 2.1 M array designs based on Roche-NimbleGen aCGH. http://www.sanger.ac.uk/humgen/cnv/42mio/