"o uso da plataforma hpc na descoberta de doenças genéticas" . david santos marco...

Using the HPC Platform in the Discovery of Genetic Diseases

David Santos Marco Antonio, PhD
Profa. Maria Rita Passos Bueno
Laboratório de Genética do Desenvolvimento Humano
Departamento de Genética e Biologia Evolutiva
Instituto de Biociências - USP

Finding the variability that could explain genetic disorders.

Upload: lccausp

Post on 12-Aug-2015


DNA : Chromosome : Genes

DNA: A T C G

Chromosomes: 1–22, X and/or Y, Mitochondrial

Source: http://en.wikipedia.org/wiki/Human_genome

Human genetic variation in populations

• Genes in the same order. • Mapped using a reference genome: hg19, GRCh38. • Variability among relatives and populations. • Susceptibility to diseases. • Improvements.

Speaker note: talk about the 1000 Genomes populations.

Haplogroups: Y chromosome, mitochondrial DNA

Human Genetic Diseases

Disorder                       Mutation   Chromosome
22q11.2 deletion syndrome      D          22q
Angelman syndrome              DCP        15
Canavan disease                -          17p
Charcot–Marie–Tooth disease    -          -
Color blindness                P          X
Cri du chat                    D          5
Cystic fibrosis                P          7q
Down syndrome                  C          21
Duchenne muscular dystrophy    D          Xp
Haemochromatosis               P          6
Haemophilia                    P          X
Klinefelter syndrome           C          X
Neurofibromatosis              -          17q/22q/?
Phenylketonuria                P          12q
Polycystic kidney disease      P          16 (PKD1) or 4 (PKD2)
Prader–Willi syndrome          DC         15
Sickle-cell disease            P          11p
Tay–Sachs disease              P          15
Turner syndrome                C          X

P – Point mutation or InDel. D – Deletion of gene. C – Whole chromosome extra/missing. T – Nucleotide repeat disorders. A "-" marks a cell left empty on the slide.

Genomic Sequencing

• Terabytes of data per sequencing run. • Storage. • Processing. • Lots of RAM.

Ben Moore
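To give a feel for where the storage pressure comes from, here is a back-of-the-envelope sketch of the raw data volume for a single whole genome. The coverage depth and the bytes-per-base figure are assumptions chosen for illustration; they do not come from the slides, and real files also carry read headers and are usually compressed.

```python
# Rough estimate of uncompressed read data for one whole-genome sample.
# All three constants are illustrative assumptions, not values from the talk.
GENOME_SIZE_BASES = 3_100_000_000  # approximate length of the human genome
COVERAGE = 30                      # assumed average sequencing depth
BYTES_PER_BASE = 2                 # one byte for the base, one for its quality score


def raw_read_bytes(genome_size=GENOME_SIZE_BASES,
                   coverage=COVERAGE,
                   bytes_per_base=BYTES_PER_BASE):
    """Return the uncompressed size in bytes, ignoring headers and newlines."""
    return genome_size * coverage * bytes_per_base


def samples_to_reach(total_bytes, sample_bytes):
    """Return how many samples of a given size fit in total_bytes."""
    return total_bytes // sample_bytes


size = raw_read_bytes()
print(f"{size / 1e9:.0f} GB per sample")
print(f"{samples_to_reach(10**12, size)} samples per terabyte")
```

Under these assumptions a handful of samples already fills a terabyte, before any intermediate alignment files are written.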

Data Processing

STEPS

[Pipeline diagram. Stages shown: Sequencing, Alignment (against the Reference), Sorting, Remove Errors, Realignment, Recalibration, Variant Calling. Quality Control appears twice in the diagram.]
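The final stage of the diagram, variant calling, can be illustrated with a deliberately naive sketch: pile up the aligned reads over the reference and report positions where most reads disagree with it. This is a toy under strong assumptions (gap-free alignments, a simple majority threshold, hypothetical function and parameter names); production callers use statistical models and handle heterozygous sites, which this does not.

```python
from collections import Counter


def call_variants(reference, aligned_reads, min_depth=2, min_fraction=0.8):
    """Toy variant caller.

    aligned_reads is a list of (start, sequence) pairs, where start is the
    0-based position of the read on the reference and alignments have no gaps.
    Returns a list of (position, reference_base, observed_base, depth).
    """
    # Count the bases observed at every reference position.
    pileup = [Counter() for _ in reference]
    for start, sequence in aligned_reads:
        for offset, base in enumerate(sequence):
            position = start + offset
            if 0 <= position < len(reference):
                pileup[position][base] += 1

    variants = []
    for position, counts in enumerate(pileup):
        depth = sum(counts.values())
        if depth < min_depth:
            continue  # too few reads to say anything
        base, support = counts.most_common(1)[0]
        if base != reference[position] and support / depth >= min_fraction:
            variants.append((position, reference[position], base, depth))
    return variants


reference = "GATTACAGATTACA"
reads = [(0, "GATTGCAG"), (2, "TTGCAGAT"), (4, "GCAGATTA"), (9, "TTACA")]
print(call_variants(reference, reads))
```

In this made-up example three reads cover position 4 and all of them carry G where the reference has A, so that position is the only one reported.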

Data Processing

STEPS

[Pipeline diagram. Variant Calling feeds into dbSNP, SIFT, PolyPhen, OMIM, exac03, 6500 Exomes and 1000 Genomes, leading to Clinically Relevant Data.]
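The step from called variants to clinically relevant data is essentially a filtering problem. The following is a minimal sketch of that idea, assuming each variant has already been annotated; the record fields, gene names and the 1% frequency cutoff are hypothetical and are not taken from the talk or from any specific tool.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AnnotatedVariant:
    gene: str
    # Frequency in a population database; None means never observed there.
    population_frequency: Optional[float]
    # Verdict of an effect predictor (hypothetical boolean summary).
    predicted_damaging: bool
    # Whether the gene is already linked to a disorder in a disease catalogue.
    known_disease_gene: bool


def is_candidate(variant, max_frequency=0.01):
    """Keep rare variants that are predicted damaging or hit a known disease gene."""
    rare = (variant.population_frequency is None
            or variant.population_frequency <= max_frequency)
    return rare and (variant.predicted_damaging or variant.known_disease_gene)


variants = [
    AnnotatedVariant("GENE_A", 0.35, False, False),   # common and benign
    AnnotatedVariant("GENE_B", None, True, False),    # novel and damaging
    AnnotatedVariant("GENE_C", 0.002, False, True),   # rare, known disease gene
    AnnotatedVariant("GENE_D", 0.20, True, True),     # damaging but too common
]
candidates = [v.gene for v in variants if is_candidate(v)]
print(candidates)
```

The point of the sketch is the shape of the filter: population frequency removes common variation first, and the remaining annotations rank what is left.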

High Computational Cost

• Storage. • RAM. • CPU. • Reprocessing.

FASTQ format

@SEQ_ID
GATTTGGGGTTCAAAGCAGTATCGATCAAATAGTAAATCCATTTGTTCAACTCACAGTTT
+
!''*((((***+))%%%++)(%%%%).1***-+*''))**55CCF>>>>>>CCCCCCC65
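A FASTQ record is four lines: a header starting with "@", the bases, a "+" separator, and one quality character per base. The sketch below parses the record shown on the slide and decodes the quality characters, assuming the common Phred+33 encoding (the slide does not say which encoding was used). The function name is illustrative.

```python
def parse_fastq_record(lines):
    """Parse one four-line FASTQ record into (name, sequence, quality_scores)."""
    header, sequence, separator, quality = (line.rstrip("\n") for line in lines)
    if not header.startswith("@") or not separator.startswith("+"):
        raise ValueError("not a FASTQ record")
    if len(sequence) != len(quality):
        raise ValueError("sequence and quality lengths differ")
    # Phred+33 is assumed: the score is the character code minus 33.
    scores = [ord(character) - 33 for character in quality]
    return header[1:], sequence, scores


record = [
    "@SEQ_ID",
    "GATTTGGGGTTCAAAGCAGTATCGATCAAATAGTAAATCCATTTGTTCAACTCACAGTTT",
    "+",
    "!''*((((***+))%%%++)(%%%%).1***-+*''))**55CCF>>>>>>CCCCCCC65",
]
name, sequence, scores = parse_fastq_record(record)
print(name, len(sequence), scores[:5])
```

Because every base carries its own quality character, a FASTQ file is roughly twice the size of the sequence alone, which is part of the storage cost mentioned earlier.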

Research

• Autism • Craniofacial development • Richieri-Costa • Down syndrome

Disease Models

• Human • Zebrafish • Drosophila • Microbiome

Acknowledgements

• LCCA • The LCCA team • Profa. Maria Rita Passos Bueno • Laboratório de Genética do Desenvolvimento Humano • Instituto de Biociências • USP