bioinformatics, translational bioinformatics, personalized medicine

Bioinformatics, Translational Bioinformatics, Personalized Medicine Uma Chandran, MSIS, PhD Department of Biomedical Informatics University of Pittsburgh [email protected] 412-648-9326 07/17/2013


Page 1: Bioinformatics, Translational Bioinformatics, Personalized Medicine

Bioinformatics, Translational Bioinformatics, Personalized Medicine

Uma Chandran, MSIS, PhD Department of Biomedical Informatics

University of Pittsburgh
[email protected]
412-648-9326
07/17/2013

Page 2: Bioinformatics, Translational Bioinformatics, Personalized Medicine

Outline of lecture

• What is Bioinformatics?
  – Examples of bioinformatics
  – Past to present
• What is translational bioinformatics?
• Personalized Medicine
  – Bioinformatics and Personalized Medicine

Page 3: Bioinformatics, Translational Bioinformatics, Personalized Medicine

What is Bioinformatics?

• http://en.wikipedia.org/wiki/Bioinformatics

• Application of information technology to molecular biology

• Databases
• Algorithms
• Statistical techniques

Page 4: Bioinformatics, Translational Bioinformatics, Personalized Medicine

Bioinformatics examples
• Sequence analysis
• Genome annotation
• Evolutionary biology
• Literature analysis
• Analysis of gene expression
• Analysis of regulation
• Analysis of protein expression
• Analysis of mutations in cancer
• Comparative genomics
• Systems biology
• Image analysis
• Protein structure prediction

From Wikipedia

Page 5: Bioinformatics, Translational Bioinformatics, Personalized Medicine

Early Bioinformatics

• Robert Ledley and Margaret Dayhoff
  – First bioinformaticians
  – Used an IBM 7090 and punch cards to analyze the amino acid structure of proteins
  – Created an amino acid scoring matrix
  – Protein evolution
  – Protein sequence alignment

http://blog.openhelix.eu/?p=1078

Page 6: Bioinformatics, Translational Bioinformatics, Personalized Medicine

Sequence analysis

• Databases to store sequence info
  – Phage Φ-X174 sequenced in 1977
  – GenBank
    • 30,000 organisms
    • 143 billion base pairs
  – BLAST program for sequence searching
• Algorithms, databases, software tools

Page 7: Bioinformatics, Translational Bioinformatics, Personalized Medicine

Evolutionary biology

• Compare relationships between organisms by comparing
  – DNA sequences
  – Now whole genomes
• Can even find single base changes, duplications, insertions, deletions
• Uses advanced algorithms, programs and computational resources
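
To make the idea of counting single base changes, insertions and deletions concrete, here is a minimal sketch that computes the edit (Levenshtein) distance between two short DNA strings. It is an illustration only: the function name `edit_distance` is hypothetical, and real comparative-genomics tools use far more elaborate scoring and heuristics than this textbook dynamic program.

```python
def edit_distance(a: str, b: str) -> int:
    """Minimum number of substitutions, insertions and deletions turning a into b."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, base_a in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string is i deletions
        for j, base_b in enumerate(b, start=1):
            substitution_cost = 0 if base_a == base_b else 1
            curr.append(min(
                prev[j] + 1,                      # delete base_a
                curr[j - 1] + 1,                  # insert base_b
                prev[j - 1] + substitution_cost,  # match or substitute
            ))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(edit_distance("ACGT", "AGGT"))  # one single-base change
    print(edit_distance("ACGT", "ACT"))   # one deletion
```

A smaller distance between two sequences is read here as a closer relationship; that interpretation is a simplification of how evolutionary distances are actually modeled.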

Page 8: Bioinformatics, Translational Bioinformatics, Personalized Medicine

Literature mining
• Millions of articles in the literature
• How to find meaningful information
  – Natural language processing techniques
• Example
  – Type in p53 or PTEN in PubMed – will retrieve 1000s of publications
  – How to summarize all the information for a particular gene
  – Function, disease, mutations, drugs
  – iHOP database creates a network between genes and proteins for 30,000 genes

Page 9: Bioinformatics, Translational Bioinformatics, Personalized Medicine

Genome annotation

• Marking genes and other features in DNA

• Algorithms, software

Page 10: Bioinformatics, Translational Bioinformatics, Personalized Medicine

Bioinformatics

• Interdisciplinary field
  – Genes/proteins/function – Biologist
  – In cancer – Physician/Scientist/Biologist
  – Algorithms, for example, BLAST – Math/CS
  – Separate signal from noise, differential gene expression, correlation with disease – Statistician
  – Tools, software, databases – Software developers, programmers
• Aim to make sense of biological data

Page 11: Bioinformatics, Translational Bioinformatics, Personalized Medicine

Translational bioinformatics
• Translational = benchside to bedside
  – Bringing discoveries made at the benchside to clinical use
• “the development of storage, analytic, and interpretive methods to optimize the transformation of increasingly voluminous biomedical data into proactive, predictive, preventative, and participatory health. Translational bioinformatics includes research on the development of novel techniques for the integration of biological and clinical data and the evolution of clinical informatics methodology to encompass biological observations. The end product of translational bioinformatics is newly found knowledge from these integrative efforts that can be disseminated to a variety of stakeholders, including biomedical scientists, clinicians, and patients.”

Atul Butte, JAMIA 2008;15:709-714 doi:10.1197

Page 12: Bioinformatics, Translational Bioinformatics, Personalized Medicine

Central dogma

• DNA is transcribed to RNA

• RNA is translated to protein

• Many regulatory processes control these steps
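
The two steps of the central dogma can be sketched in a few lines of code. This is a minimal illustration, assuming the input is the coding (sense) strand of DNA and using the standard genetic code; the function names `transcribe` and `translate` are hypothetical, and none of the regulatory processes mentioned above are modeled.

```python
# Standard genetic code, laid out in the conventional T, C, A, G codon order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def transcribe(coding_dna: str) -> str:
    """DNA -> RNA: the mRNA matches the coding strand with U in place of T."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """RNA -> protein: read codons from position 0 until a stop codon ('*')."""
    protein = []
    for start in range(0, len(mrna) - 2, 3):
        codon = mrna[start:start + 3].replace("U", "T")
        amino_acid = CODON_TABLE[codon]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    mrna = transcribe("ATGGCCTAA")
    print(mrna, translate(mrna))
```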

Page 13: Bioinformatics, Translational Bioinformatics, Personalized Medicine

Molecular Biology Primer

• 20,000 genes
• Many transcripts, many proteins
• More than 20,000 proteins
• Southern, Northern, Western blots

Page 14: Bioinformatics, Translational Bioinformatics, Personalized Medicine

Biological questions

• DNA
  – Are there any mutations?
    • Sickle cell anemia
    • Cystic fibrosis
    • Hemophilia
    • Other diseases such as diabetes, cancer?
  – Polymorphisms
    • Variation in the population
• Mutation

Page 15: Bioinformatics, Translational Bioinformatics, Personalized Medicine

DNA amplification

• Are there regions of amplification or deletion that correlate with disease?
  – If so, what genes are present in these regions?
  – HER2 amplification in breast cancer
  – EGFR mutations in lung cancer

Page 16: Bioinformatics, Translational Bioinformatics, Personalized Medicine

RNA

• RNA
  – DNA is transcribed to RNA
  – Approximately 20K genes
• RNA levels will differ in different conditions
  – Liver, kidney, cancer, normal, treatment etc.
  – Diagnostic or prognostic
  – microRNA levels
  – lncRNAs
  – Splicing differences

mRNA

Page 17: Bioinformatics, Translational Bioinformatics, Personalized Medicine

Clinical questions
• DNA level
  – Are there mutations or polymorphisms that differ between cancer patient groups?
    • Good outcome vs bad outcome
    • Early stage vs late stage
    • Therapy responders vs non-responders
    • Examples: renal cell, prostate cancer etc.
• RNA
  – Are there specific transcripts (mRNA, microRNA) that are up or down and are a signature for outcome, disease and response?
  – 1000s of studies
  – Consortia projects
    • TCGA – The Cancer Genome Atlas project
    • Profile 500 samples of each cancer for DNA, RNA changes

Page 18: Bioinformatics, Translational Bioinformatics, Personalized Medicine
Page 19: Bioinformatics, Translational Bioinformatics, Personalized Medicine
Page 20: Bioinformatics, Translational Bioinformatics, Personalized Medicine

Molecular Biology Primer

• 20,000 genes
• Many transcripts, many proteins
• More than 20,000 proteins
• Southern, Northern, Western blots

Page 21: Bioinformatics, Translational Bioinformatics, Personalized Medicine

Base pairing

• Microarray and Northern/Southern blots
  – Exploit the ability of nucleotides to hybridize to each other
  – Base pairing
  – Complementary bases
    • A:T (U)
    • G:C
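
The base-pairing rule is easy to express in code. The sketch below, which assumes DNA-only sequences written 5' to 3', builds the reverse complement of a probe and checks whether a target contains a perfectly complementary site. The helper names `reverse_complement` and `can_hybridize` are hypothetical, and real hybridization also tolerates partial matches and depends on temperature and salt, which are ignored here.

```python
# Watson-Crick pairs for DNA: A pairs with T, G pairs with C.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Sequence of the strand that would pair with seq, read 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def can_hybridize(probe: str, target: str) -> bool:
    """True if the target contains a site perfectly complementary to the probe."""
    return reverse_complement(probe) in target.upper()


if __name__ == "__main__":
    print(reverse_complement("ATGC"))
    print(can_hybridize("GCAT", "TTATGCAA"))
```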

Page 22: Bioinformatics, Translational Bioinformatics, Personalized Medicine

Northern

Sensitivity and dynamic range low

Page 23: Bioinformatics, Translational Bioinformatics, Personalized Medicine

How are these changes measured

• Example: Northern blot (measure RNA)
  – http://www.youtube.com/watch?v=KfHZFyADnNg
  – Workflow of Northern blot
• Key points
  – mRNA run on gel – separated by size
  – Transferred to a membrane – immobilized
  – Have a hypothesis – for example, studying RNA level for BRCA in normal and cancer
  – Only the probe for one mRNA or transcript is labeled or tagged
  – Probe is prepared and labeled with radioactivity
  – Probe is hybridized to the membrane, which is then exposed to X-ray film
  – Only that mRNA is detected and quantitated

Page 24: Bioinformatics, Translational Bioinformatics, Personalized Medicine

Microarrays
• Solid surface
  – Many different technologies
    • Affy, Illumina, Agilent
  – Probes are synthesized on the solid surface
    • Synthesized using proprietary technology
  – Probes are selected using proprietary algorithms
  – RNA (or DNA) is in solution
  – RNA is labeled or tagged
  – Hybridized to the chip
  – Tagged RNA is quantitated
  – Compare between conditions

Page 25: Bioinformatics, Translational Bioinformatics, Personalized Medicine

Affymetrix

Page 26: Bioinformatics, Translational Bioinformatics, Personalized Medicine

Need for computational methods

• Data management
  – Each file for a chip experiment is large
    • 100 MB x 10 = 1 GB
    • Generates gigabytes of data
• Data preprocessing
  – Convert raw image into signal values
• Data analysis
  – 1000s of genes (or SNPs) and few samples
  – How to find differences between samples
  – What statistical methods to use?
  – Like finding a needle in a haystack

Page 27: Bioinformatics, Translational Bioinformatics, Personalized Medicine

How to analyze?

name | id | 2 2 2 2 2 2 2 2 2 2 2 2
Rab geranylgeranyltransferase, alpha subunit | 100_g_at | 231.5 250 369.7 217.5 489 228 336.3 363.2 381.7 373.2 263.8 302.8
mitogen-activated protein kinase 3 | 1000_at | 477.9 662.7 589.9 883.8 395.5 979.5 420.4 457.8 389.1 495.7 346.3 482.5
tyrosine kinase with immunoglobulin and epidermal growth factor homology domains | 1001_at | 47.4 150.7 15.2 86 128.1 62.7 131.8 54.4 59.6 116.4 32.7 22.2
Burkitt lymphoma receptor 1, GTP binding protein (chemokine (C-X-C motif) receptor 5) | 1004_at | 87 114.4 220 104.5 185.7 175.2 170.8 186.5 223.6 42.7 93.4 115.1
dual specificity phosphatase 1 | 1005_at | 593.5 887.4 299.3 1324.8 132.4 831.8 173 117.5 112.6 241.5 153.9 212.2
--- | 1008_f_at | 3205.4 1582.4 5618.8 3589.1 1401.2 2951.4 1910.3 1217.8 1195.2 2928.7 1305.9 589.6
dual-specificity tyrosine-(Y)-phosphorylation regulated kinase 4 | 101_at | 93.5 29.3 33.5 32.7 24.1 17.2 47.6 100.4 19.3 20.4 111.7 78
tyrosine 3-monooxygenase/tryptophan 5-monooxygenase activation protein, epsilon polypeptide | 1011_s_at | 717.6 426.6 61.7 468 285.5 276.8 154.9 242.9 166.7 257.3 283.9 390.8
--- | 1017_at | 33.1 173.1 82.8 213.7 132.6 393.6 57.5 183.4 237.2 81.2 103 104.4
wingless-type MMTV integration site family, member 10B | 1019_g_at | 199.2 310.4 215.4 393.7 156.9 307.1 187.1 184.6 204.8 290.2 154 172.2
calcium and integrin binding 1 (calmyrin) | 1020_s_at | 852 207.9 272.7 243.5 592.4 227.2 651.7 643.9 517.6 478.8 742 1099.3
interferon, gamma | 1021_at | 14.6 58.4 161.5 11.3 18.4 36.1 4.2 40.6 14.3 122.7 6.9 43.6
collagen, type XI, alpha 2 | 1026_s_at | 122 198.8 192.6 194.6 53.7 341.8 37 88.2 224.5 194.4 134.5 107.6
topoisomerase (DNA) III alpha | 1028_at | 123.7 153.5 195.2 238.8 126.6 145.3 115 145.5 198.8 166.8 101.5 117.1
thrombospondin 4 | 103_at | 11.5 33.8 31 96 26.1 41.1 19.3 34.3 76.2 32.5 28 12
topoisomerase (DNA) I | 1030_s_at | 837.2 817.4 936.4 662.3 939.3 708.1 890.5 1006.6 698 742.3 838 1093.6
interleukin 8 receptor, beta | 1032_at | 275.6 515.3 620 381.3 417.4 408.3 332.4 435.6 394.7 366.7 308.6 499.9
interleukin 8 receptor, beta | 1033_g_at | 156.4 125.1 264.9 168.7 33.7 112.6 127.7 28.9 23.5 56.1 38.9 94
tissue inhibitor of metalloproteinase 3 (Sorsby fundus dystrophy, pseudoinflammatory) | 1034_at | 267.9 390.1 507.2 390.7 273.3 512.9 301.3 187.7 216.6 255.8 180.6 160.7
tissue inhibitor of metalloproteinase 3 (Sorsby fundus dystrophy, pseudoinflammatory) | 1035_g_at | 391 331.8 556.1 186.1 196.6 350.1 167.2 77.7 372.3 427.7 110.8 195
interferon gamma receptor 1 | 1038_s_at | 290.7 235.6 93.9 200.4 267.1 231.5 313.8 243.7 185.4 183.5 317.2 333.4
hypoxia-inducible factor 1, alpha subunit (basic helix-loop-helix transcription factor) | 1039_s_at | 309.5 120.3 332.2 94.9 96.7 103.5 278.4 146.8 87.2 523.9 216.6 242.8
POU domain, class 6, transcription factor 1 | 104_at | 80.6 170.7 139.4 140.8 178.5 182.4 124.1 94.9 148.6 115.2 74.4 54.5
ephrin-A5 | 1041_at | 96.1 81.9 332.3 53.3 10.2 57.5 13 36.6 100.8 77.5 57.3 48.6
E2F transcription factor 5, p130-binding | 1044_s_at | 130.6 94.8 175.1 210.3 125.3 143.5 118.7 129.5 91 15.7 109.8 169.4
--- | 1047_s_at | 95.4 1055.9 368.2 170.5 146.4 99.2 103.5 166.3 221.5 190.3 70.9 87.4
melan-A | 1051_g_at | 14.1 18.8 48.9 23 62.4 120.9 19.3 28.8 20.6 149.8 12.8 9.7
CCAAT/enhancer binding protein (C/EBP), delta | 1052_s_at | 2091.5 2732.8 2984.6 1157.3 3959.9 1280.4 4129.2 2391.4 4279.4 1673.5 4456 4965.1
replication factor C (activator 1) 2, 40kDa | 1053_at | 168.5 17.1 30.1 99 55.6 34.9 86.2 82.4 10.2 245.3 54.8 100.4
replication factor C (activator 1) 4, 37kDa | 1055_g_at | 285.4 112.4 113.5 97.1 403 166.8 294.4 586 329.7 157.6 480 163.9
cellular retinoic acid binding protein 2 | 1057_at | 52.1 89.3 162.4 117.9 67.8 136.7 103.4 104.9 126 105 20 85.5
runt-related transcription factor 3 | 106_at | 10.4 28.3 196.9 10.2 31.8 13.1 170.9 5.5 60.2 26.6 38.8 14.6
interleukin 10 receptor, alpha | 1061_at | 76.2 116.9 74.9 91.9 157.4 48.7 246 26.3 147.4 47.9 117.6 94.8
interleukin 10 receptor, alpha | 1062_g_at | 55.5 66.9 319.2 85.8 27.4 24.6 238.7 96.2 17.9 25.9 25.1 65.1
TYRO3 protein tyrosine kinase | 1063_s_at | 21.9 112.5 155.7 118.6 38.1 52 33.8 19.5 79.5 59 54.1 87.2
--- | 1069_at | 45.8 40.1 53.6 6.9 8.7 9 78.9 25.7 11 81.1 4 10.6
transcription elongation factor A (SII), 1 | 1073_at | 884.7 512.5 290.6 714.7 369.1 889.6 733.4 418.1 371.3 595.7 563.1 779.1
RAB1A, member RAS oncogene family | 1074_at | 473 101.2 12.8 96.7 153.3 114.6 158.3 173.8 78.9 91.8 219.6 186.6
ornithine decarboxylase 1 | 1081_at | 4336 480 82.1 604.1 561.6 212 773 993.5 958.4 3909.5 1426.7 2014.7
phospholipase C, gamma 2 (phosphatidylinositol-specific) | 1085_s_at | 59.7 83.3 551.2 45.2 131.2 151.1 245.4 57.3 110.7 153.9 136.6 125.8
Rab9 effector p40 | 109_at | 193 114.7 150.2 191.7 108.4 49.6 166.3 117.7 92.4 274.7 109.5 109.4
endothelin 2 | 1092_at | 394.6 358.1 151.9 264.5 174 386 552.6 373.3 371.5 311.2 266.4 397
protein phosphatase 2 (formerly 2A), regulatory subunit A (PR 65), beta isoform | 1094_g_at | 149.2 209.3 74.8 67.7 64.4 53.1 24.2 73.6 73.9 54.7 75 156.3
CD19 antigen | 1096_g_at | 87.6 242.9 343.1 183.6 41.8 280.2 276 154.1 338.4 224.9 95.6 161.2
chemokine (C-C motif) receptor 7 | 1097_s_at | 36.6 100.3 60.7 144.4 187.6 119 248.5 107.4 153.8 101.7 84.3 118.5
glutathione S-transferase theta 2 | 1099_s_at | 21.7 23.5 43.1 30.2 15.5 24.5 33.5 12.6 18.8 32 15.2 10.8
chondroitin sulfate proteoglycan 4 (melanoma-associated) | 110_at | 103.6 300.6 384.3 375 324.2 347.1 331.5 253.3 269.7 324.7 294.5 150.4
interleukin-1 receptor-associated kinase 1 | 1100_at | 256 234.1 306.4 432.6 314.3 249.5 274.4 354.5 339 288.1 326.1 363.2
amyloid beta (A4) precursor protein-binding, family B, member 1 (Fe65) | 1101_at | 86 207 312 227 155.1 212.6 163.9 111.3 86.4 59.1 92.6 71.5
angiogenin, ribonuclease, RNase A family, 5 | 1103_at | 257.9 284.7 25.6 163 14.2 53.3 16.1 20.2 57.6 15.9 78.5 82
--- | 1104_s_at | 16378.1 4845.4 1160.6 2711.3 4218.5 1763.4 13180.9 3238.5 3130.4 3576.2 2802.9 1992
interferon, alpha-inducible protein (clone IFI-15K) | 1107_s_at | 206.9 375.2 338.7 399.2 325.6 498.1 283.5 305.3 411.4 468.1 269.3 386.2
Rab geranylgeranyltransferase, alpha subunit | 111_at | 113.9 32 32 143.3 48.7 110.2 107.4 151.7 183.8 115.8 140.4 203.7

Samples

GENES

Normal, Tumor

• Noise reduction
• Background subtraction
• Normalization

Data Analysis
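
The preprocessing steps named on this slide can be sketched on a small slice of the expression table (the first three probes and first four samples). This is only one simple way to do it, not the method of any particular microarray package: the background level of 10.0, the floor of 1.0, and the choice of log2 median-centering as the normalization are all assumptions made for illustration, and `preprocess` is a hypothetical name.

```python
import math
from statistics import median

# Signal values copied from the table: rows are probes, columns are samples.
SIGNALS = [
    [231.5, 250.0, 369.7, 217.5],  # 100_g_at
    [477.9, 662.7, 589.9, 883.8],  # 1000_at
    [47.4, 150.7, 15.2, 86.0],     # 1001_at
]


def preprocess(matrix, background=10.0, floor=1.0):
    """Background-subtract, log2-transform, then median-center each sample."""
    # Subtract an assumed constant background; the floor keeps log2 defined.
    logged = [
        [math.log2(max(value - background, floor)) for value in row]
        for row in matrix
    ]
    n_samples = len(logged[0])
    # Centering every column on its own median makes samples comparable.
    sample_medians = [median(row[j] for row in logged) for j in range(n_samples)]
    return [
        [row[j] - sample_medians[j] for j in range(n_samples)]
        for row in logged
    ]


if __name__ == "__main__":
    for probe_values in preprocess(SIGNALS):
        print([round(v, 2) for v in probe_values])
```

After this step each sample has a median of zero on the log2 scale, so a difference between two samples for a given probe is less likely to be a chip-wide intensity effect.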

Page 28: Bioinformatics, Translational Bioinformatics, Personalized Medicine

Data analysis

• Class discovery
  – Are there novel subclasses within data?
• Class comparison
  – How are tumor and normal different in expression?
  – Which SNPs are different?
• Class prediction
  – Predict class of new sample
• Advanced pathway analysis
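
As a minimal sketch of class comparison for a single gene, the code below computes a log2 fold change and a Welch t statistic between two groups of samples. The expression values are made-up illustrative numbers, not taken from the lecture, and the function names are hypothetical. A real analysis would repeat this for thousands of genes and correct for multiple testing, which this sketch does not do.

```python
import math
from statistics import mean, variance


def log2_fold_change(group_a, group_b):
    """log2 of the ratio of mean expression in group_a over group_b."""
    return math.log2(mean(group_a) / mean(group_b))


def welch_t(group_a, group_b):
    """Welch's t statistic for two groups with possibly unequal variances."""
    standard_error = math.sqrt(
        variance(group_a) / len(group_a) + variance(group_b) / len(group_b)
    )
    return (mean(group_a) - mean(group_b)) / standard_error


if __name__ == "__main__":
    # Hypothetical expression values for one gene (illustration only).
    tumor = [820.0, 910.0, 760.0, 880.0]
    normal = [210.0, 190.0, 250.0, 230.0]
    print(round(log2_fold_change(tumor, normal), 2))
    print(round(welch_t(tumor, normal), 2))
```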

Page 29: Bioinformatics, Translational Bioinformatics, Personalized Medicine

Pathway Analysis

Page 30: Bioinformatics, Translational Bioinformatics, Personalized Medicine

Analytic methods – many studies, many methods

Dupuy and Simon, JNCI; 2007

Page 31: Bioinformatics, Translational Bioinformatics, Personalized Medicine

SNPs to detect Copy Number changes

[Figure panels: diploid, deletion, amplification, amplification]
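
One common way to turn probe intensities into the copy-number states shown in this figure is to compare each sample intensity with a diploid reference on a log2 scale. The sketch below assumes that approach; the threshold of 0.3 and the function name `call_copy_number` are illustrative assumptions, not values from the lecture or from any specific SNP array software.

```python
import math


def call_copy_number(sample_intensity, reference_intensity, threshold=0.3):
    """Classify a region from the log2 ratio of sample to diploid reference."""
    log_ratio = math.log2(sample_intensity / reference_intensity)
    if log_ratio > threshold:
        return "amplification"
    if log_ratio < -threshold:
        return "deletion"
    return "diploid"


if __name__ == "__main__":
    reference = 1000.0  # hypothetical intensity for a region with two copies
    for sample in (1000.0, 500.0, 1500.0):
        print(sample, call_copy_number(sample, reference))
```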

Page 32: Bioinformatics, Translational Bioinformatics, Personalized Medicine

Hagenkord et al; Modern Pathology, 21:599

Page 33: Bioinformatics, Translational Bioinformatics, Personalized Medicine

What is personalized medicine

• Personalized medicine is the tailoring of medical treatment to the individual characteristics of each patient.

• Based on scientific breakthroughs in understanding of how a person’s unique molecular and genetic profile makes them susceptible to certain diseases.

• ability to predict which medical treatments will be safe and effective for each patient, and which ones will not be.

From ageofpersonalizedmedicine.org

Page 34: Bioinformatics, Translational Bioinformatics, Personalized Medicine

Personalized Medicine

From ageofpersonalizedmedicine.org

Page 35: Bioinformatics, Translational Bioinformatics, Personalized Medicine

Personalized Medicine

From Fernald et al; Bioinformatics, 13: 1741

Page 36: Bioinformatics, Translational Bioinformatics, Personalized Medicine

Examples of personalized medicine

• Breast cancer
  – 30% of patients overexpress HER2
  – Treated with Herceptin
  – Oncotype Dx: gene expression predicting recurrence
• Cardiovascular
  – Patient response to warfarin, the blood thinner
  – Response determined by polymorphisms in CYP genes

Page 37: Bioinformatics, Translational Bioinformatics, Personalized Medicine

Personalized Medicine

• Examples of personalized medicine resulted from studies that
  – Generate lots of data
  – Rely on bioinformatics methods to discover these associations
• Oncotype Dx:
  – Gene expression studies of large numbers of patients
• CYP polymorphisms
  – Discover single nucleotide polymorphisms in patient populations and their association with response
    » Initial studies done with PCR methods

Page 38: Bioinformatics, Translational Bioinformatics, Personalized Medicine

Personalized Medicine
• Current examples are few in number
• Making personalized medicine a reality
  – Generate the data
  – Discover the associations
  – Find targeted therapies
  – Genome sequencing prices are dropping
  – Large scale genome information is coming:
    • 1000 Genomes
    • TCGA
    • ICGC
    • Also possible to commercially sequence a person’s genome
• Processing all this data and translating these discoveries into medical practice has many challenges

Page 39: Bioinformatics, Translational Bioinformatics, Personalized Medicine

Bioinformatics challenges in personalized medicine

• Processing large scale robust genomic data
• Interpreting the functional impact of variants
• Integrating data to relate complex interactions with phenotypes
• Translating into medical practice

Fernald et al; Bioinformatics: 13: 1741

Page 40: Bioinformatics, Translational Bioinformatics, Personalized Medicine

Era of Personalized medicine

• Shift from microarrays to Next Gen Sequencing

Page 41: Bioinformatics, Translational Bioinformatics, Personalized Medicine

Central dogma

• DNA is transcribed to RNA

• RNA is translated to protein

• Many regulatory processes control these steps

Page 42: Bioinformatics, Translational Bioinformatics, Personalized Medicine

Next Gen Sequencing

• Directly sequence DNA to determine
  – SNP
  – CN
  – Expression, mRNA, microRNA
  – Protein binding sites
  – Methylation

• Initial steps depend not only on hybridization but also on base pairing (complementarity) and DNA synthesis

• Bioinformatics is extremely challenging

Page 43: Bioinformatics, Translational Bioinformatics, Personalized Medicine

Next Gen Sequencing

Page 44: Bioinformatics, Translational Bioinformatics, Personalized Medicine

NGS in personalized medicine

• Whole genome sequencing
  – Sequence genomes and find variants (1000 Genomes Project)
    • Find variants associated with disease phenotype
• Sequence exomes only
  – Find coding region variants associated with phenotypes
• RNA-seq
  – RNA sequence signatures associated with phenotype

Page 45: Bioinformatics, Translational Bioinformatics, Personalized Medicine

Microarrays v NGS RNA Seq

Microarrays
• Restricted to probes on chips
• Only transcripts with probes
• File sizes in MBs to GB
• Algorithms, methods
• Typically done on PCs
• Storage on hard drives

NGS RNA Seq
• No predetermined probes
• Can detect everything that is sequenced
• More applications than microarray
• Very large file sizes
• Computationally very intensive
• Clusters, supercomputers
• Large scale storage solutions

Page 46: Bioinformatics, Translational Bioinformatics, Personalized Medicine

Microarrays v RNA seq Expression Analysis

Microarrays
• Dynamic range is low
• Statistics to determine expression based on signal
• Many methods in the last 10 years

RNA seq
• Dynamic range is high
  – Based on reads
• Statistics based on counts
  – Affected by read length, total number of transcripts, lack of replicates
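
Because raw read counts depend on how deeply a sample was sequenced and on how long each transcript is, counts are usually rescaled before comparison. The sketch below shows two widely used rescalings, counts per million (CPM) and reads per kilobase per million mapped reads (RPKM). The function names and the example numbers are illustrative assumptions; the lecture does not say which normalization it has in mind.

```python
def cpm(read_count, total_mapped_reads):
    """Counts per million: adjusts for sequencing depth only."""
    return read_count * 1_000_000 / total_mapped_reads


def rpkm(read_count, transcript_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    return read_count * 1_000_000_000 / (transcript_length_bp * total_mapped_reads)


if __name__ == "__main__":
    # Hypothetical gene: 1,000 reads on a 2 kb transcript, 10 million mapped reads.
    print(cpm(1000, 10_000_000))
    print(rpkm(1000, 2000, 10_000_000))
```

These rescalings do not replace replicates: with a single sample per condition there is still no estimate of biological variability.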

Page 47: Bioinformatics, Translational Bioinformatics, Personalized Medicine

Read mapping Alignment
• De novo assembly
• Mapping to reference genome
  – Based on complementarity of a given 35-nucleotide read to the entire genome
  – Computationally intensive
    • Millions of 35 bp reads have to be searched against the reference and aligned specifically to a given region
  – Large file sizes
    • Sequence files in the TB
    • Aligned files (BAM files)
      – Several hundred GB

Reference genome
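
To show why read mapping is computationally heavy, here is a deliberately naive mapper that slides a read along a reference and reports every position where the number of mismatches is within a tolerance. It is a sketch only: production aligners use indexed data structures rather than a full scan, and the function name `map_read`, the toy reference and the reads are all hypothetical.

```python
def map_read(read, reference, max_mismatches=1):
    """Return (position, mismatches) for every placement within the tolerance."""
    hits = []
    for start in range(len(reference) - len(read) + 1):
        window = reference[start:start + len(read)]
        # Count positions where the read and the reference window disagree.
        mismatches = sum(1 for r, w in zip(read, window) if r != w)
        if mismatches <= max_mismatches:
            hits.append((start, mismatches))
    return hits


if __name__ == "__main__":
    reference = "ACGTACGTTAGC"  # toy reference, far shorter than a genome
    print(map_read("GTTA", reference, max_mismatches=0))
    print(map_read("GTAA", reference, max_mismatches=1))
```

The second call returns two equally good placements, which illustrates the ambiguity that arises once mismatches are tolerated. The scan costs roughly reference length times read length per read, which is why millions of reads against a whole genome require smarter algorithms.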

Page 48: Bioinformatics, Translational Bioinformatics, Personalized Medicine

Sequence variation

Page 49: Bioinformatics, Translational Bioinformatics, Personalized Medicine

Bioinformatics challenges in personalized medicine

• Processing large scale robust genomic data
  – Suppose we want to identify DNA variants associated with disease
    • Which technology
    • How much data
    • How to analyze the data
    • How to identify variants
    • Each genome can have millions of variants
    • 300,000 new variants – i.e., not in existing databases
  – Will have to separate errors from true variants
  – 1 error per 100 kb can lead to 30,000 errors in a single experiment
    • Why do these errors happen?

Fernald et al; Bioinformatics: 13: 1741
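
The figure of 30,000 errors follows from simple arithmetic if the experiment covers a human genome of about 3 billion base pairs. That genome size is an assumption added here to make the calculation explicit; the slide states only the error rate and the result. The function name is hypothetical.

```python
def expected_errors(bases_covered, bases_per_error):
    """Expected number of erroneous base calls at a given error rate."""
    return bases_covered // bases_per_error


if __name__ == "__main__":
    human_genome_bp = 3_000_000_000  # assumed approximate genome size
    print(expected_errors(human_genome_bp, 100_000))
```

Set against 300,000 novel variants per genome, errors on this scale would amount to about one false call for every ten novel variants, which is why separating errors from true variants matters.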

Page 50: Bioinformatics, Translational Bioinformatics, Personalized Medicine

Bioinformatics Challenges
• Data
• Which technology to use
  – Each technology has different error rates: Ion Torrent (higher error rate), SOLiD, Illumina
  – Speed of generation of data – Ion Torrent is faster
• Application – whole genome or exome or targeted exome
• Analysis
  – Algorithms, speed, accuracy
  – BLAST is not good for WGS
  – Other new algorithms
• Speed of analysis
  – Alignment can take days
• Alignment relies on matches between sequence and reference genome
  – How many mismatches to tolerate
  – Is a mismatch a sequencing error or a true mismatch, i.e., a SNP?
• Quality of reference genome
• Large amounts of data
  – Each whole genome sequencing experiment can generate TB of data
• Where to store – patient privacy
  – Servers, locations, networking
• Sample sizes – how many samples to sequence to discover the association with disease

Page 51: Bioinformatics, Translational Bioinformatics, Personalized Medicine

Bioinformatics Challenges
• Technology
  – Ion Torrent, SOLiD, Illumina
  – Each has its own error rates
  – Speed of data generation
  – Dependent on application – WGS or exome
• Analysis
  – Algorithms, speed, accuracy
• Speed of analysis
  – Alignment can take days
• Alignment relies on matches between sequence and reference genome
  – How many mismatches to tolerate
  – Is a mismatch a sequencing error or a true mismatch, i.e., a SNP?
• Quality of reference genome

Page 52: Bioinformatics, Translational Bioinformatics, Personalized Medicine

From Mark Boguski’s presentation at the IOM, July 19, 2011

Page 53: Bioinformatics, Translational Bioinformatics, Personalized Medicine

From Mark Boguski’s presentation at the IOM, July 19, 2011

Page 54: Bioinformatics, Translational Bioinformatics, Personalized Medicine

From Mark Boguski’s presentation at the IOM, July 19, 2011

Page 55: Bioinformatics, Translational Bioinformatics, Personalized Medicine

Molecular Diagnostics using NGS

From Mark Boguski’s presentation at the IOM, July 19, 2011

Page 56: Bioinformatics, Translational Bioinformatics, Personalized Medicine

NGS Bioinformatics - medicine

• Infrastructure
  – Storage, backup, archive
  – Where – HIPAA compliant?
  – Network
    • How to move data
• Analysis
  – Methods – statistics, annotation
  – Computing resources
  – How many samples can be handled at a time?
  – Time to report

Page 57: Bioinformatics, Translational Bioinformatics, Personalized Medicine

NGS and bioinformatics

Page 58: Bioinformatics, Translational Bioinformatics, Personalized Medicine

Next Gen Sequencing

From Mark Boguski’s presentation at the IOM, July 19, 2011