The Search for Cancer’s Causes and Cures

Wade L. Schulz, MD, PhD

Yale University, Department of Laboratory Medicine

Upload: wadeschulz

Posted on 24-Sep-2015

DESCRIPTION

Elastic{on} 2015 talk on the use of Elasticsearch to index genetic variation in cancer.

TRANSCRIPT

  • The Search for Cancer's Causes and Cures

    Wade L. Schulz, MD, PhD

    Yale University, Department of Laboratory Medicine

  • { } CC-BY-ND 4.0

    Cancer Statistics: An Improving Outlook?

    { 1 }

    [Chart: cancer incidence and mortality, rate per 100,000; y-axis from 0 to 600]

  • { } CC-BY-ND 4.0 { 2 }

    Precision Medicine

    Tailoring medical therapy to a particular patient's characteristics

  • { } CC-BY-ND 4.0

    Presentation to Precision Care

    { 3 }

    Images adapted from Servier Medical Art, CC-BY

  • { } CC-BY-ND 4.0

    When Cells Go Bad

    { 4 }

  • { } CC-BY-ND 4.0

    Genetics in 60 Seconds

    { 5 }

  • { } CC-BY-ND 4.0

    Genetics in 60 Seconds

    { 6 }

  • { } CC-BY-ND 4.0

    Searching for Mutations

    { 7 }

    Gels and Capillaries

  • { } CC-BY-ND 4.0

    Next Generation Sequencing

    { 8 }

    Massively Parallel

  • { } CC-BY-ND 4.0

    NGS: The Technology

    { 9 }

  • { } CC-BY-ND 4.0

    Cost of Sequencing

    { 10 }

    [Chart: cost per genome compared with Moore's Law, Sep-01 to May-14; logarithmic cost axis from $1 to $100,000,000]

  • { } CC-BY-ND 4.0

    Bases to Bytes

    23 chromosomes

    21,000 genes

    3,300,000,000 base pairs

    3.3e9 bases × 2 bits = 825 MB/sequence

    With metadata: 150 GB/sequence

    3,000,000 variants/genome

    { 11 }

    How big is the genome?
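
The arithmetic on this slide can be checked with a few lines of Python. This is only a sketch: the function name and the use of decimal megabytes (1 MB = 1,000,000 bytes) are assumptions made for illustration, while the base-pair count, the 2 bits per base, and the 150 GB figure come from the slide.

```python
# Back-of-the-envelope check of the slide's storage arithmetic.
BASE_PAIRS = 3_300_000_000   # base pairs in the genome (from the slide)
BITS_PER_BASE = 2            # four nucleotides (A, C, G, T) fit in 2 bits
WITH_METADATA_GB = 150       # per-sequence size with metadata (from the slide)


def packed_size_bytes(bases: int, bits_per_base: int = BITS_PER_BASE) -> int:
    """Bytes needed to store `bases` nucleotides, rounded up to whole bytes."""
    total_bits = bases * bits_per_base
    return (total_bits + 7) // 8


size_mb = packed_size_bytes(BASE_PAIRS) / 1_000_000  # decimal megabytes
overhead = WITH_METADATA_GB * 1_000 / size_mb        # GB converted to MB first

print(f"{size_mb:.0f} MB per packed sequence")
print(f"with metadata: roughly {overhead:.0f}x larger")
```

The ratio in the last line suggests that the raw bases are a small fraction of what has to be stored once quality scores and other metadata are included.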

  • { } CC-BY-ND 4.0

    What are the Problems?

    Constantly evolving data schema

    Ability to integrate diverse data silos

    Rapidly increasing needs for data storage

    Need for easy, flexible analysis

    { 12 }

  • { } CC-BY-ND 4.0

    Why Elasticsearch?

    - Rapid on-premise and cloud installations

    - Dynamic schema that supported clinical results and annotation data

    - Availability of libraries for multiple languages (NEST, elasticsearch-py)

    - Tool availability (Kibana, Shield)

    It's great!

    { 13 }
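
To illustrate the dynamic-schema point, here is a minimal sketch of how two differently shaped documents could be prepared for the Elasticsearch bulk API. It only builds the newline-delimited JSON body and does not contact a cluster. The index name `variants`, the helper name, and the annotation document's fields are assumptions for illustration, not details from the talk; Elasticsearch releases from the time of the talk also expected a `_type` in the action line, which is omitted here.

```python
import json
from typing import Iterable


def build_bulk_body(index_name: str, docs: Iterable[dict]) -> str:
    """Build a bulk-API body: one action line, then one source line, per document."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index_name}}))
        lines.append(json.dumps(doc))
    # The bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n"


# Two documents with different fields in the same (hypothetical) index.
clinical_result = {"chromosome": "chr7", "position": 148506396, "type": "snv", "gene": "EZH2"}
annotation = {"gene": "EZH2", "database": "COSMIC", "note": "placeholder annotation"}

body = build_bulk_body("variants", [clinical_result, annotation])
print(body)
```

A body like this would typically be sent to the `_bulk` endpoint with the `application/x-ndjson` content type, either directly over HTTP or through a client library such as elasticsearch-py.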

  • { } CC-BY-ND 4.0

    Sequencing and Interpretation Pipeline

    { 14 }

    Gene Sequencing

    Sequence Alignment

    Quality Assurance

    Variant Annotation

    Clinical Interpretation

    Clinical Trial Eligibility

    Research Management

    {galileo} {kepler}

    {galileo} {galileo/kepler} {galileo/kepler}

  • { } CC-BY-ND 4.0

    What's in a Variant?

    { 15 }

    60G6V:01053:03044 16 chr1 161383 0 16M * 0 0 TTTGCCAGAAAGCAAG

    )///7;;6*669:1:5 ZP:B:f,0.00279573,0.0054005,2.19516e-07

    ZM:B:s,244,0,242,0,0,242,2,270,494,300,0,248,36,0,0,0,272,0,204,272,398,248,246,268,270,0,0,0,302,0,0,0,550,

    38,44,194,14,32,204,2,666,212,222,494,2,2,238,630,92,220,4,102,438,2,60,384,2,76,2,2,294,394,34 ZF:i:28

    RG:Z:60G6V. PG:Z:tmap MD:Z:16 NM:i:0 AS:i:16 XA:Z:map4-1 XS:i:16

    60G6V:00605:00113 0 chr1 415215 2 8M5I31M3S * 0 0

    CCAGCCTGGGTGCGTGACAGAGCAAGACTCCGTCTAAAAAGAAAGGT

    B
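
The records on this slide appear to be SAM-format alignment lines. The sketch below shows one way to split such a line into named fields, assuming the standard tab-delimited SAM layout (11 mandatory columns followed by optional `TAG:TYPE:VALUE` entries). The `SamRecord` type and `parse_sam_line` function are hypothetical names, and the long `ZP`/`ZM` tags from the slide are left out of the example line for brevity.

```python
from typing import Dict, NamedTuple, Union

REVERSE_STRAND = 0x10  # SAM flag bit: read aligned to the reverse strand


class SamRecord(NamedTuple):
    qname: str   # read name
    flag: int    # bitwise flags
    rname: str   # reference (chromosome) name
    pos: int     # 1-based leftmost mapping position
    mapq: int    # mapping quality
    cigar: str   # alignment description, e.g. 16M = 16 aligned bases
    seq: str     # read bases
    qual: str    # per-base quality string
    tags: Dict[str, Union[int, str]]


def parse_sam_line(line: str) -> SamRecord:
    """Parse one tab-delimited SAM alignment line into a SamRecord."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        raise ValueError("a SAM alignment line needs at least 11 fields")
    tags: Dict[str, Union[int, str]] = {}
    for raw in fields[11:]:
        name, type_code, value = raw.split(":", 2)
        # Only integer tags are converted; other types stay as strings.
        tags[name] = int(value) if type_code == "i" else value
    return SamRecord(
        qname=fields[0], flag=int(fields[1]), rname=fields[2], pos=int(fields[3]),
        mapq=int(fields[4]), cigar=fields[5], seq=fields[9], qual=fields[10], tags=tags,
    )


# First record from the slide, rebuilt with tabs between columns.
example = "\t".join([
    "60G6V:01053:03044", "16", "chr1", "161383", "0", "16M", "*", "0", "0",
    "TTTGCCAGAAAGCAAG", ")///7;;6*669:1:5", "MD:Z:16", "NM:i:0", "AS:i:16",
])
record = parse_sam_line(example)
print(record.rname, record.pos, record.cigar)
print("reverse strand:", bool(record.flag & REVERSE_STRAND))
print("mismatches (NM):", record.tags["NM"])
```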

  • { } CC-BY-ND 4.0

    What's in a Variant?

    { 16 }

    {
      "chromosome": "chr7",
      "position": 148506396,
      "type": "snv",
      "refAllele": "A",
      "altAllele": "C",
      "totalReads": 1998,
      "forwardReads": 1038,
      "forwardRefReads": 524,
      "forwardAltReads": 514,
      "reverseReads": 960,
      "reverseRefReads": 500,
      "reverseAltReads": 460,
      "refReads": 1024,
      "altReads": 974,
      "vaf": 48.749,
      "variantRegion": "intronic",
      "variantEffect": "",
      "snvEffect": "A>C",
      "gene": "EZH2"
    }

    - Variant location in genome

    - Nucleotide change

    - Sequencing statistics

    - Variant prevalence in specimen

    - Variant coding/protein effects
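
The sequencing statistics in the example document are internally consistent, which a short sketch can verify. The field names and counts are taken from the slide; the helper functions, and the reading of `vaf` as alternate reads divided by total reads expressed as a percentage, are assumptions for illustration.

```python
from typing import List

# Read counts copied from the example variant document on the slide.
variant = {
    "totalReads": 1998,
    "forwardReads": 1038, "forwardRefReads": 524, "forwardAltReads": 514,
    "reverseReads": 960, "reverseRefReads": 500, "reverseAltReads": 460,
    "refReads": 1024, "altReads": 974,
    "vaf": 48.749,
}


def read_count_problems(v: dict) -> List[str]:
    """Return a description of every read-count sum that does not add up."""
    checks = {
        "forwardReads": v["forwardRefReads"] + v["forwardAltReads"],
        "reverseReads": v["reverseRefReads"] + v["reverseAltReads"],
        "refReads": v["forwardRefReads"] + v["reverseRefReads"],
        "altReads": v["forwardAltReads"] + v["reverseAltReads"],
        "totalReads": v["forwardReads"] + v["reverseReads"],
    }
    return [f"{field}: expected {expected}, found {v[field]}"
            for field, expected in checks.items() if v[field] != expected]


def vaf_percent(v: dict) -> float:
    """Variant allele frequency as a percentage of total reads (assumed definition)."""
    return round(100 * v["altReads"] / v["totalReads"], 3)


print("problems:", read_count_problems(variant))
print("computed VAF:", vaf_percent(variant), "stored VAF:", variant["vaf"])
```

A check of this kind could run before a document is indexed, so that malformed records are caught early.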

  • { } CC-BY-ND 4.0

    {Elastic} Searching for Meaning

    { 17 }

    Azure Elasticsearch

    Local SQL and Elasticsearch

    Public Databases: OMIM, COSMIC, dbSNP, ClinVar

    Sequencers, Variant Analysis, Effect Prediction

    Public Variant Data

    Private Variant Data

  • { } CC-BY-ND 4.0

    {Elastic} Searching for Meaning

    { 18 }

    Public Databases: OMIM, COSMIC, dbSNP, ClinVar

    Sequencers, Variant Analysis, Effect Prediction

    Public Variant Data

    Private Variant Data

    MVC Application (NEST)

  • { } CC-BY-ND 4.0

    Kibana Drilldown

    { 19 }

    Rapid population stats

    Physicians/researchers can quickly analyze data

    Integration with health record

    Demographics

    Laboratory testing

    Comorbidities

    Treatment information
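
Population statistics of the kind Kibana displays are typically backed by Elasticsearch aggregations. As a rough sketch, the request body below counts variants per gene with a `terms` aggregation. The function name and the choice of the `gene` and `type` fields (borrowed from the example variant document) are assumptions, and the query presumes those fields are indexed as exact, non-analyzed values.

```python
import json


def variants_per_gene_query(variant_type: str, top_n: int = 10) -> dict:
    """Build a search body that counts matching variants per gene."""
    return {
        "size": 0,  # return only aggregation buckets, no individual hits
        "query": {"term": {"type": variant_type}},
        "aggs": {
            "by_gene": {"terms": {"field": "gene", "size": top_n}},
        },
    }


query = variants_per_gene_query("snv", top_n=5)
print(json.dumps(query, indent=2))
```

The resulting dictionary could be passed as the body of a search request through elasticsearch-py or expressed with NEST in a .NET application.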

  • { } CC-BY-ND 4.0

    Kibana Drilldown

    { 20 }

  • { } CC-BY-ND 4.0

    Service Integration

    { 21 }

    Predictive Algorithms

    Quality Assurance

    [Chart: axis from -3 to 3]

    Variant Database

    Clinical Interpretation System

    Web Service Interfaces

    Custom Validation Scripts

    Third-Party Data Analysis Software

  • { } CC-BY-ND 4.0

    Data Sharing

    { 22 }

    Variant Database

    Clinical Interpretation System

    Web Service Interfaces

  • { } CC-BY-ND 4.0

    Conclusions

    { 24 }

    Clinical implications

    - Genetic sequencing and clinical consultation complete within one week of biopsy

    - Integrated multiple analysis pipelines for clinical interpretation and research applications

    - Frequently identify patients eligible for clinical trials

    System statistics

    - Two Elasticsearch clusters

    - Over 60 million variant annotations

    - Nearly 10 million documents related to cancer-associated mutations

    - Kibana and custom web applications using NEST for data visualization

  • { }

    Thank you!

    Wade L. Schulz, MD, PhD

    [email protected]

    http://www.wadeschulz.com

    Many images adapted from Servier Medical Art, CC-BY

    Henry Rinder MD, Richard Torres MD, Christopher Tormey MD, Brian Smith MD, John Howe PhD,

    Karl Hager PhD, Rodion Rathbone MD, Nathaniel Price, Alexa Siddon MD

  • { } CC-BY-ND 4.0

    This work is licensed under the Creative Commons Attribution-NoDerivatives 4.0 International License.

    To view a copy of this license, visit http://creativecommons.org/licenses/by-nd/4.0/ or send a letter to:

    Creative Commons, PO Box 1866, Mountain View, CA 94042, USA

    { 25 }