Towards a processual perspective on architecture
Building an Information Infrastructure for Personalized Medicine
(Paper presented at NOKOBIT 2014)
Margunn Aanestad1, Johan Sæbø1, Thomas B. Grünfeld2
1) Institutt for Informatikk, UiO
2) Oslo Universitetssykehus
• «National DNA sequencing data platform for healthcare»
• Cross-disciplinary research project supported by VERDIKT in the Research Council of Norway
• Collaboration between OUS and UiO
– OUS: Dept. of Medical Genetics, Stab IKT
– UiO: Norwegian Sequencing Center, IFI, USIT, Faculty of Law
genAP
• Project title: National DNA sequencing data platform for healthcare
• High Throughput Sequencing (HTS)
– (or Whole Genome Sequencing or Next Generation Sequencing)
– Replaces older sequencing technologies that give a partial view (e.g. targeted to one or more genes)
– Expectations/incentives:
• Store a person’s sequence and make it available for later query
• Share competency and infrastructure nationally
• Increased clinical usage (Personalized medicine)
Rapid evolution of sequencing technologies
The Human Genome Project:
- Full sequencing of one human genome
- Sanger sequencing
- 13 years, $3 billion
HTS:
- A few days
- Jan 2014: $1000
Research
Patient-oriented
Germline mutations
Somatic mutations
New mutations Cancer research
Known mutations
Cancer therapy (targeted)
DNA sequence
Variants
Functional variants
Clinically relevant variants
Pilot case #1
Pilot case #2
Pilot case #3
Deployment of NGS in HC: examples and challenges
• ”Rare diseases” diagnostics
– Examples: improved / more cost-effective diagnostics; postnatal screening?; sub-grouping, assessing genetic factors
– Challenges: limited experience with normal variation and variable populations; variable quality of phenotype-genotype relation; variable expressivity/penetrance
• Pharmacogenetics / treatment guidance
– Examples: pharmacokinetics and pharmacodynamics; pre-surgery: connective tissue disorders and coagulopathies
– Challenges: current NGS tech. not optimal for CYP typing; limited experience with multifactorial pharmacokinetics/dynamics; for CT/coag. disorders see ”rare diseases”
• Cancer (diagnostics and treatment guidance)
– Examples: sub-grouping / ”improved histology”; CUPs; identifying optimal drugs (or excluding non-effective ones)
– Challenges: tumor cell line is unstable (what are we looking at?); very complex molecular biology
• Microbiology
– Examples: specific viral and bacterial diagnostics; possibly later microbiome mapping?
– Challenges: still expensive compared to current technologies; microbiomology still ”unknown territory”
• Predictive / risk assessment
– Examples: mapping of oligo/polygenic conditions/risks (”common disorders”); improved prognostics / preventative care
– Challenges: very complex relations, a long way to go before we ”beat traditional methods”; unknown value / trade-off of interventions
Possible development of full ”whole genome sequencing”
From Henry T. Greely, director, Center for Law and the Biosciences, Stanford University
• ”Sport” for ”the rich and famous”
• Offer for those who ”really need it”
• Easier to analyze all than 3-5 genes
• Deployment in screening (e.g. newborns)
• WGS as an integrated part of the EPR
•The first scientists etc. •Mini-solution ”23andme”
•Patients with challenging diagnostics •New causative genes
•Allows for re-use of sequencing (DNA static over life) •Later in-silico analysis
•Simplifies existing methods •Allows for re-use
•”Everybody” routinely analyzed •Integrated expert systems in EPR
Vision for «automated system»:
Genome data
What are CYP3A5, CYP3A4 and POR?
CYP3A5 *1/*3 CYP3A4 *18/*1b POR *28/*28
A geneticist or medical specialist can find any gene simply by looking it up in a database
Help to design treatment
Genome data
How do I dose Marevan for this particular patient?
One moment. Bzzbzz.. CYP2C9..bzzbzz.. VKORC1
Bzzbzz.. CYP2C9*1..bzzbzz.. VKORC1*3
A simple example
General practitioner
Assisted diagnostics
Genome data
Is this the hereditary form of cardiomyopathy?
One moment. Bzzbzz.. MYH7..bzzbzz… TNNT2..TPM1…
Hey! I have never seen that variant before!
Help me, dear geneticist!
Hm… a stop codon in the middle of myomet, that cannot possibly go well
Ok.
A more complicated example
General practitioner
Geneticist
Status for genAP
• Secure High-Performance Computing infrastructure at USIT (TSD)
– Secure computational storage
– USIT (UiO), high-performance computing group (Tungregne-gruppa). TSD2: Tjeneste Sensitive Data. Research infrastructure for projects subject to the Health Research Act, the Biotechnology Act, the Health Register Act or the Personal Data Act
• Presentation of genetic information to clinicians:
– Prototype evaluated (Lærum et al., JAMIA 2014), (domain: pharmacogenetics)
• Improve the interpretation capacity in the lab
– Partial automation / better IT support in the interpretation process
Mapping of the interpretation process:
(Flowchart. Node labels, notes and answers as they appear on the slide; the arrows between nodes are not recoverable from the transcript.)
• Tools and records in the current workflow: SeqScape, MS Access, Alamut, browser, worksheet (print + manual writing), mutation table (now: print; soon: text to Excel), paper archive
• Short stage labels: Pat DB; Extract var; Pat pred; Freq inher
• Sample loading and QC
– Sanger sequences; project: 4 samples; template: 30 bp intron
– Evaluate Sanger: bad amplicons; common variants (het/hom); new variants (het/hom); resequence
– Technical artefacts: QV<30 and/or electropherogram
– Nomenclature check
– Note: problem areas are often visible as present in all 4 samples, but not registered
• Matching against in-house DB
– Lookup in Access DB (variant DB: classification + ref; specific to Sanger)
– Preclassified <1 year ago? Answer with same classification
– On shortlist: common? Answer: class 1, common
• Frequency and inheritance
– Evaluate frequency: >~10% (lower for genot). Common? Answer/add as class 1
– Evaluate frequency: >1% (lower for genot); dbSNP
– Inheritance mode? Dominant / recessive. Homozygous (hemizygous?) / heterozygous
– Note: + class 1, dominant + homozygous
– Note: + class 2, recessive + heterozygous
– Note: possible carrier / possible comp heteroz
– Frequencies: with sufficient population size; not a patient population
– Latest dbSNP build is not in Alamut; check the web if the variant is not found
– Note: in addition to inheritance, the frequency cut-off should be based on prevalence data and/or frequency of known pathogenic variants (adjusted for each gene/diagnosis)
– Note: X-linked inheritance is a third possibility for other cases (esp. for general genetics)
• External databases
– BRCA1/BRCA2? Always check BIC
– LOVD, Google, other. Also possible: HGMD Pro, IARC, other LSDBs
– Evaluate trusted mutation DB: reported? classified? (neutral / pathogenic / VUS)
– Evaluate non-trusted mutation DB/search
– Answer: class 1, documented. Answer: class 5, documented
• References
– Find references (PubMed); eval ref
– Evidence from reference: segregation (consistent w/ conclusion?); protein (abnormal protein function?); RNA (abnormal splicing/expression?); sequencing (>90% of gene sequenced?); ref age (reference <10 years?)
– Score values shown: +5/-5, +2/-1, +2/-1, 0/-1, 0/-1
– SUM >0 (good reference)? >1 independent good refs? Classification? (neutral / pathogenic / VUS/conflicting)
– Note: + class 2, 1 good reference
– Note: + class 4, 1 good reference
– Note: + class 3, bad reference
– Note: + class 3, non-conclusive reference(s)
– Answer: class 1, documented. Answer: class 5, documented
• Pathogenicity prediction
– Mutation type? Frameshift / nonsense / missense / any (splice site)
– >50 bp 5' of intron? In last protein-encoding exon? Last exon important? (NMD: Maquat 2004, PMID 15040442; BRCA2...)
– Note: + class 3, 3'-end nonsense/frameshift
– +/- 2 bp from exon? Answer: class 4, crucial site
– Check splice prediction (SSFL, MaxEntScan, NNSPLICE, HSF): 3/4 predict change?
– Note: + class 3, predicted splice site change
– Note: + class 2, no predicted splice site change
– Check pathogenicity prediction (PolyPhen-2, SIFT, SNPs3D): same for all prog? Predicted pathogenic?
– Note: + class 2, missense predicted non-pathogenic by all
– Note: + class 3, missense predicted pathogenic at least once
– Note: choice of programs subject to discussion
• Final evaluation and report
– Evaluate other sequence variants in the same gene: other class 4/5 found? Note: possible comp heteroz
– Class 3/4/5? YES: evaluate downstream analyses. NO: answer: class 2
– Report
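The prediction step of the flowchart attaches a note to a variant depending on how many tools agree ("3/4 predict change?", "missense predicted pathogenic at least once"). A minimal Python sketch of those two rules follows. The function names are hypothetical, and reading "3/4" as "at least three of the tools" is an assumption; the slide does not show how the lab implements this.

```python
def splice_note(tool_predicts_change):
    """Return the flowchart note for a splice-site variant.

    tool_predicts_change: one boolean per splice predictor (the slide
    lists four tools). Treating "3/4 predict change?" as "at least
    three tools predict a change" is an assumption.
    """
    if sum(tool_predicts_change) >= 3:
        return "+ class 3: predicted splice site change"
    return "+ class 2: no predicted splice site change"


def missense_note(tool_predicts_pathogenic):
    """Return the flowchart note for a missense variant.

    tool_predicts_pathogenic: one boolean per pathogenicity predictor.
    """
    if any(tool_predicts_pathogenic):
        return "+ class 3: missense predicted pathogenic at least once"
    return "+ class 2: missense predicted non-pathogenic by all"
```

For example, `splice_note([True, True, True, False])` yields the class 3 note, while two agreeing tools out of four yield the class 2 note.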
Redesign of IT systems in use
References (mockup screen): variants with reference hits
Columns: Gene, Variant (HGVS), Source, Reference(s), Previous evaluations?, High quality evidence?
– BRCA1 c.38T>C; source: HGMD Pro, LOVD; reference: Van Hausen et al 2012 (PMID: 23709336); previous evaluations: -
– BRCA1 c.38T>C; source: HGMD Pro; reference: Elsing et al 2011 (PMID: 22904364); previous evaluations: -
– BRCA1 c.376A>G; source: HGMD Pro; reference: Vindaloo et al 2013 (PMID: 23934762); previous evaluations: for BRCA1 c.1275C>T: mceike, 12.08.2013 (View)
– BRCA1 c.483T>G; source: manual (mceike); reference: Mansanaar et al 2011 (PMID: 22334712); previous evaluations: -
– BRCA1 c.1435C>G; source: HGMD Pro; reference: Vindaloo et al 2013 (PMID: 23934762); evaluated: mceike, 12.08.2013. Yes, pathogenic
Row actions: Evaluate; Edit
Options: Hide completed; Perform additional query
Add reference: Choose gene…; Choose variant…; PMID OR first author, year, title, journal; Add
Tabs: References; References - evaluate
Reference evaluation (mockup screen)
• Tabs: External DB, Frequency, VarDB, Prediction; list: variants with reference hits
• Selected reference: Van Hausen et al 2012 (PMID: 23709336); variant: BRCA1 c.38T>C
• Relevance: is reference relevant?
• Evaluation: each category is answered YES or NO and given a score (example values shown: +5, +5, …, …, 0, 0)
– Segregation: consistent with conclusion?
– Protein: abnormal protein function?
– RNA: abnormal splicing / protein expression?
– Gene coverage: >90% of gene sequenced?
– Age of evidence (auto): reference <10 years?
• SUM; conclusion: high quality evidence? (YES / NO, with a SUGGESTED value)
• Conclusion: does reference support pathogenicity? (PATHOGENIC / NEUTRAL / IRRELEV/VUS)
• High quality evidence count for the variant (BRCA1 c.38T>C)
• Additional comments; Finish
Design notes on the mockup:
• More complex topics get pop-up help with link to full documentation
• Choose gene… / Choose variant…; Scholar; PubMed (opens in browser)
• When unchecked: evaluation options dimmed
• If selecting other than suggested: prompt for confirmation and to fill in comment
• Automatically fetched from reference details
• If NO/VUS: all remaining evaluation points greyed out, no score given. Suggest “IRRELEVANT”
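The reference evaluation can be read as a small scoring rule: each YES/NO answer contributes a score, and a reference counts as good (high quality evidence) when the sum is above zero. The Python sketch below illustrates that reading. The criterion keys and function names are hypothetical, and the pairing of the score values (+5/-5, +2/-1, +2/-1, 0/-1, 0/-1) with the five criteria is inferred from the order in which they appear on the slides, so treat it as an assumption.

```python
from typing import Dict, Optional

# (score if YES, score if NO) per criterion.
# ASSUMPTION: the pairing of scores with criteria follows slide order.
CRITERIA = {
    "segregation_consistent_with_conclusion": (5, -5),
    "abnormal_protein_function": (2, -1),
    "abnormal_splicing_or_expression": (2, -1),
    "over_90_percent_of_gene_sequenced": (0, -1),
    "reference_under_10_years": (0, -1),
}


def score_reference(answers: Dict[str, bool], relevant: bool = True) -> Optional[int]:
    """Sum the per-criterion scores; no score is given for an irrelevant reference."""
    if not relevant:
        return None
    return sum(yes if answers[name] else no for name, (yes, no) in CRITERIA.items())


def is_good_reference(answers: Dict[str, bool], relevant: bool = True) -> bool:
    """A reference is 'good' when the sum of scores is above zero."""
    total = score_reference(answers, relevant)
    return total is not None and total > 0
```

Under these assumed weights, a reference whose only positive answer is segregation still sums to 1 and counts as good, which shows how strongly the segregation criterion dominates the score.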
Sample 000001A loaded | Gene panel: Breast and ovarian cancer v1.0 | Test: Full
Sample
Pop-up instead of right pane
VUS
genAP analysis workbench (Mockup v0.1): Frequency
• Menu: VarDB, Sample, Options | Sign out | Help. Tabs: External DB, Frequency, Prediction, References, Report. Navigation: Previous, Next
• Status bar: Sample 000001A loaded | Gene panel: Breast and ovarian cancer v1.0
• Columns: Gene, Variant (HGVS), Norvariome, ESP6500, 1000G, Contradictory information, Filter out (Select all)
• Neutral variants (>0.008). Only variants with frequency >0.008 are shown.
– BRCA1 c.434A>C; Norvariome: 0.0234; ESP6500: -; 1000G: -; contradictory information: -
– BRCA1 c.2311T>C; Norvariome: -; ESP6500: 0.0446; 1000G: -; contradictory information: -
• Probably neutral variants (0.001-0.008)
– BRCA1 c.873T>C; Norvariome: -; ESP6500: 0.0079; 1000G: -; contradictory information: -
– BRCA1 c.1067A>G; Norvariome: -; ESP6500: 0.0053; 1000G: -; contradictory information: Warning: entry found in HGMD Pro
• Details for variant BRCA1 c.1067A>G: HGVS cDNA c.1067A>G; observed genotype CA; HGMD Pro: disease-associated polymorphism; effect: missense; SIFT: TOLERATED; MutationTaster: polymorphism. Links: View raw, View all, View external record
• Filter: removes checked variants from further analysis, adds to report.
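The frequency screen groups variants by population frequency: above 0.008 as neutral, and 0.001-0.008 as probably neutral. A minimal Python sketch of that grouping follows. The function name, the returned labels for the remaining cases, and the choice to use the highest frequency across the three sources are assumptions made for illustration; the mockup does not say how multiple sources are combined.

```python
from typing import Dict, Optional

NEUTRAL_ABOVE = 0.008            # "Neutral variants (>0.008)"
PROBABLY_NEUTRAL_FROM = 0.001    # "Probably neutral variants (0.001-0.008)"


def frequency_group(frequencies: Dict[str, Optional[float]]) -> str:
    """Group a variant by its population frequency.

    frequencies maps a source name (e.g. "Norvariome", "ESP6500", "1000G")
    to a frequency, or None when the source has no entry ("-" in the mockup).
    ASSUMPTION: the highest observed frequency decides the group.
    """
    observed = [f for f in frequencies.values() if f is not None]
    if not observed:
        return "no frequency data"
    highest = max(observed)
    if highest > NEUTRAL_ABOVE:
        return "neutral"
    if highest >= PROBABLY_NEUTRAL_FROM:
        return "probably neutral"
    return "below cut-offs"
```

With the mockup's example rows, BRCA1 c.434A>C (Norvariome 0.0234) lands in the neutral group and BRCA1 c.1067A>G (ESP6500 0.0053) in the probably neutral group.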
Analysis workbench: components
• Sample loading
• SeqScape
• In-house database
• External databases (BIC, ClinVar, dbSNP, HGMD Pro)
• Pathogenicity prediction tools
• Extracting, adding and scoring of references
• Report with all details, suggested classification and export tools
Details for selected variant.
NOKOBIT paper theme
• The project: a case of «emerging architecture»
• Processual perspective on architectural development
– Process shapes the architecture
– Time perspective (short/long horizon)
– Implications for organizing
Scaling down of ambitions
• From an exclusive focus on HTS
– To building solutions for «old» sequencing technologies
• From a national scope
– to a regional scope
– to a hospital scope
– to a departmental scope
• From a generic expert system
– To a system for specific ‘gene panels’ (for simple and well-mapped areas, i.e. monogenic & dominant)
• From a ‘common infrastructure’ (or platform)
– To a support system for interpretation (improved production rate and quality)
«Architecting»
• Architecture as structure -> the process of defining and realising the architecture
• Two aspects:
– Balancing immediate needs with long-term wishes
– Architecture and learning -> the drawbacks of «premature» architecture
• “Architecture is a hypothesis about the future” (Foote and Yoder 2000)
Method
• Participant observation (working meetings, clinical interpretation work), interviews, document analysis…
• Architecture sketches throughout the project
– Represent an evolving understanding of challenges, components, scope and scale
– «Scaling off» (deferring some parts)
– «Deepening» (concretization, realization)
”Strawman” architecture (Sept 2011)
• Exome database with variant files
• Exome database with functional annotation
• Clinical algorithm / expert system
• Graded access depending on clinical question/use and permissions
• Users
– PATIENTS
– Nursing services?
– General practitioner
– Specialists (paediatrician, neurologist, etc.)
– Specialised IT systems
– Expert / medical geneticist
• Permission for use from the patient
• LOG of queries, users and permissions
• CONSENT
• LOG
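The strawman architecture combines three ideas: graded access that depends on the clinical question and the user's permissions, use only with the patient's consent, and a log of every query. The Python sketch below shows one way those three pieces could fit together. Every name in it (`AccessGate`, `role_purposes`, the role and purpose strings) is hypothetical; the slide describes boxes in a diagram, not an implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class AccessGate:
    """Hypothetical gate in front of the patient genome data."""
    # patient id -> clinical purposes the patient has consented to
    consents: Dict[str, Set[str]]
    # user role -> clinical purposes that role may query for (graded access)
    role_purposes: Dict[str, Set[str]]
    # log of (user, role, patient, purpose, allowed) for every request
    log: List[Tuple[str, str, str, str, bool]] = field(default_factory=list)

    def request(self, user: str, role: str, patient: str, purpose: str) -> bool:
        """Allow a query only if both the role and the patient's consent cover it."""
        allowed = (
            purpose in self.role_purposes.get(role, set())
            and purpose in self.consents.get(patient, set())
        )
        # Denied requests are logged as well, so the log covers all queries.
        self.log.append((user, role, patient, purpose, allowed))
        return allowed
```

A general practitioner asking a pharmacogenetic question about a consenting patient would pass the gate, while a diagnostic query without matching consent would be refused and still leave a log entry.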
Imagined system
• Curated databases: clinical relevance
• UPDATING
• Public genome data
• ANNOTATION
• PATIENT GENOME DATA: functional relevance, raw data
• PATIENT
– Genetic counselling
– Tailored medication/treatment
– Diagnostic support
– ...
• Diagnosis: unknown, uncertain. Medication: effect, side effects. Risk: familial, related. ...
• Complicated/sensitive
• NON-GENETICISTS: nursing services, general practitioner, specialist, ...
• EXPERT: medical geneticist
• LOG
• Diagnostics, pharmacogenetics, prognostics
• EXPERT SYSTEM
• Flexible ANALYSIS SYSTEM
• GRADED ACCESS
Architecture for platform
Updated overview of system (from Feb 2014)
The emerging architecture:
• Bottom-up (from the concrete setting) and shaped by what is possible to do:
– Which clinicians are interested in joining?
– In what clinical domains are there low-hanging fruits, with low risk and high gain?
• The HTS performance for that domain
• The knowledge about gene-disease relations
• The volume and ‘status’ of patients affected
– What do others (externals) do? And not do?
– What do project members want to do… etc.
Architecture workshop 28.3.2014
Examples in paper:
• Curated database of variants
– From a patient-centered to a variant-centered design
• Choice of implementation domains
– Pharmacogenetics -> other domains
• «External» dependencies:
– Development of HTS
– Change of clinical systems at OUS
• From specific to generic
– From code to config-files
– Pilot -> routine (NHN/USIT?)
Emerging architecture
Backward compatibility (installed base)
Future compatibility (strategic vision)
Context compatibility (opportunistic actions)
Shearing Layers (diff. rate of change)
• Stewart Brand (1994): «How Buildings Learn»
– Site (the geographical location)
– Structure (the load bearing elements)
– Skin (the exterior surface)
– Services (the circulatory and nervous systems of a building, such as its heating plant, wiring, and plumbing)
– Space Plan (walls, flooring, and ceilings)
– Stuff (includes lamps, chairs, appliances, and paintings)
Motivation
• We are interested in these processes because we believe that design driven by local and immediate concerns has implications for scalability, and that design driven by the particularity of (local) needs has implications for generalisability. We think it is significant to examine whether (and how) the choices of diversifying and/or generifying the solution remain open over time.
Summing up the paper’s argument
• Processual perspective on evolution of architecture
– The process shapes the architecture
– Temporal perspective (short/long horizon)
– Architecture -> Organizing
Commercial solution: Cartagenia
Possible interim solution:
• High-volume tests (e.g. BRCA etc.)
– Panels with relatively few genes
– ”100% sensitivity” is important: all variants must be assessed
– Speed and traceability are important: locked procedures
• Low-volume tests (tailored panels for rare / unclear conditions)
– Often large / specially designed panels
– Possibly trio sequencing on the whole exome
– 100% sensitivity is impossible anyway; HTS is better than Sanger / targeted sequencing in any case
– Flexibility is more important than ”strict reproducibility”
• In-house developed FFI solution (possibly ORC)
• Commercial flexible solution
• Where do we draw the line for which application to use?
• Broad proprietary solution
• Connect to ORC module
Possible migration path?
• Develop FFI for BRCA++
• With increasing volume/experience per indication, ”lock panels”
• Use the commercial solution for trio and low-volume tests; gain experience with new panels
• Phase new indications over to our own FFI solution
• Continue with new indications on a ”development basis”
Important risk factors and trade-offs
If we lean heavily on an external/commercial solution, it becomes very demanding to switch vendor if they ”tighten” their pricing model.
”Microsoft effect”: if very many actors choose one external solution, we end up in a de facto ”lock-in” at system level because of exchange interfaces, compatibility etc.
Other open source actors (e.g. ClinVar) may develop good interpretation solutions that many choose, and thereby make our effort (but not the learning) redundant.
Introducing and using an external/commercial solution limits our own learning and increases dependency.
Are we comfortable with the clinical responsibility for a non-CE-marked solution for diagnostic use?
We hardly have the capacity in the short term to handle aCGH etc. as well.
BUT: we should as quickly as possible have a solution for the interpreters that eases their work situation given increased volumes. It must also be Sanger compatible.
Important legal input: it may become decisive to develop a database that is ”useless” for purposes other than health. This is hardly possible with an external/commercial solution.
Relevant trends in the practice domain:
• Gartner:
– Rapid Architected Application Development
– Emergent Architecture (middle-out EA, light-weight EA)
• «architect the lines, not the boxes»
• Software Engineering:
– Evolutionary software architecture
– Emergent software architecture
Links
– Rapid Architected Application Development
• http://www.gartner.com/it-glossary/raad-rapid-architected-application-development
– Emergent Architecture
• http://www.gartner.com/newsroom/id/1124112
– Evolutionary software architecture
• http://link.springer.com/article/10.1007/s10270-012-0301-9#page-1
– Emergent software architecture
• http://link.springer.com/article/10.1007/s10270-012-0301-9#page-1