INTERRUPTB data management & challenges · 2017-03-10
INTERRUPTB: Data management & challenges
Bouke de Jong, MD PhD
Institute of Tropical Medicine, Antwerp
September 18, 2014
Outline
• Project overview INTERRUPTB
– Start date 01/01/2013
• Data challenges
• Approach
• Areas requiring more work
Effectiveness of “Enhanced-Case-Finding” for TB
• Greater Banjul Area (600,000 people)
• Cluster-randomized trial
– Intervention arm: Enhanced-Case-Finding
– Control arm: Passive-Case-Finding
– Includes measurement of GPS coordinates and a cost-effectiveness study
• Outcome measures:
1. Case-detection rate (Global Fund study)
2. Reduction in transmission (ERC)
Does Enhanced-Case-Finding interrupt TB transmission?
Main objectives
1. Compare the proportion of TB due to recent transmission in the intervention and control arms of a cluster-randomized trial on the impact of Enhanced Case Finding
– Determine genotypic clustering
2. Model the change in the effective reproductive number (RE) of M. tuberculosis resulting from the Enhanced Case Finding intervention
– Develop a mathematical model to estimate RE
Secondary objective
3. Study the impact of the immune response on the microevolution of M. tuberculosis
– Compare the number of single nucleotide polymorphisms (SNPs) between sequential isolates from HIV-infected and HIV-uninfected patients
Rationale: How to measure transmission?
Patient X and Patient Y: transmission?
1. Isolate bacteria
2. Sequence DNA
3. Compare genotypes
– Identical DNA (=) → recently transmitted
– Different DNA (≠) → not due to recent transmission
Determine genotypic clustering as a proxy for recent transmission
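The "compare genotypes" step can be sketched in code. This is a minimal sketch, assuming the genomes are already aligned to a common reference and stored as equal-length strings; `snp_distance` and `genotypic_clusters` are hypothetical helper names, not part of the INTERRUPTB pipeline. Two isolates are linked when their SNP distance is at most `threshold`, and linked isolates are merged into clusters with union-find (single linkage). The default `threshold=0` mirrors the "identical DNA" rule above; WGS studies sometimes tolerate a few SNPs, but no specific cut-off is implied here.

```python
from itertools import combinations


def snp_distance(seq_a: str, seq_b: str) -> int:
    """Count positions where two aligned sequences differ, ignoring ambiguous 'N' calls."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for a, b in zip(seq_a, seq_b) if a != b and "N" not in (a, b))


def genotypic_clusters(isolates: dict[str, str], threshold: int = 0) -> list[set[str]]:
    """Group isolates whose pairwise SNP distance is <= threshold (single linkage).

    Only clusters of two or more isolates are returned; singletons are 'unique' isolates.
    """
    parent = {name: name for name in isolates}

    def find(x: str) -> str:
        # Follow parent links to the cluster root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(isolates, 2):
        if snp_distance(isolates[a], isolates[b]) <= threshold:
            parent[find(a)] = find(b)  # merge the two clusters

    groups: dict[str, set[str]] = {}
    for name in isolates:
        groups.setdefault(find(name), set()).add(name)
    return [group for group in groups.values() if len(group) >= 2]
```

With toy sequences `{"X": "ACGTACGT", "Y": "ACGTACGT", "Z": "ACGAACGT"}`, the default call returns `[{"X", "Y"}]`; with `threshold=1`, Z joins the same cluster.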
Objective 1: Determine genotypic clustering
• Genotypically clustered isolates → recently transmitted
• In the ECF group we expect less transmission → less clustering
• Calculate the clustering rate and the recent transmission index (RTI n−1)
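Both indices follow from cluster sizes. With n genotyped isolates, n_c of them in c genotypic clusters, the clustering proportion is n_c / n and RTI n−1 is (n_c − c) / n: the "n−1" method assumes one isolate per cluster is the source case and the rest are due to recent transmission. A minimal sketch; the function name is hypothetical.

```python
def clustering_metrics(cluster_sizes: list[int], n_total: int) -> tuple[float, float]:
    """Return (clustering proportion, RTI n-1) for one group of genotyped isolates.

    cluster_sizes: number of isolates in each genotypic cluster (each >= 2)
    n_total: all genotyped isolates in the group, clustered plus unique
    """
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    if any(size < 2 for size in cluster_sizes):
        raise ValueError("a genotypic cluster contains at least two isolates")
    n_clustered = sum(cluster_sizes)
    if n_clustered > n_total:
        raise ValueError("more clustered isolates than genotyped isolates")
    clustering_proportion = n_clustered / n_total
    # n-1 method: one isolate per cluster is treated as the source case,
    # the remaining isolates as due to recent transmission.
    rti_n_minus_1 = (n_clustered - len(cluster_sizes)) / n_total
    return clustering_proportion, rti_n_minus_1
```

For example, two clusters of sizes 2 and 3 among 10 isolates give `clustering_metrics([2, 3], 10)` → `(0.5, 0.3)`.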
[Figure: clustering proportion (y-axis, 0 to 80) over time, 2012–2015 (x-axis); control arm vs intervention (ECF) arm]
Difference in clustering proportion → reduction in transmission
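Because randomization is by geographic cluster, not by patient, a simple way to compare arms is a cluster-level summary: compute the clustering proportion within each randomization unit, then average per arm. The sketch below assumes one `(arm, unit, is_clustered)` record per genotyped isolate; the arm labels `"control"`/`"ECF"` and the function name are hypothetical. It gives only the point estimate of the difference; a formal analysis would use methods designed for cluster-randomized trials.

```python
from collections import defaultdict
from statistics import mean


def arm_clustering_difference(records, control="control", intervention="ECF"):
    """Cluster-level comparison of genotypic clustering between trial arms.

    records: iterable of (arm, unit, is_clustered), one per genotyped isolate, where
    'unit' is the randomization unit the patient belongs to.
    Returns (mean proportion in control, mean proportion in intervention, difference).
    """
    counts = defaultdict(lambda: [0, 0])  # (arm, unit) -> [clustered, total]
    for arm, unit, is_clustered in records:
        tally = counts[(arm, unit)]
        tally[0] += int(is_clustered)
        tally[1] += 1

    per_arm = defaultdict(list)  # arm -> list of unit-level clustering proportions
    for (arm, _unit), (n_clustered, n_total) in counts.items():
        per_arm[arm].append(n_clustered / n_total)

    # statistics.mean raises StatisticsError if an arm has no units.
    p_control = mean(per_arm[control])
    p_intervention = mean(per_arm[intervention])
    return p_control, p_intervention, p_control - p_intervention
```

Each unit counts once regardless of size, which is the usual property of a cluster-level summary analysis.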
Objective 2: Model the change in RE resulting from the Enhanced Case Finding intervention
• Example of a TB compartment model (Dye, Science 2010)
• Distribution of RE based on allele frequencies (Stadler, Genetics 2011)
Example
• In the control area, one patient infected 14 susceptibles
• In the intervention area, one patient infected 10 susceptibles
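The intuition behind the example can be made explicit with a deliberately simplified expression, not the Dye or Stadler models cited above: if infectious cases leave the infectious state by detection (rate d) or otherwise (rate m), the mean infectious period is 1/(d + m), and R_E ≈ β · s / (d + m), where β is the transmission rate and s the susceptible fraction. Enhanced Case Finding raises d, shortening the infectious period. This sketch ignores TB latency, and all parameter values in it are hypothetical.

```python
def effective_reproduction_number(beta: float, detection_rate: float,
                                  other_exit_rate: float, susceptible_fraction: float) -> float:
    """R_E = beta * s / (d + m) in a simple SIR-type model (illustrative only).

    beta: infections per infectious person per unit time in a fully susceptible population
    detection_rate: d, rate at which infectious cases are detected and treated
    other_exit_rate: m, rate of leaving the infectious state otherwise (death, self-cure)
    susceptible_fraction: s = S / N
    """
    exit_rate = detection_rate + other_exit_rate
    if exit_rate <= 0:
        raise ValueError("total exit rate must be positive")
    return beta * susceptible_fraction / exit_rate


def relative_reduction(r_control: float, r_intervention: float) -> float:
    """Proportional reduction in R_E in the intervention arm relative to control."""
    return 1 - r_intervention / r_control
```

With the slide's numbers, `relative_reduction(14, 10)` is about 0.29, i.e. roughly a 29% reduction. In the toy model, tripling a detection rate of 0.5 to 1.5 (with m = 0.5) halves R_E from 5.0 to 2.5 for β = 10 and s = 0.5.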
Objective 3: Microevolution of TB under immune pressure
• HIV-negative: immune pressure ↑ → SNPs ↑
• HIV-positive: immune pressure ↓ → SNPs ↓
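The comparison in Objective 3 reduces to counting SNPs between a patient's first and later isolate and summarizing by HIV status. A minimal sketch, assuming each pair of sequential isolates is already aligned; the function names and the tuple layout are hypothetical. Raw counts ignore the time elapsed between isolates, so a real analysis may need to account for the sampling interval.

```python
from statistics import mean


def count_snps(first: str, later: str) -> int:
    """SNPs between two aligned isolates from the same patient, ignoring 'N' calls."""
    if len(first) != len(later):
        raise ValueError("sequential isolates must be aligned to the same length")
    return sum(1 for a, b in zip(first, later) if a != b and "N" not in (a, b))


def mean_snps_by_hiv_status(pairs):
    """pairs: iterable of (hiv_positive, first_isolate_seq, later_isolate_seq)."""
    by_status = {True: [], False: []}
    for hiv_positive, first, later in pairs:
        by_status[bool(hiv_positive)].append(count_snps(first, later))
    # Report only groups that actually have data.
    return {
        ("HIV-positive" if status else "HIV-negative"): mean(counts)
        for status, counts in by_status.items()
        if counts
    }
```

Under the hypothesis above, the HIV-negative mean would be expected to exceed the HIV-positive mean.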
Data challenges I
• Complex and large dataset
http://lookingupandkneelingdown.blogspot.be/2011_04_01_archive.html
Patients
• Demographic and clinical information
• GPS coordinates
• Date of diagnosis
• Intervention vs control cluster
• Treatment outcome
Bacteria
• Bacteriological data
• Genomes
• Genetic clustering
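The two halves of the dataset are linked through the patient identifier. The sketch below shows one way to model that link; the class and field names are hypothetical, demographic and clinical fields are omitted for brevity, and the helper flags isolates whose patient ID matches no patient record, one symptom of identifier errors.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Patient:
    patient_id: str
    latitude: float          # GPS coordinates of residence
    longitude: float
    diagnosis_date: date
    trial_cluster: str       # randomization unit
    arm: str                 # "intervention" or "control"
    treatment_outcome: Optional[str] = None


@dataclass
class Isolate:
    isolate_id: str
    patient_id: str          # link back to Patient.patient_id
    culture_positive: bool   # stand-in for bacteriological results
    genome_path: Optional[str] = None
    genotypic_cluster: Optional[str] = None


def isolates_without_patient(patients, isolates):
    """Return IDs of isolates whose patient_id matches no known patient."""
    known = {p.patient_id for p in patients}
    return [i.isolate_id for i in isolates if i.patient_id not in known]
```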
Data challenges I (continued)
• Building on an existing project
• Non-automated data collection in the field had already started
– > 20,000 samples, the majority from the community-based intervention
– Double entry in field and lab, leading to discordances and errors in assigning patient identifiers
– Labor-intensive, including corrections
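Reconciling double entry can at least be made systematic. A minimal sketch, assuming both entries are available as dictionaries keyed by sample ID; `find_discordances` and the field names are hypothetical. It reports samples entered only once and, for samples entered twice, every field where the two entries disagree.

```python
def find_discordances(field_entries: dict, lab_entries: dict):
    """Compare field and lab entries of the same samples, keyed by sample ID.

    Returns (missing_in_lab, missing_in_field, discordant), where discordant maps
    sample_id -> {field_name: (field_value, lab_value)} for every mismatching field.
    """
    field_ids, lab_ids = set(field_entries), set(lab_entries)
    discordant = {}
    for sample_id in sorted(field_ids & lab_ids):
        f, lab = field_entries[sample_id], lab_entries[sample_id]
        # A field missing on one side shows up as None on that side.
        diffs = {
            key: (f.get(key), lab.get(key))
            for key in sorted(set(f) | set(lab))
            if f.get(key) != lab.get(key)
        }
        if diffs:
            discordant[sample_id] = diffs
    return sorted(field_ids - lab_ids), sorted(lab_ids - field_ids), discordant
```

A patient ID that differs between the two entries, as in the test case, is exactly the kind of identifier error listed above.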
Approach to challenge I
• Dedicated databases
– MRC, The Gambia
– ITM, Belgium
• SQL-based back end with an Access front end
• ITM database with audit trail and data verification
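An audit trail can be enforced in the database itself with triggers. The sketch below uses Python's built-in `sqlite3` module as a stand-in for the SQL back end; the actual ITM schema is not described here, so the `sample` and `audit_log` tables are hypothetical. Every change to a sample's patient identifier is logged with the old value, the new value, and a timestamp. A multi-user server would typically also record who made the change.

```python
import sqlite3

SCHEMA = """
CREATE TABLE sample (
    sample_id  TEXT PRIMARY KEY,
    patient_id TEXT NOT NULL
);
CREATE TABLE audit_log (
    id          INTEGER PRIMARY KEY AUTOINCREMENT,
    sample_id   TEXT NOT NULL,
    old_patient TEXT,
    new_patient TEXT,
    changed_at  TEXT NOT NULL DEFAULT (datetime('now'))
);
-- Fires after any UPDATE that touches patient_id and records old and new values.
CREATE TRIGGER sample_update_audit
AFTER UPDATE OF patient_id ON sample
BEGIN
    INSERT INTO audit_log (sample_id, old_patient, new_patient)
    VALUES (OLD.sample_id, OLD.patient_id, NEW.patient_id);
END;
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the database and install the schema with its audit trigger."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn
```

Because the trigger lives in the database, corrections made through any front end are logged the same way.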
Lessons learned for future projects
• Automate data management
• Data entry in the field on tablet/smartphone
• Label the sputum cup with a barcode in the field
• Lab scans the barcode on the sputum cup
– Avoid double entry
– Reduce effort and error
http://www.clpmag.com/2009/06/automated-specimen-handling-on-many-labs-wish-lists/
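A barcode removes retyping, and a check digit in the sample ID also lets the scanner or entry form reject misread or mistyped IDs. The sketch assumes a hypothetical numeric ID scheme with a Luhn (mod 10) check digit appended; nothing here reflects the project's actual labels.

```python
def luhn_check_digit(payload: str) -> str:
    """Compute the Luhn (mod 10) check digit for a string of decimal digits."""
    total = 0
    # Starting from the rightmost payload digit, double every second digit,
    # because the check digit will be appended to the right of the payload.
    for i, ch in enumerate(reversed(payload)):
        d = int(ch)
        if i % 2 == 0:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return str((10 - total % 10) % 10)


def make_barcode(serial: str) -> str:
    """Append the check digit to a numeric serial number."""
    return serial + luhn_check_digit(serial)


def is_valid_barcode(code: str) -> bool:
    """True if code is all ASCII digits and its last digit is the correct check digit."""
    return (
        len(code) >= 2
        and code.isascii()
        and code.isdigit()
        and luhn_check_digit(code[:-1]) == code[-1]
    )
```

The Luhn scheme catches every single-digit error and most swaps of adjacent digits, the typical mistakes in manual ID entry.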
Data challenges II
• Clustering at multiple levels
– Bacterial genetic
– Spatial
– Temporal
– Intervention vs control
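For any pair of cases, each level can be evaluated separately. A minimal sketch, assuming each case carries GPS coordinates, a diagnosis date and its trial arm, and that the pairwise SNP distance is computed elsewhere; the `Case` class, `link_levels`, and all thresholds are hypothetical and must be supplied by the analyst. Spatial distance uses the haversine great-circle formula with a mean Earth radius of 6371 km.

```python
from dataclasses import dataclass
from datetime import date
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


@dataclass
class Case:
    case_id: str
    arm: str
    lat: float
    lon: float
    diagnosed: date


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlmb = radians(lon2 - lon1)
    h = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(h))


def link_levels(a: Case, b: Case, snp_distance: int, *,
                max_snps: int, max_km: float, max_days: int) -> dict:
    """Report, level by level, whether two cases are linked."""
    return {
        "genetic": snp_distance <= max_snps,
        "spatial": haversine_km(a.lat, a.lon, b.lat, b.lon) <= max_km,
        "temporal": abs((a.diagnosed - b.diagnosed).days) <= max_days,
        "same_arm": a.arm == b.arm,
    }
```

Keeping the levels separate, rather than collapsing them into one score, makes it possible to ask, for instance, how many genetic links cross the boundary between intervention and control clusters.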
Data challenges III
• Making sense of genomics data in a mathematical model
– Components: genomes, SNPs, phylogenetic tree, phylogenetic model, mathematical model
– Requires computer storage
– Requires computing power & sufficient methods
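One common way to move from whole genomes toward a tree or model, while shrinking the data, is to keep only the variable positions of a whole-genome alignment. The sketch assumes sequences already aligned to the same length; `snp_alignment` is a hypothetical name, and treating `N` and `-` as non-informative is an assumption.

```python
def snp_alignment(aligned: dict[str, str], ignore: str = "N-") -> tuple[list[int], dict[str, str]]:
    """Keep only the variable columns of a whole-genome alignment.

    A column is kept when at least two different bases occur in it; characters in
    `ignore` (ambiguous calls, gaps) do not count as variants.
    Returns the kept 0-based positions and the reduced sequences.
    """
    lengths = {len(seq) for seq in aligned.values()}
    if len(lengths) != 1:
        raise ValueError("need at least one sequence, all of the same aligned length")
    length = lengths.pop()
    names = list(aligned)
    positions = [
        i for i in range(length)
        if len({aligned[name][i] for name in names} - set(ignore)) > 1
    ]
    reduced = {name: "".join(aligned[name][i] for i in positions) for name in names}
    return positions, reduced
```

Since M. tuberculosis genomes differ at relatively few positions, the reduced alignment is far smaller than the full genomes; the kept positions are returned so results can be mapped back to the reference.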
Approach to challenge III
• Computer data & storage
– ITM IT department not yet used to hosting large data repositories
– Full access to servers off-site now established
– Working group on SOPs on documentation of analyses, back-ups, and off-site storage
• Computing power
– Most current analyses can be done on a powerful stand-alone computer
– Established access to the CalcUA Flemish supercomputer for data-heavy and/or computing-heavy analyses
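A back-of-envelope estimate shows why storage becomes an issue. The only fixed input below is the approximate M. tuberculosis genome size (about 4.4 Mb). Sequencing depth, read length, and per-read header overhead are hypothetical parameters. Each sequenced base costs about two bytes in an uncompressed FASTQ file: one for the base call and one for its quality character.

```python
MTB_GENOME_BP = 4_400_000  # approximate M. tuberculosis genome size (~4.4 Mb)


def raw_fastq_bytes(n_isolates: int, coverage: float, genome_bp: int = MTB_GENOME_BP,
                    read_length: int = 150, per_read_overhead: int = 50) -> float:
    """Rough uncompressed FASTQ size in bytes.

    Bases: isolates x coverage x genome size; each base costs 2 bytes (call + quality).
    Reads: bases / read_length; each read adds header/separator bytes (assumed value).
    """
    bases = n_isolates * coverage * genome_bp
    reads = bases / read_length
    return 2 * bases + reads * per_read_overhead
```

Under these assumptions, 1000 isolates at 100× coverage come to roughly 1 TB of raw reads before compression (`raw_fastq_bytes(1000, 100) / 1e12` ≈ 1.03). Intermediate files such as alignments come on top of that, and backups and off-site copies multiply the total.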
Deposit of bacteria and their genomes
• Can be done in the BCCM in-house public culture collection
• Genomes can be linked to the deposited isolates
• However, the labor for 1000s of isolates was not foreseen
More work to be done I
• Ethics around public collections of pathogens
– Owner? The patient, the scientist who isolated the strain, or the country?
– Routine diagnostics: is there a risk to the patient if an isolate without identifiers is included in the collection?
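One technical ingredient for depositing isolates "without identifiers" is a stable pseudonym: isolates from the same patient keep a shared public code, but the code cannot be traced back to the patient ID without a secret key held by the study. The sketch uses HMAC-SHA256 from Python's standard library; the key handling, the 12-character length, and the example ID format are hypothetical. Pseudonymization does not by itself settle the ethical questions above.

```python
import hashlib
import hmac


def pseudonymize(patient_id: str, secret_key: bytes, length: int = 12) -> str:
    """Derive a stable public code from a patient ID with HMAC-SHA256.

    The same ID and key always give the same code, so isolates from one patient stay
    linked. Without the key, trying candidate IDs does not reproduce the codes.
    """
    digest = hmac.new(secret_key, patient_id.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:length]
```

The key should be stored separately from the public collection. If it leaks, anyone with a list of candidate IDs can recompute the codes.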
More work to be done II
• Guidelines on public access to large genomic datasets
• Cloud capacity
• Access to full associated data
– E.g. country and date of isolation
– Patient treatment history and outcome?
• How to include such ‘service to the scientific community’ in ‘key performance indicators’?
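Guidelines on associated data can be enforced at export time with an explicit whitelist, so that nothing leaves the study database by default. The sketch assumes a hypothetical whitelist of pseudonymous isolate code, country, and date of isolation. It coarsens the date to a chosen precision and drops everything else, including treatment history and outcome, until a guideline says otherwise.

```python
from datetime import date

# Hypothetical whitelist: only these fields may appear in public metadata.
PUBLIC_FIELDS = ("isolate_code", "country", "isolation_date")


def public_metadata(record: dict, date_precision: str = "month") -> dict:
    """Return a new dict with whitelisted fields only, with the date coarsened."""
    out = {key: record[key] for key in PUBLIC_FIELDS if key in record}
    isolated = out.get("isolation_date")
    if isinstance(isolated, date):
        if date_precision == "year":
            out["isolation_date"] = f"{isolated.year:04d}"
        elif date_precision == "month":
            out["isolation_date"] = f"{isolated.year:04d}-{isolated.month:02d}"
        elif date_precision == "day":
            out["isolation_date"] = isolated.isoformat()
        else:
            raise ValueError("date_precision must be 'year', 'month' or 'day'")
    return out
```

A whitelist is safer than a blacklist here: a field added to the study database later stays private unless someone deliberately decides to publish it.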
Thanks to
Boatema Ofori
Florian Gehre
Conor Meehan
Martin Antonio