MethodBox: From open-data to open-insight
Post on 21-Jan-2016
MethodBox: From open-data to open-insight
MethodBox Team, Jul 2011
Presentation
• Problem: Data tsunami + puddles of insight
• Solution: Collective efficient science
• Deployment: Sense-making networks on open-data
Quote
“…you call it Epidemiology and we call it quantitative Social Science”
A leading researcher, Jul 2011
Open data
Common methods
Potentially complementary expertise
Obesity Example
Fragmented understanding of public health problems such as obesity
...data, methods/models and expertise split across
disciplines (e.g. social vs. biomedical)
and settings (e.g. academia vs. healthcare)
Puddles of research around the organising principle
… but policies need the big picture
Data Example
• Time series data from Health Visitors from Wirral
• Data deposited with UKDA but not used for 16 years
• Children measured at the time the obesity epidemic took hold…
Fifths of IDAC 2004
Red (light) = most deprived
Red (dark)
Purple
Blue (dark)
Blue (light) = most affluent
Material deprivation affecting children
(households with children: % on benefits in 2001-3)
Wirral (0.3M), UK
BMI of 3 yr olds
Maps for 1988 - 1989, 1990 - 1991, 1992 - 1993, 1994 - 1995, 1996 - 1997, 1998 - 1999, 2000 - 2001 and 2002 - 2003
Fifths of BMI SDS (BMI fifth)
Red (light) = fattest
Red (dark)
Purple
Blue (dark)
Blue (light) = thinnest
Child Obesity: Action 6 years after signal in the data
Body Mass Index (BMI) trend in Wirral 3y-olds from 1988 to 2003
Chart: three-monthly rolling average BMI SDS (y-axis, -0.4 to 0.5) against month of measurement by Health Visitor (x-axis, Mar-88 to Aug-04)
SDS = standard deviation score from 1990 British Growth Reference charts – adjusts for age and sex of the child
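The trend line can be sketched in code. The 1990 British Growth Reference is published as LMS tables (L, M, S by age and sex), and an SDS is the LMS z-score of a measurement against them. The block below is a minimal sketch under stated assumptions, not the analysis behind the slide: the function names are hypothetical, the L, M, S values must come from the real reference tables, and the "three-monthly rolling average" is assumed here to pool every measurement in the current month and the two months before it.

```python
import math
from collections import defaultdict
from statistics import mean


def bmi_sds(bmi, L, M, S):
    """LMS z-score of a BMI value given the reference L, M, S for the child's age and sex."""
    if L == 0:
        return math.log(bmi / M) / S
    return ((bmi / M) ** L - 1.0) / (L * S)


def rolling_mean_by_month(records, window=3):
    """records: iterable of (year, month, sds) tuples.

    Returns {(year, month): mean SDS}, where each mean pools all measurements
    taken in that month and in the previous window - 1 months (assumed trailing window).
    """
    by_month = defaultdict(list)
    for year, month, sds in records:
        by_month[year * 12 + (month - 1)].append(sds)

    result = {}
    for idx in sorted(by_month):
        pooled = [
            value
            for k in range(idx - window + 1, idx + 1)
            for value in by_month.get(k, [])
        ]
        result[(idx // 12, idx % 12 + 1)] = mean(pooled)
    return result
```

A child whose BMI equals the reference median M scores 0 whatever L and S are, which is why a population average drifting above 0 reads as a signal.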
Chart annotations: Clues; Actions
Similar Data in 2011
• National Child Measurement Programme
• Anonymised national database
• Could be opened (like the national pupil database) and extended to other policy-relevant, timely research
Data Already in UK Data Archive
• Example: Health Surveys for England (annual)
• Analyses feed national policies
• Does evidence need to be localised?...
Chart: BMI (y-axis, 25 to 27.5) by income fifth (x-axis, 1 to 5, low to high), with separate series for men and women
Women, and not men, from low-income households
are fatter in England
Data from Health Survey for England
Chart: BMI (y-axis, 25 to 27.5) by income fifth (x-axis, 1 to 5, low to high), with separate series for men and women
Women from low-income households and men from high-income households
are fatter in Greater Manchester
Data from Health Survey for England
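The national versus local comparison in the two charts amounts to the same grouped summary run with and without a regional filter. The sketch below assumes each survey record is a dict with hypothetical field names (`region`, `sex`, `income_fifth`, `bmi`); the real Health Survey for England variable names differ, and the sample rows are made-up toy values, not survey results.

```python
from collections import defaultdict
from statistics import mean


def mean_bmi_by_group(rows, region=None):
    """Mean BMI per (sex, income fifth), optionally restricted to one region."""
    groups = defaultdict(list)
    for row in rows:
        if region is not None and row["region"] != region:
            continue
        groups[(row["sex"], row["income_fifth"])].append(row["bmi"])
    return {key: mean(values) for key, values in groups.items()}


# Toy rows for illustration only (hypothetical fields and values).
sample_rows = [
    {"region": "A", "sex": "F", "income_fifth": 1, "bmi": 27.0},
    {"region": "A", "sex": "F", "income_fifth": 1, "bmi": 26.0},
    {"region": "B", "sex": "F", "income_fifth": 1, "bmi": 25.0},
    {"region": "B", "sex": "M", "income_fifth": 5, "bmi": 27.5},
]

national = mean_bmi_by_group(sample_rows)
local = mean_bmi_by_group(sample_rows, region="A")
```

Running both calls on the same extract is what makes a local pattern that departs from the national one visible.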
Linked-data ≠ Linked: data, methods & investigators
Previous slides show social-biomedical signals about obesity from under-used datasets
Biomedical Research: Data, methods & investigators
Social Research: Data, methods & investigators
MethodBox Aim
…to increase the sharing and reuse of
data sources & extracts
and data processing methods
in one in-silico environment (‘e-Lab’)
shared by social and health researchers
e-Lab
Socially-stimulating science, in-silico
Research Object
Find, Share, Reuse
Data-sources
Data-preparation scripts
Research protocol
Statistical analysis scripts
Slides
Working datasets
Figures/Graphics
Manuscripts
References
Analysis-logs & notes
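One way to picture a Research Object is as a typed bundle of the items listed above that can be found, shared and reused as a unit. The following is only an illustrative sketch: the class, its fields and its methods are hypothetical and are not MethodBox's actual data model.

```python
from dataclasses import dataclass, field

# Component kinds taken from the slide's list.
COMPONENT_KINDS = (
    "data-source",
    "data-preparation script",
    "research protocol",
    "statistical analysis script",
    "slides",
    "working dataset",
    "figure",
    "manuscript",
    "reference",
    "analysis log",
)


@dataclass
class ResearchObject:
    title: str
    owner: str
    components: dict = field(default_factory=dict)  # kind -> list of references
    derived_from: str = ""  # owner of the object this one was reused from

    def add(self, kind, ref):
        """Attach a reference (e.g. a file name) under one of the known kinds."""
        if kind not in COMPONENT_KINDS:
            raise ValueError(f"unknown component kind: {kind}")
        self.components.setdefault(kind, []).append(ref)

    def find(self, kind):
        """Return the references stored under a kind (empty list if none)."""
        return list(self.components.get(kind, []))

    def reuse(self, new_owner):
        """Copy the bundle for another researcher, recording where it came from."""
        copy = ResearchObject(self.title, new_owner, derived_from=self.owner)
        for kind, refs in self.components.items():
            copy.components[kind] = list(refs)
        return copy
```

Keeping `derived_from` on the copy is one simple way to give the original contributor visible credit when their work is reused.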
National Dataset Example
• Health Surveys for England
– Large-scale (participants * variables)
– Annual since early 90s
– Under-used by NHS who fund it
– Key barrier: extracting a research-ready subset of data
– Data archive playground = e-Lab
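The "key barrier" step, pulling a research-ready subset out of many annual survey files, can be sketched as follows. This is a minimal illustration under assumptions: each survey year is represented as a list of dict rows, the variable names are hypothetical, and a variable that was not collected in a given year is filled with `None` rather than dropped.

```python
def build_extract(surveys, variables):
    """Stack the chosen variables from every survey year into one extract.

    surveys:   {year: [row_dict, ...]}
    variables: names of the variables wanted in the extract
    """
    extract = []
    for year in sorted(surveys):
        for row in surveys[year]:
            record = {"year": year}
            for name in variables:
                # Variables missing in a given year become None.
                record[name] = row.get(name)
            extract.append(record)
    return extract


# Toy data for illustration only (hypothetical variable names and values).
toy_surveys = {
    2001: [{"sex": "F", "bmi": 26.1, "smoker": 0}],
    2000: [{"sex": "M", "bmi": 27.3}],
}
toy_extract = build_extract(toy_surveys, ["sex", "bmi", "smoker"])
```

Every record in the result has the same keys, so the extract can go straight into a statistics package.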
Supporting and Developing Interdisciplinary Understanding
Sharing resources – tools, methods, data
Sharing expertise – discussions and reuse around shared resources
Promoting interdisciplinary working
Developing interdisciplinary understanding – language, tacit assumptions, methods
First step - sharing of resources
Shared resources provide the basis for discussion
Discussions lead to deeper interdisciplinary understanding
Understanding of other domains promotes more effective interdisciplinary working
Facilitating a social network of data archive users…
…toward a reward environment for sharing data, methods, and expertise
Browsing for data extracts made by a social network of data archive users…
Shopping for variables from across different years of survey collections…
Instant access to
relevant parts of
survey documentation
…
Making the data extract visible…
Linking a data extract with a script for deriving variables…
Sharing and visibility
Enabling user-visibility for data extraction or derivation contributions…
Current MethodBox
Video link
Training Course Apr '10
• Trained a mixture of NHS, academic and industry users of HSE in the use of MethodBox
• Course run in conjunction with CCSR
• Feedback forms completed by 15 of 16 attendees, asked to rate MethodBox from 1 (negative) to 7 (positive) on the following statements:
– I thought MethodBox was:
• Terrible - Wonderful: Mean = 5.57
• Difficult to understand - Easy = 5.57
• Frustrating to use - Satisfying = 5.79
• Dull - Stimulating = 5.29
• Rigid - Flexible = 5.71
• Difficult to navigate - Easy to navigate = 6
Attitudes to Sharing
Group                                          Data   Scripts
Academic social scientists                     Yes    No
Academic epidemiologists/medical researchers   No     Yes
NHS & Local Govt. analysts                     Yes    Yes
MethodBox Evolution
• Amazon-like user-prompting for other variables that may be relevant to the set being extracted
• More surveys/datasets incorporated
• User-contributed & community-curated datasets
• ….
• Feature request list exceeds resources
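The "Amazon-like" prompting in the first bullet could, for instance, be driven by co-occurrence: suggest variables that other users extracted together with the ones currently selected. The sketch below is one possible approach under that assumption, not a description of how MethodBox implements or plans the feature; the function and variable names are hypothetical.

```python
from collections import Counter


def suggest_variables(past_extracts, current, top_n=3):
    """Suggest variables that co-occurred with the current selection in past extracts.

    past_extracts: list of variable-name lists chosen by earlier users
    current:       variable names already in the user's selection
    """
    current = set(current)
    counts = Counter()
    for extract in past_extracts:
        chosen = set(extract)
        if chosen & current:  # extract shares at least one variable with the selection
            counts.update(chosen - current)
    # Most frequent first; ties broken alphabetically so the order is stable.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [name for name, _ in ranked[:top_n]]


# Toy history for illustration only.
history = [
    ["bmi", "age", "sex"],
    ["bmi", "income", "sex"],
    ["height", "weight"],
]
suggestions = suggest_variables(history, ["bmi"], top_n=2)
```

A scheme like this improves as more extracts are shared, which fits the deck's point about rewarding visible contributions.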
Building on Successful E-Science
• Most widely used scientific workflow sharing systems: myGrid, Taverna, myExperiment
• Over a decade of programme funding sustained world-leading e-Infrastructure R&D
• Ready to leverage more outputs from open-linked data
Toward Open Insight
• Researcher A is expert in deprivation
• Researcher B is expert in obesity
• Both use a common data archive but don’t usually meet
• MethodBox shares the expertise of A and B to create a more complete model of deprivation in obesity
Conclusion
• Open-data alone is not enough
• Social e-infrastructure for science is needed
• Sharing insights and methods is key, and can be achieved through systems like MethodBox + ESDS