e-Infrastructure for Social Science data:
Obesity e-Lab & MethodBox
Ian Dunlop
15/03/11
Terminology
• Obesity e-Lab is the ESRC project
www.obesityelab.org.uk
• MethodBox is the product
www.methodbox.org
Obesity e-Lab Aims
• Enable socially networked research between the social sciences, health sciences and public health
• Add value to archived datasets by developing technologies to help on-line users
• Seed an “open source” approach to social research publication
Project Objectives
• Engagement (‘More with less’)
– Research communities (Obesity/Cancer, Education)
– Public health researchers (Academic, NHS, LA)
– Key data providers (ESDS/UKDA)
• Reduce barriers
– For survey datasets
– Formation of research communities (cross-disciplinary)
• Develop tools
– Online digital laboratory, an ‘e-Lab’, known as MethodBox
• Data * Methods * People
e-Lab
Socially-stimulating science, in-silico
Research Object
Find * Share * Reuse
Data-sources
Data-preparation scripts
Research protocol
Statistical analysis scripts
Slides
Working datasets
Figures/Graphics
Manuscripts
References
Analysis-logs & notes
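As a rough illustration of the Research Object idea, the components listed above can be thought of as one bundle that is found, shared and reused as a unit. The sketch below is only an assumption about how such a bundle might be modelled; the `ResearchObject` class, its methods and the example file names are hypothetical and are not taken from MethodBox.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ResearchObject:
    """Hypothetical container bundling the artefacts of one study."""
    title: str
    # Component kind (e.g. "Data-sources", "Slides") -> list of item names.
    components: Dict[str, List[str]] = field(default_factory=dict)

    def add(self, kind: str, item: str) -> None:
        """Attach an item under the given component kind."""
        self.components.setdefault(kind, []).append(item)

    def find(self, keyword: str) -> List[str]:
        """Return items whose name contains the keyword (case-insensitive)."""
        needle = keyword.lower()
        return [
            item
            for items in self.components.values()
            for item in items
            if needle in item.lower()
        ]

    def manifest(self) -> List[str]:
        """Flat 'kind: item' listing, sorted so the output is stable."""
        return [
            f"{kind}: {item}"
            for kind in sorted(self.components)
            for item in sorted(self.components[kind])
        ]


# Example names below are made up for illustration.
ro = ResearchObject("Example study")
ro.add("Data-sources", "HSE 2006")
ro.add("Data-preparation scripts", "derive_bmi.do")
ro.add("Slides", "methodbox_talk.pdf")
```

Here `find` stands in for "Find", `manifest` for a shareable listing, and reuse would amount to copying the bundle and swapping in a different data source.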
Where we are up to
• MethodBox launched at ESDS government event April 2010 (scored 5.7/7 from 15 responses)
• 80 registered users, 45 scripts and 58 data extracts.
• 21 public health researchers trained using a combination of social science and health science approaches
• Methodological approach adopted by North West e-Health (www.nweh.org.uk) project (which is 20x bigger than us)
Context, Features, Architecture
• Context
– Investigation Cycle
– Survey (Meta) Data overload
– How MethodBox fits in
• MethodBox
– Architecture
– Screenshots
– E-Infrastructure
• Future Directions
Investigation Cycle
[Diagram labels: Questions; Data; Analysis; Models; Results; Publications, Reports or Decisions; Tooling; Community]
• Our Tooling focus is (survey) Data and Analysis
• Our main Community focus is Expertise via Methods/Analysis/Scripts
Examples: HSE 2006
13 pages
208 pages
Variable Definitions
Variable Categories
Variable SPSS code
Questionnaire Instructions
224 pages
Questions used to set variables
148 pages
Survey Description
9 pages
Variable Value Domains
351 pages
46 MB data files
Data and Variable Codebook
X 17 All HSE
@1800 Variables
How MethodBox fits in
UK Data Archive (UKDA)
MethodBox
Economic and Social Data Service (ESDS)
Survey Curation
Survey Mapping
diagram not to scale
Survey Navigation
Survey Commissioning & Collection
etc…
Improving Access & Use
Ruby delayed job
Ruby on Rails
Data providers
User Dataset import
File system
MySQL
Metadata import
User data and metadata import
Request ‘catalog’ information
Provide metadata
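The architecture labels place a Ruby delayed job component alongside Ruby on Rails and the import steps, which suggests that dataset and metadata imports run as background jobs rather than inside a web request. The Python sketch below illustrates only that general pattern. It is not MethodBox code and does not use the Ruby delayed_job API; the function names and the in-memory `catalogue` are assumptions made for illustration.

```python
import queue
import threading


def import_metadata(dataset_name, catalogue):
    """Stand-in for a slow import: record the dataset in an in-memory catalogue."""
    catalogue[dataset_name] = "imported"


def worker(jobs, catalogue):
    """Process queued import jobs until a None sentinel arrives."""
    while True:
        job = jobs.get()
        if job is None:
            break
        import_metadata(job, catalogue)


jobs = queue.Queue()
catalogue = {}  # hypothetical stand-in for the database and file system

background = threading.Thread(target=worker, args=(jobs, catalogue))
background.start()

# A web request would only enqueue the work and return immediately.
for name in ["HSE 2005", "HSE 2006"]:
    jobs.put(name)

jobs.put(None)      # tell the worker to stop once the queue is drained
background.join()   # wait here only so the example finishes deterministically
```

The design point is that the slow step (parsing large survey files) is decoupled from the user-facing application, which only records that an import was requested.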
Search
Results
Variable info with Stats
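To make the idea of search results carrying variable information and statistics concrete, here is a minimal sketch of keyword search over variable metadata that returns summary statistics with each hit. The variable names, labels and values are invented for illustration, and `search_variables` is a hypothetical helper, not a MethodBox function.

```python
from statistics import mean

# Hypothetical variable metadata and values; not taken from any real survey file.
VARIABLES = [
    {"name": "bmi", "label": "Body mass index", "values": [22.5, 27.0, 31.5]},
    {"name": "height_cm", "label": "Height in centimetres", "values": [160.0, 170.0, 180.0]},
    {"name": "smoker", "label": "Current smoking status", "values": [0, 1, 0]},
]


def search_variables(keyword, variables=VARIABLES):
    """Return matching variables, each with simple summary statistics."""
    needle = keyword.lower()
    results = []
    for var in variables:
        # Match the keyword against both the short name and the descriptive label.
        if needle in var["name"].lower() or needle in var["label"].lower():
            values = var["values"]
            results.append({
                "name": var["name"],
                "label": var["label"],
                "n": len(values),
                "min": min(values),
                "max": max(values),
                "mean": mean(values),
            })
    return results


hits = search_variables("body")
```

Showing statistics next to each hit lets a researcher judge whether a variable is usable before downloading anything.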
Profiles
People & Expertise
Methods
Method Information
Data Extracts
Making the data extract visible…
Linking a data extract with a script for deriving variables…
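One way to picture a data extract linked to a script for deriving variables is sketched below. It assumes an extract is just a chosen subset of survey variables and that the linked script adds a derived variable, here BMI from height and weight. The column names, sample rows and helper functions are hypothetical, not MethodBox's own.

```python
def make_extract(rows, variables):
    """Keep only the requested variables from each survey row."""
    return [{v: row[v] for v in variables} for row in rows]


def derive_bmi(extract):
    """Add a derived 'bmi' variable: weight (kg) / height (m) squared, to 1 d.p."""
    derived = []
    for row in extract:
        height_m = row["height_cm"] / 100.0
        new_row = dict(row)  # copy so the original extract is left untouched
        new_row["bmi"] = round(row["weight_kg"] / (height_m ** 2), 1)
        derived.append(new_row)
    return derived


# Made-up survey rows for illustration only.
survey = [
    {"id": 1, "height_cm": 170.0, "weight_kg": 72.25, "region": "NW"},
    {"id": 2, "height_cm": 160.0, "weight_kg": 51.2, "region": "NE"},
]

extract = make_extract(survey, ["id", "height_cm", "weight_kg"])

# The "link" is simply the pairing of an extract with the script that applies to it.
linked = {"extract": extract, "script": derive_bmi}
result = linked["script"](linked["extract"])
```

Keeping the script alongside the extract means another researcher can see exactly how a derived variable was produced and rerun it on a different extract.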
Sharing and visibility
MethodBox as e-Infrastructure
• Data Providers
– Existing infrastructure (NESSTAR/NESSTAR Server)
– Cautious
  • Adopt only ‘proven’ technologies
  • Willing to ‘try’ things if risk/work is low
• MethodBox offers
– Social Layer, sharing, data tooling
– Integration
  • Existing data provider infrastructure – NESSTAR Server
  • Security infrastructure (Shibboleth)
  • Automated running of scripts for new datasets (using institutional/national compute)
• Deployment
– ESDS/CCSR first instance (exit strategy)
  • Obesity e-Lab project ends 31/03/12
Future work
• MethodBox as e-Infrastructure
– Target deployment as part of ESDS/CCSR
– Integration with NESSTAR system
• Focus on communities
– Greater Manchester Public Health Inequalities Research Network
– University of Manchester School of Education
– North West e-Health and Arthritis Research UK
• Ability to ‘run’ methods
– Part funded by Obesity e-Lab work in JISC ‘National e-Infrastructure for Social Simulation’ project
video at http://bit.ly/methodbox11