Managing data responsibly to enable research integrity

Managing data responsibly to enable research integrity. Heather Coates | Digital Scholarship & Data Management Librarian, http://ulib.iupui.edu/digitalscholarship/datasupport/. Introduction to Research Ethics, Quaid G504 (Fall 2016)

Upload: heather-coates

Post on 22-Jan-2018

Category:

Health & Medicine


TRANSCRIPT

Page 1: Managing data responsibly to enable research integrity

Managing data responsibly to enable research integrity

Heather Coates | Digital Scholarship & Data Management Librarian

http://ulib.iupui.edu/digitalscholarship/datasupport/

Introduction to Research Ethics

Quaid G504 (Fall 2016)

What is research integrity?

security

privacy

trust

honesty

accuracy

efficiency

objectivity

personal responsibility

ownership

stewardship

governance

Why does RDM matter?

RDM as a component of RCR

Roles & Responsibilities

Practical RDM

WHY DOES RDM MATTER?

The value of data increases with their use.

-Paul Uhlir

The World of Data Around Us

[Figure: information created versus available storage, in petabytes worldwide (0 to 1,000,000), 2005 to 2010. The gap between the two curves is labeled "Transient information or unfilled demand for storage".]

Source: John Gantz, IDC Corporation: The Expanding Digital Universe

The World of Data Around Us: Data Loss

• Natural disaster

• Facilities infrastructure failure

• Storage failure

• Server hardware/software failure

• Application software failure

• External dependencies (e.g. PKI failure)

• Format obsolescence

• Legal encumbrance

• Human error

• Malicious attack by human or automated agents

• Loss of staffing competencies

• Loss of institutional commitment

• Loss of financial stability

• Changes in user expectations and requirements

CC image by Sharyn Morrow on Flickr

CC image by momboleum on Flickr

Poor Data Management Affects Everyone

“MEDICARE PAYMENT ERRORS NEAR $20B” (CNN) December 2004

Miscoding and billing errors from doctors and hospitals totaled $20,000,000,000 in FY2003 (9.3% error rate). The error rate measured claims that were paid despite being medically unnecessary, inadequately documented or improperly coded. In some instances, Medicare asked health care providers for medical records to back up their claims and got no response. The survey did not document instances of alleged fraud. This error rate was actually an improvement over the previous fiscal year (9.8% error rate).

“AUDIT: JUSTICE STATS ON ANTI-TERROR CASES FLAWED” (AP) February 2007

The Justice Department Inspector General found only two sets of data out of 26 concerning terrorism attacks were accurate. The Justice Department uses these statistics to argue for their budget. The Inspector General said the data “appear to be the result of decentralized and haphazard methods of collections … and do not appear to be intentional.”

“OOPS! TECH ERROR WIPES OUT ALASKA INFO” (AP) March 2007

A technician managed to delete the data and backup for the $38 billion Alaska oil revenue fund – money received by residents of the State. Correcting the errors cost the State an additional $220,700 (which of course was taken off the receipts to Alaska residents).

"Good data management practice allows reliable verification of results and permits new and innovative research built on existing information. This is important if the full value of public investment in research is to be realized."

Managing and Sharing Data: Best Practices for Researchers, UK Data Archive

Benefits: Good Data Practices & Open Data

• Open data addresses social justice issues

• Open data enhances social welfare

• Open data supports effective governance and policy making

• Open data grows the economy

• Open data improves the integrity of the scholarly record

• Open data facilitates the education and training of new generations

• Open data enables validation or replication to support published results

• Open data accelerates the pace of discovery

• Good data practices (GDP) increase the impact of your work through sharing your data, code & other products

• Good data practices improve the quality and consistency of the research data you produce (save $$$)

• Good data practices improve the efficiency of your research (save time)

Personal Experience

“Please forgive my paranoia about protocols, standards, and data review. I'm in the latter stages of a long career with USGS (30 years, and counting), and have experienced much. Experience is the knowledge you get just after you needed it.

Several times, I've seen colleagues called to court in order to testify about conditions they have observed. Without a strong tradition of constant review and approval of basic data, they would've been in deep trouble under cross-examination. Instead, they were able to produce field notes, data approval records, and the like, to back up their testimony.

It's one thing to be questioned by a college student who is working on a project for school. It's another entirely to be grilled by an attorney under oath with the media present.”

- Nelson Williams, Scientist, US Geological Survey

RDM AS A COMPONENT OF RCR

Concepts of Data Management

• Data ownership

• Data collection

• Data storage

• Data protection

• Data retention

• Data analysis

• Data sharing

• Data reporting

Steneck, 2004

DataONE Data Life Cycle

The purpose of data management planning is to ensure that research data produced by a project are high quality, well organized, thoroughly documented, preserved, and accessible so that the validity of the data can be determined at any time.

ORI Guidelines for Responsible Data Management

The goal of data management is to produce self-describing data sets.

DataONE Primer on Data Management

Why is good data management so challenging?

ambiguity effect

availability heuristic

confirmation bias

experimenter’s or expectation bias

framing effect

hindsight bias

neglect of probability

optimism bias

planning fallacy

well-traveled road effect

ROLES & RESPONSIBILITIES

Funder progress towards openness

1985: National Research Council

1999: Office of Mgmt & Budget, Circular A-110 revisions

2003: NIH Data Sharing Policy

2008: NIH Public Access Policy

2011: NSF DMP Requirement

2012: NEH, Office of Digital Humanities DMP Requirement

2013: NSF Bio sketch change

2013: OSTP Memo on Public Access to Results of Federally-Funded Research

2014: OSTP Memo on Improving the Mgmt of & Access to Scientific Collections

2014: OMB Circular A-81 (Uniform Guidance) takes effect

2015: Federal Funding agencies release plans responding to 2013 OSTP memo

2016: Federal Funding Agency Plans take effect (DMP req)

Funder Policies: DMP & data sharing

• Agency for Healthcare Research & Quality

• Centers for Disease Control & Prevention

• National Institutes of Health

• National Science Foundation

More agency policies at datasharing.sparcopen.org

Publisher Policies: Data availability

• DataDryad Publishers

• PLoS Journals

• Nature Publishing Group

• American Economic Review

• BioMedCentral

• JORD - Social Science Journals with a research data policy

• Data policies of Economic Journals

https://ulib.iupui.edu/digitalscholarship/datasupport/publisher_policies

Institutional Policies

• Vary greatly, with lots of gaps

• Distributed – address specific local or state requirements for specific types of data

• Often focus on institutional data rather than research data

• Do not provide practical guidance

• Do not distinguish between institutional and personal responsibilities

Roles/Responsibilities of project personnel

• On your own

– Fill in the team members responsible for key data activities

• In small groups (2-3 people), share and discuss

– What kind of training is provided for team members to complete these tasks accurately?

• Whole group discussion

– What barriers do you face in tracking roles and responsibilities?

– What barriers do you face in providing training?

Worksheet columns: Team Member Name | Project Role | Activity | Description

• Project design [+ documentation]

– Determining the aims of the project, the methods used to achieve those aims, and identifying the products resulting from the project.

– Translating the aims of the project into measurable research questions or hypotheses.

• Instrument/measure/data collection tool design [+ documentation]

– Creating tools that adapt the research questions or hypotheses into questions that can be addressed by discrete data points.

– Validating tools through external review or pilot testing.

• Data collection [+ documentation]

– Conducting surveys, interviews, experiments, and other project procedures according to the protocol in order to generate data.

• Data processing [+ documentation]: entry, proofing/cleaning, preparation for analysis

– Entering analog data into spreadsheet or database. Documenting procedures, date, and person responsible.

– Checking data entry for accuracy and completeness. Documenting procedures, date, and person responsible.

– Checking data for missing data, errors, and outliers. Documenting procedures, date, and person responsible.

– Deciding what data to include/exclude. Documenting decision-making process and criteria used.

• Data analysis [+ documentation]

– Selecting analytical tools to be used. Documenting decision-making process and criteria used.

– Conducting data characterization and screening tests, running analyses, generating results. Documenting process and files generated.

– Deciding what data are relevant to the project aims and objectives. Documenting decision-making process and criteria used.

• Data reporting

– Creating summary tables, graphs, and other visuals to represent the data.

– Writing up the project details and relevant results in the packages/format requested by the client, as specified by the deliverables agreed upon in the contract.

PRACTICAL RDM

Basic Principles: Good Research Data Practices

1. Have a plan & use it

2. Follow the 3-2-1 rule of data storage

3. Document

4. Be consistent; when you aren’t, document the deviation

5. Use common, standardized terminology

6. Monitor the quality of the data as it is being created

7. Report enough detail about your research so that others in your field can reproduce it and others outside your field can evaluate it

8. Be as open as possible

9. Think about how your research might be useful to others

1: Functional data management plans support teams

• A tool for planning all the key activities related to data before you have a messy pile of bits on your hands

• A working document that reflects how a study is conducted

• Communication device for the team

• Documents the team members and their roles

• Customized to address the issues most relevant to your research

1: Planning…learning from Good Clinical Data Management Practices

• Begin with the end in mind OR Produce report-ready outputs

• Plan, test, revise, plan, test, revise…implement

• Include all stakeholders in the design of the protocol, data collection tools, data management plan, etc.

• Document, document, document

– Specify documents required for reproducible research

– Facilitates clear communication and shared understanding throughout the project

– Specify roles and responsibilities from the beginning

2: Follow the 3-2-1 Rule

The accepted rule for backup best practices is the three-two-one rule. It can be summarized as: if you’re backing something up, you should have:

• At least three copies (in different places),

• in two different formats,

• with one of those copies off-site.
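The three conditions above can be checked mechanically. The following is only a minimal sketch: the `BackupCopy` class, the `check_321` function, and every location and format name are hypothetical, chosen for illustration rather than taken from any backup tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackupCopy:
    """One copy of a dataset (all field values below are illustrative)."""
    location: str   # where the copy lives, e.g. "lab workstation"
    fmt: str        # storage format or medium, e.g. "cloud storage"
    offsite: bool   # True if the copy is kept away from the primary site


def check_321(copies):
    """Return a list of problems; an empty list means the rule is met."""
    problems = []
    if len({c.location for c in copies}) < 3:
        problems.append("fewer than three copies in different places")
    if len({c.fmt for c in copies}) < 2:
        problems.append("fewer than two different formats")
    if not any(c.offsite for c in copies):
        problems.append("no off-site copy")
    return problems


copies = [
    BackupCopy("lab workstation", "internal hard drive", offsite=False),
    BackupCopy("department server", "network storage", offsite=False),
    BackupCopy("cloud service", "cloud storage", offsite=True),
]
print(check_321(copies))  # prints []
```

Running the same check on only the first two copies reports both the missing third copy and the missing off-site copy.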

3: Document: How much?

More than you think you will need BUT less than everything

Information Entropy

[Figure: data details (vertical axis) decline over time (horizontal axis), starting at the time of data development. From Michener et al 1997.]

• Specific details about problems with individual items or specific dates are lost relatively rapidly

• General details about datasets are lost through time

• Accident or technology change may make data unusable

• Retirement or career change makes access to “mental storage” difficult or unlikely

• Loss of data developer leads to loss of remaining information

3: Document, document, document

Documentation should capture crucial details needed for post publication peer review or validation of results

• Study: research questions/aims, IRB protocol, informed consents/authorizations, etc.

• Data collection instruments or tools OR data sources

• Data collection process or workflow

• Can take many forms, but should be consistent with standards or norms of practice for your field (e.g., data dictionary, data model, codebook, readme.txt, lab notebook)

3A: Know what you have - Data Inventory

• On your own

– Fill in as much of the data inventory as you can

• In small groups (2-3 people), share and discuss

– Benefits of knowing exactly what data you have?

– How hard would it be to complete this fully and accurately?

• Whole group discussion

– How might this be helpful throughout various phases of the project?

– How might it be helpful to have an inventory for complete and active projects?

3A: Know what you have - Data Inventory Example

• Funding source

• Program or initiative

• Project title

• PI First Name

• PI Surname

• Other Researchers/Data Contacts

• Project Start Date dd-mm-yyyy

• Project End Date dd-mm-yyyy

• New datasets created?

• How many datasets created

• Data location(s)

• Dataset Type (qualitative, quantitative, mixed methods, model)

• Sharing data?

– Deposit location?

– Licensing?

– Embargo?

http://www.data-archive.ac.uk/create-manage/strategies-for-centres/data-inventory
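One way to keep such an inventory is as one structured record per project, saved to a spreadsheet-friendly file. The sketch below is an assumption-laden illustration: the `InventoryEntry` class, its field names, and the example values are hypothetical and only mirror the fields listed above; it is not the UK Data Archive template itself.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields

# Dataset types named on the slide
DATASET_TYPES = {"qualitative", "quantitative", "mixed methods", "model"}


@dataclass
class InventoryEntry:
    """One project in the data inventory (field names are hypothetical)."""
    funding_source: str
    program: str
    project_title: str
    pi_first_name: str
    pi_surname: str
    other_contacts: str
    start_date: str            # dd-mm-yyyy, as on the slide
    end_date: str              # dd-mm-yyyy
    new_datasets_created: bool
    dataset_count: int
    data_locations: str
    dataset_type: str
    sharing: bool
    deposit_location: str = ""
    licensing: str = ""
    embargo: str = ""

    def __post_init__(self):
        # Reject dataset types outside the agreed vocabulary
        if self.dataset_type not in DATASET_TYPES:
            raise ValueError(f"unknown dataset type: {self.dataset_type!r}")


def inventory_to_csv(entries):
    """Serialize inventory entries to CSV text, one row per project."""
    buf = io.StringIO()
    names = [f.name for f in fields(InventoryEntry)]
    writer = csv.DictWriter(buf, fieldnames=names)
    writer.writeheader()
    for entry in entries:
        writer.writerow(asdict(entry))
    return buf.getvalue()


# Made-up example values, for illustration only
entry = InventoryEntry(
    funding_source="Example Foundation",
    program="Example initiative",
    project_title="Example study",
    pi_first_name="Ada",
    pi_surname="Example",
    other_contacts="data-manager@example.org",
    start_date="01-09-2016",
    end_date="31-08-2018",
    new_datasets_created=True,
    dataset_count=2,
    data_locations="department server; institutional repository",
    dataset_type="quantitative",
    sharing=True,
    deposit_location="institutional repository",
)
print(inventory_to_csv([entry]))
```

Keeping the dataset type to a fixed set of values is a small instance of the consistency the later principles ask for.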

3B: Documentation Strategies

• Lab notebooks (print or electronic)

• Codebooks

• Data Dictionaries

• Procedures Manuals

• Protocols

• Readme.txt

3C: Structured documentation [metadata] is crucial for discovery, reuse, and interoperability

• Metadata describes the who, what, when, where, how, why of the data

• Metadata = documentation for machines (standardized, structured)

• Purpose is to enable evaluation, discovery, organization, management, re-use, authority/identification, and preservation

• Standards are commonly agreed upon terms and definitions in a structured format

• Good documentation builds trust in your data – provenance, data integrity, transparency, audit trail, etc.
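To make "documentation for machines" concrete, the sketch below stores the who, what, when, where, how, and why of a dataset as a structured record and flags any element left blank. The field names, the `missing_fields` helper, and the example values are assumptions made for illustration; a real project would normally adopt the metadata standard used in its field.

```python
import json

# The six descriptive questions from the slide, used as hypothetical field names
REQUIRED_FIELDS = ("who", "what", "when", "where", "how", "why")


def missing_fields(record):
    """Return the required fields that are absent or blank in the record."""
    return [f for f in REQUIRED_FIELDS if not str(record.get(f, "")).strip()]


# Made-up example record
record = {
    "who": "Example Lab (hypothetical)",
    "what": "Survey responses, one row per participant",
    "when": "2016-09-01/2016-12-15",
    "where": "Online survey",
    "how": "Questionnaire exported to CSV",
    "why": "",
}

print(missing_fields(record))        # prints ['why']
print(json.dumps(record, indent=2))  # structured, machine-readable form
```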

4: Be consistent

• We’re human – recognize the challenge

• Prevention – design research instruments & processes to prevent mistakes

• Pilot everything to identify potential problem areas

• When you aren’t, document the deviation

• Train your project personnel to be consistent & monitor performance

• Do internal audits, quality checks, data screening periodically to detect inconsistencies

5: Use common, standardized terminology

• For things/concepts

– Diagnoses

– Species/cell lines

– Locations

– Variable names

– Samples & materials

• For formats, too

– Dates

– Codes

– Identifiers
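Dates are a common place where formats drift. As a minimal sketch, the function below converts a few known input formats to ISO 8601 (yyyy-mm-dd). The list of accepted formats and the day-first reading of ambiguous dates are assumptions; a project would set these to match its own instruments.

```python
from datetime import datetime

# Input formats this sketch accepts (an assumption; day comes first when ambiguous)
KNOWN_FORMATS = ("%Y-%m-%d", "%d-%m-%Y", "%d/%m/%Y")


def to_iso_date(text):
    """Convert a date string in a known format to ISO 8601 (yyyy-mm-dd)."""
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date format: {text!r}")


print(to_iso_date("22-01-2018"))  # prints 2018-01-22
print(to_iso_date("22/01/2018"))  # prints 2018-01-22
```

Raising an error on anything unrecognized, rather than guessing, keeps deviations visible so they can be documented.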

6: Monitor the quality of the data

• Don’t wait until data collection is over

• Quality Assurance

• Quality Control

• Build it into the project timeline

• Make it someone’s job

• Document what you find and how it was corrected
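A simple screening pass run while data are still being collected might look like the sketch below, which flags missing, non-numeric, and out-of-range values. The `screen_rows` function, the column names, and the allowed ranges are hypothetical examples, not part of any particular protocol.

```python
def screen_rows(rows, ranges):
    """Flag missing, non-numeric, and out-of-range values.

    rows:   list of dicts, one per record
    ranges: {column: (low, high)} giving the allowed inclusive range
    Returns a list of (row_index, column, problem) tuples.
    """
    findings = []
    for i, row in enumerate(rows):
        for column, (low, high) in ranges.items():
            value = row.get(column)
            if value is None or value == "":
                findings.append((i, column, "missing"))
                continue
            try:
                number = float(value)
            except ValueError:
                findings.append((i, column, "not numeric"))
                continue
            if not (low <= number <= high):
                findings.append((i, column, "out of range"))
    return findings


# Made-up records and ranges, for illustration only
rows = [
    {"id": "P01", "age": "34", "score": "7"},
    {"id": "P02", "age": "", "score": "12"},
    {"id": "P03", "age": "210", "score": "5"},
]
ranges = {"age": (18, 99), "score": (0, 10)}

for finding in screen_rows(rows, ranges):
    print(finding)
```

The list of findings doubles as a record of what was found, which can be kept alongside notes on how each item was corrected.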

7: Better reporting

• Report enough detail about your research so that others in your field can reproduce it and others outside your field can evaluate it

• This includes ALL aspects of the study: study design, data collection methods, sampling, population, data screening & processing, QA/QC procedures, analytical procedures, visualization procedures, etc.

I know you can’t fit this into a journal article but you can write up pieces of this as the study is conducted to support publications and reporting to the funder. Plus it makes writing those products much easier.

8: Be as open as possible (Open Science)

Open isn’t an all or nothing choice

• Study registration

• Open notebook science

• Data sharing (raw data, processed data, data supporting published results)

• Open Data

• Open Access publishing (deposit pre/post print in a repository, choose OA journal, choose Gold OA option)

Want to learn more? Center for Open Science Why Open Research?

9: Think ahead

How might your research be useful to others? To yourself in 5/15/50 years? Your students or trainees? Historians?

• Consider what you will forget in that time and document it

• Consider whether your data will be useful beyond the life of the project. If so, put it somewhere safe like an institutional or subject repository to share it and ensure long-term access.

Stakeholders in (Academic) RDM

• Research Administration

• Research Compliance

• University IT

• University Libraries

• University Archives

• Consortia (e.g., CIC)

• NIH CTSA Hubs/NCATS

• Research & Technology Corporation: http://iurtc.iu.edu/

Case studies: Discussion

1. http://retractionwatch.com/2015/11/05/got-the-blues-you-can-still-see-blue-after-all-paper-on-sadness-and-color-perception-retracted/

2. https://ori.hhs.gov/content/case-summary-anderson-david

Resources

1. Uhlir, P. F. (2010). Information Gulags, Intellectual Straightjackets, and Memory Holes. Data Science Journal, 9, ES1-ES5.

2. DataONE Education Module: Data Management. DataONE. Retrieved December 2013. From http://www.dataone.org/sites/all/documents/L01_DataManagement.pptx

3. Scientists are hoarding data and it’s ruining medical research: http://www.buzzfeed.com/bengoldacre/deworming-trials

4. Losing data from the National Centre for E-Social Science (NCESS) Portal: http://datastories.jiscinvolve.org/wp/2015/08/10/losing-data-from-the-national-centre-for-e-social-science-ncess-portal/

5. Biomedical data sharing and reuse: Attitudes and practices of clinical and scientific research staff: http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0129506

6. Over half of psychology studies fail reproducibility test: http://www.nature.com/news/over-half-of-psychology-studies-fail-reproducibility-test-1.18248

7. Value of Open Data Sharing: https://www.fosteropenscience.eu/sites/default/files/pdf/2536.pdf

8. Michener, W. K., Brunt, J. W., Helly, J. J., Kirchner, T. B., & Stafford, S. G. (1997). Nongeospatial metadata for the ecological sciences. Ecological Applications, 7(1), 330-342.

9. Society for Clinical Data Management. (2013). Good Clinical Data Management Practices. Washington, D.C.

10. UK Data Archive. (2015). Prepare and manage data. From http://ukdataservice.ac.uk/manage-data.