A Data Scientist Perspective on Data Curation in the Digital Era

Post on 05-Nov-2014


DESCRIPTION

Perspective from a marine geoscientist turned data scientist on career opportunities and educational requirements in the "Era of Big Data"

TRANSCRIPT

Symposium on Digital Curation in the Era of Big Data: Career Opportunities and Educational Requirements:

A Data Scientist Perspective

Dr. Vicki Lynn Ferrini
Lamont-Doherty Earth Observatory

Background (What I do)
• Data Documentation (Metadata)
• Data Management
• Data Discovery & Access Tools
• Develop/Implement QA/QC
• Data Syntheses
• Data Compliance Tools
• Education Materials
• Delivery to National Data Centers, Libraries
• Data Publication & Links to Scientific Literature
• Data Integration, Visualization & Analysis Tools
• Best Practice Guidelines for Optimizing Acquisition

“Support, sustain, and advance the geosciences by providing data services for observational solid earth data from the Ocean, Earth, and Polar Sciences.”

rvdata.us

Scientific Data Continuum

[Diagram comparing THEN and NOW: one panel links Data Producers, Scientific Literature, and Data Consumers; the other panel adds Data Providers.]

Varying Goals/Perspectives/Needs

Perspective of Data Producers
• Goal: Scientific Discovery
• Data Acquisition & Reduction
• Data Assembly
• Visualization, Integration & Interpretation
• Scientific Standards
• Technical & Operational Limitations
• Data documentation
• Varies by domain
• Often difficult
• Heterogeneous

Domain Specialists

Perspective of Data Consumers

• Goal: Discovery
• Data Discoverability & Access
• Cross-disciplinary
• Scientific Standards
• Interpretation
• Increased importance of documentation
• Data not self-generated
• Data Quality/Reliability
• Data Use/Misuse

Domain Specialists & Public

Perspective of Data Providers
• Goal: Access/Preservation/Re-Use
• Data Formats & Standards
• Data Documentation & Preservation Techniques
• Scientific & Metadata Standards
• Data Citation
• Data Transfer Mechanisms
• System Usability
• Interoperability/Linked Data
• Needs of Diversity of User Community
• Knowledge of Content

Human & Digital Bridge between Producers & Consumers

At the Intersection: The Data Scientist

[Diagram: the Data Scientist at the intersection of Data Producers, Data Providers, and Data Consumers.]

Data Stewardship Continuum

[Diagram: Data Producers, Data Providers, and Data Consumers along the data stewardship continuum.]

Key Attributes of Data Scientists

• Knowledge spanning full scientific data stewardship continuum

• Domain Experience
  • Content & applications
  • Data acquisition & reduction practices
  • Nuances of Data

• Technical knowledge
  • Evolving Technologies
  • Data Acquisition & Management
  • Metadata

Key Attributes of Data Scientists

• Other skills (seldom taught)
  • Communication & Organization
  • Understand cultural aspects of user community
  • People/Project Management
  • Balance between micro- and macro-perspectives

Key Attributes of Tech Team Members
• Basic knowledge of content OR interest/curiosity
• Experience with Data Production/Consumption
• Technical skills:
  – web development & technology
  – geospatially enabled data management tools
  – experience with data analysis tools
  – ability to work in a variety of tech environments
• Complementary skill sets
• Innovation & creativity
• Willingness to ask questions (assumptions can be dangerous)

Challenges & Opportunities
• Difficult to find right balance between technical skills and interest in content
  – Team dynamics, management approaches evolving
  – Increasing opportunities to engage/educate computer scientists in domain science
• Data producers are slow to join the digital era
  – Educational opportunities
  – Scientific benefits continue to grow
  – New generation incorporating data sharing into scientific workflow
• Difficult to keep pace with evolving technologies
  – Educational & Professional Development opportunities

The Future?

[Diagram: Data Producers, Data Providers, Data Consumers, and Data Scientists.]
