Research Methods in a Big Data and Cognitive Era Dr. Alex Liu RMDS Pasadena, CA, USA www.ResearchMethods.org Updated October 8, 2015

Page 1: Research Methods · Research Methods in a new era Great opportunities for utilizing big data, high speed computing power and huge selection of analytical tools (models and algorithms)

Research Methods in a Big Data and Cognitive Era

Dr. Alex Liu

RMDS

Pasadena, CA, USA

www.ResearchMethods.org

Updated October 8, 2015

Page 2

Research Process

• Formulate a Question

• Select an Appropriate Research Design

• Collect & Analyze Data

• Interpret Findings

• Publish Findings

• Review the Available Literature

Page 3

RMS

Research Methods are about optimal RM4Es workflows

• DATA: Data Sources, Data Storage, Data Cleaning, Feature Extraction

• MODELS: Regression, Decision Tree, Bayesian & Causality, Time Series

• ALGORITHMS & COMPUTING: MLE, Iterative (MapReduce & Spark), R, SPSS

• STATISTICS & VISUALIZATION: RMSE, Confusion Matrix, ROC Curve

• Business Acumen, Subject Knowledge, Communication

Stages: Data, Equation, Estimation, Evaluation, Explanation
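The workflow on this slide can be illustrated with a small sketch. It is not the author's implementation. It assumes the stages mean: clean the data, choose a model equation (here a straight line, y = a + b*x), estimate its parameters (ordinary least squares, which coincides with MLE when errors are Gaussian), and evaluate the fit with RMSE. The function names `clean`, `estimate`, and `rmse` and the sample data are hypothetical. The explanation stage is left to the researcher.

```python
import math


def clean(rows):
    """Data stage: keep only rows where both x and y are present."""
    return [(x, y) for x, y in rows if x is not None and y is not None]


def estimate(rows):
    """Estimation stage: fit the equation y = a + b*x by least squares."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in rows)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    b = sxy / sxx            # slope
    a = mean_y - b * mean_x  # intercept
    return a, b


def rmse(rows, a, b):
    """Evaluation stage: root mean squared error of the fitted line."""
    squared = sum((y - (a + b * x)) ** 2 for x, y in rows)
    return math.sqrt(squared / len(rows))


# Hypothetical sample data with one incomplete record.
raw = [(1.0, 3.1), (2.0, 4.9), (None, 7.0), (3.0, 7.2), (4.0, 8.8)]
data = clean(raw)
a, b = estimate(data)
print(f"rows kept: {len(data)}, a={a:.3f}, b={b:.3f}, RMSE={rmse(data, a, b):.3f}")
```

Keeping each stage as its own function makes it possible to swap one stage (for example, a different equation or a different evaluation metric) without touching the others.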

Page 4

Older Gen Research

• Literature Review in the Library (now Google)

• Data in Excel Sheets

• Proprietary Computing with a Nicely Integrated Package – Stata, SPSS, Mathematica

Page 5

New Gen of Research

• Open Source Computing Languages

– R, Python, Scala, Julia

• Open Source Tools for Processing & Organizing Data and Analytics

– Notebooks: Jupyter, Zeppelin

– Visualization: D3.js, ggplot

– IDE: RStudio

– Data Prep: OpenRefine

• Open Source Execution Environments

– Spark, Hadoop

Page 6

We live in a moment of accelerated transformation

• 62% of total workflows will be in the cloud by 2016

• 75B devices connected to the internet by 2020

• 90% of the world’s data created in the last two years

Page 7

Big Data Era – Too Much Data to Use

Page 8

Too Many Analytical Steps: Research Flows Are Difficult to Manage

Page 9

Too Many Resources to Coordinate

[Figure: grid architecture diagram. Roles: Researcher, Production Manager, Science Review, Grid Operations, Grid Monitor. Data Grid: storage elements, replica location service, virtual data catalogs, virtual data index. Computing Grid: workflow planner, request planner, workflow executor (DAGMan), request executor (Condor-G, GRAM), request predictor (Prophesy). Layers: Data Transport, Storage, Resource Management. Activities: planning, discovery, composition, simulation, analysis, sharing, derivation. Data: raw data (detector), simulation data.]

Page 10

And a lot more to care about …

• Safeguard Research Assets

– Control access

– Timely tracking

– Knowledge management

• Regulatory compliance

– Bookkeeping

– Versioning

– Time recording

Page 11

Challenges for Research Methods

• Too much data to import

• Too much data cleaning to complete

• Too many analytical methods to select

• Too many algorithms to select

• Too many computing tools to select

• Too many IT systems to select

Page 12

Many new methods are coming

Traditional Method: Structured Analysis

• Researchers: Determine what question to ask

• Research Support: Structures the data to answer that question

• Examples: Monthly research reports, Profitability analysis, Customer surveys

Big Data Method: Iterative Analysis

• IT: Delivers a platform to enable creative discovery

• Researchers: Explore what questions could be asked

• Examples: Brand sentiment, Product strategy, Maximum asset utilization

Page 13

Research Methods in a new era

• Great opportunities for utilizing big data, high speed computing power and huge selection of analytical tools (models and algorithms)

• One researcher alone may not be able to solve all the problems faced

• Some intelligent assistance is needed to help every researcher

Page 14

Intelligently Managing Research Flows (RFs) Helps

• Replication (Provenance)

• Knowledge re-use & sharing

• Readiness for auditing

• Readiness for automation

• Removes much of the mundane data management burden, freeing scientists to do science

Replicability is the foundation of scientific research.

RF management facilitates replicability.

Research Flow (RF)
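One way to picture how managing a research flow supports replication is a provenance log: each step records its name, its parameters, and a fingerprint of its input and output, so a rerun can be checked against the original. The sketch below is only an illustration under those assumptions. The names `ResearchFlow`, `fingerprint`, `drop_missing`, and `scale` are hypothetical and do not come from the slides or from any particular tool.

```python
import hashlib
import json


def fingerprint(obj):
    """Return a stable SHA-256 digest of a JSON-serializable object."""
    payload = json.dumps(obj, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


class ResearchFlow:
    """Hypothetical research flow that logs provenance for every step."""

    def __init__(self, name):
        self.name = name
        self.log = []

    def run(self, step_name, func, data, **params):
        """Apply one step and record what went in and what came out."""
        result = func(data, **params)
        self.log.append({
            "step": step_name,
            "params": params,
            "input_hash": fingerprint(data),
            "output_hash": fingerprint(result),
        })
        return result


def drop_missing(values):
    """Example cleaning step: remove missing values."""
    return [v for v in values if v is not None]


def scale(values, factor):
    """Example transformation step: multiply every value by a factor."""
    return [v * factor for v in values]


def run_flow():
    flow = ResearchFlow("demo")
    data = flow.run("drop_missing", drop_missing, [1, None, 2, 3])
    data = flow.run("scale", scale, data, factor=10)
    return data, flow.log


# Running the same flow twice should give identical results and logs.
result_1, log_1 = run_flow()
result_2, log_2 = run_flow()
print("replicated:", result_1 == result_2 and log_1 == log_2)
```

The log deliberately omits timestamps so that two runs can be compared for equality. An auditing use case would likely add them, along with the user and software versions.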

Page 15

Need AI to automate and augment

• AI to automate some research flows

• AI to augment all researchers