Challenges and Opportunities in Big Data Analytics

Dr. Prasad A. Naik, Professor, UC Davis

@iValleyIC #FinTechTalk

http://ivalley.co

http://gsm.ucdavis.edu/faculty/prasad-naik

Post on 01-May-2018

Big Data: Volume, Variety, Velocity

[Figure: Data Matrix Grows. Rows: N; columns: p; p exceeds N.]

Theoretically “Big Data” means …

Standard Theory

• Sample size N → ∞

• Number of variables p fixed

• Ratio p/N becomes negligible

• Result?

– Tall data matrix

– p < N

Big Data Theory

• Sample size N → ∞

• Variables p → ∞ at a faster rate than N does

• Ratio p/N remains “Big”

• Result?

– Long data matrix

– p > N

Big Data?

Standard Data

• Data Matrix (Tall)

• Large N, Smaller p

– p < N

Big Data

• Data Matrix (Long)

• Large N, but Larger p

– p > N

Got Big Data, but where are my Big Insights?

Two Challenges

• When N is large but p < N

o Computational challenges

– Storage, retrieval

– Parallel computing, real-time analysis

• When N is large but p > N

o Statistical challenges

– All standard methods break down!

Need New Analytics for Big Data

Linear Regression

Logistic Regression

Principal Components

Factor Analysis

Don’t work when p > N
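One way to see why ordinary regression breaks down when p > N is that the fit is no longer unique. The sketch below is an illustration only, using a made-up toy matrix with N = 2 observations and p = 3 variables (the numbers and the names `predict`, `beta_a`, `null_dir` are assumptions, not from the talk). It shows that very different coefficient vectors reproduce the data exactly, so the data alone cannot tell them apart.

```python
def predict(X, beta):
    """Return the fitted values X @ beta for a list-of-rows matrix X."""
    return [sum(x * b for x, b in zip(row, beta)) for row in X]

# Toy data (hypothetical): N = 2 rows, p = 3 columns, so p > N.
X = [[1.0, 0.0, 1.0],
     [0.0, 1.0, 1.0]]
y = [1.0, 2.0]

# One coefficient vector that fits y exactly.
beta_a = [1.0, 2.0, 0.0]

# A direction that X maps to zero; it exists whenever p > N.
null_dir = [1.0, 1.0, -1.0]

def shifted(t):
    """Move beta_a along the null direction by an amount t."""
    return [b + t * d for b, d in zip(beta_a, null_dir)]

for t in (0.0, -1.0, 3.0):
    beta_t = shifted(t)
    print(beta_t, "->", predict(X, beta_t))
```

Every vector printed by the loop gives the same fitted values as `y`, which is why an extra constraint such as sparsity is needed to pick one answer.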

Opportunity: Sparsity constraints pave the way

Standard Techniques

Big Data Analytics

Sparse Analytics for Big Data

Standard techniques (don’t work when p > N) and their sparse counterparts (work even when p > N):

• Linear Regression → Lasso Regression

• Logistic Regression → Sparse Logistic

• Principal Components → Sparse PCA
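To make "works even when p > N" concrete, here is a minimal sketch of lasso regression fitted by cyclic coordinate descent with soft-thresholding. It is one common way to compute the lasso, not necessarily the method used in the talk. The function names (`lasso_cd`, `lambda_max`, and so on) are hypothetical, and the data are synthetic random numbers: the dimensions N = 22 and p = 100 only mirror the scale of the charisma application, and the two "true" nonzero coefficients are invented for the demonstration.

```python
import random

def soft_threshold(z, t):
    """Shrink z toward zero by t; values within [-t, t] become exactly 0."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def column_dot(X, v, j):
    """Inner product of column j of X with the vector v."""
    return sum(X[i][j] * v[i] for i in range(len(X)))

def lasso_objective(X, y, beta, lam):
    """(1 / (2N)) * ||y - X beta||^2 + lam * sum(|beta_j|)."""
    sq = 0.0
    for row, y_i in zip(X, y):
        r = y_i - sum(x * b for x, b in zip(row, beta))
        sq += r * r
    return sq / (2 * len(X)) + lam * sum(abs(b) for b in beta)

def lambda_max(X, y):
    """Smallest penalty at which the all-zero vector is a lasso solution."""
    n, p = len(X), len(X[0])
    return max(abs(column_dot(X, y, j)) / n for j in range(p))

def lasso_cd(X, y, lam, n_sweeps=100):
    """Cyclic coordinate descent for the lasso objective above."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X beta, with beta starting at zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_sweeps):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # an all-zero column carries no information
            # Correlation of column j with the residual that excludes column j.
            rho = column_dot(X, resid, j) / n + col_sq[j] * beta[j]
            new_b = soft_threshold(rho, lam) / col_sq[j]
            delta = new_b - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new_b
    return beta

# Synthetic data (assumption): N = 22 rows, p = 100 columns, 2 real effects.
random.seed(0)
N, P = 22, 100
X = [[random.gauss(0.0, 1.0) for _ in range(P)] for _ in range(N)]
true_beta = [0.0] * P
true_beta[3], true_beta[40] = 2.0, -1.5
y = [sum(x * b for x, b in zip(row, true_beta)) + random.gauss(0.0, 0.1)
     for row in X]

lam = 0.5 * lambda_max(X, y)
beta_hat = lasso_cd(X, y, lam)
selected = [j for j, b in enumerate(beta_hat) if b != 0.0]
print(len(selected), "of", P, "coefficients are nonzero:", selected)
```

The penalty `lam` controls how many variables survive: at `lambda_max` every coefficient is zero, and lowering it lets more variables enter. In practice it would usually be chosen by cross-validation, which this sketch leaves out.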

How to instill sparsity? Many ways …

Elastic Net Penalty

Lasso Penalty
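The formulas on this slide did not survive in the transcript. For reference, the two penalties are usually written as follows. The notation (response y, N × p data matrix X, coefficients β, tuning parameters λ, λ₁, λ₂) and the 1/(2N) scaling are assumptions following a common textbook convention, not necessarily what the slide showed.

```latex
% Lasso: squared-error loss plus an L1 penalty on the coefficients
\hat{\beta}^{\text{lasso}}
  = \arg\min_{\beta}\;
    \frac{1}{2N}\,\lVert y - X\beta \rVert_2^2
    + \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert

% Elastic net: the same loss with both an L1 and a squared L2 penalty
\hat{\beta}^{\text{enet}}
  = \arg\min_{\beta}\;
    \frac{1}{2N}\,\lVert y - X\beta \rVert_2^2
    + \lambda_1 \sum_{j=1}^{p} \lvert \beta_j \rvert
    + \lambda_2 \sum_{j=1}^{p} \beta_j^2
```

The L1 term is what sets some coefficients exactly to zero. The elastic net reduces to the lasso when λ₂ = 0.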

Two Marketing Applications

• What drives charisma of CEOs and Founders?

• What drives liking for Super Bowl Ads?

Impact of Nonverbal Communication on Charisma of CEOs/Founders

N = 22 sales pitches, each 1 minute long

p = 100+ variables

Mine the gestures

Shoot videos

p/N ratio ≈ 5X

Takeaways

• Big Data needs Sparse Analytics

o It’s not the size -- it’s the relative size (p relative to N)

• When p/N < 1, usual statistical tools work

• When p/N > 1, sparsity needs to be incorporated

• Many ways to incorporate sparsity

o Lasso, Elastic Net

o Depends on the goals of the project

• These methods work in finance too, not just marketing!

UC Davis Launches Master of Science in Business Analytics

MSBA Program

• Starts in Fall 2017

• 10-month or 19-month

• Equal emphasis on hard and soft skills

– Hard: Data + Analytics

– Soft: Business + Practicum

You can help!

• Encourage your smart employees to enroll

• Contribute real data-driven projects for student-teams to tackle

• Be a guest speaker or donor

http://gsm.ucdavis.edu/msba-masters-science-business-analytics

Questions?

Contact me [email protected]