IE598 Big Data Optimization

Instructor: Niao He

Jan 17, 2018

Introduction



A little about me

• Assistant Professor, ISE & CSL, UIUC, 2016 –

• Ph.D. in Operations Research and M.S. in Computational Sci. & Eng., Georgia Tech, 2010 – 2015

• B.S. in Mathematics, University of Sci. & Tech. of China, 2006 – 2010

A little about the course

Big Data Optimization

• Explore modern optimization theories, algorithms, and big data applications

• Emphasize a deep understanding of the structure of optimization problems and the computational complexity of numerical algorithms

• Expose students to the frontier of research at the intersection of large-scale optimization and machine learning

A little about you

• PhD or Master?

• ISE, ECE, Stat, CS?

• Took any optimization courses?


Course Details

• Prerequisites: none formal, but the course assumes knowledge of

– linear algebra, real analysis, and probability theory

– basic machine learning and optimization at the graduate level

• Textbooks: none required, but reading the references listed on the syllabus is recommended

Ben-Tal & Nemirovski (2011), Nesterov (2003), Beck (2017), Bubeck (2015)

Course Details

• Evaluation: groups of size 1~3

– Paper Presentation (25%)

– Final Project (75%)

• Proposal (10%) : 1~3 pages

• Report (40%): 10~15 pages in the given format

• Presentation (25%): 15~20 mins

– Bonus (20 pts): a conference or journal submission

• Deadlines: see syllabus


Course Admin

• Syllabus & Website: http://niaohe.ise.illinois.edu/IE598/ (pwd: spring2018)

• Where to get help

– No TAs

– Email: [email protected] with [IE 598] in your subject

– Office Location: 211 Transport Building

– Office Hours: Mon. 3:00-4:00 or by appointment via email


Introduction


Starts with the buzzword


Era of Big Data

• Big data heat in academia


Era of Big Data

• Big data heat in industry

– LinkedIn: 48,000+ Data Scientist jobs in the United States

More than a buzzword

• Big data revolution in various areas

– Robotics and autonomous cars

– Natural language processing

– Computer vision

– Healthcare

[Figure panels: Finance, Healthcare, Environment, Aerospace, Lifestyle]

How to do data analysis?

Key Steps

• Pose a problem

• Collect data

• Pre-process and clean data

• Formulate a mathematical model

• Find a solution

• Evaluate and interpret the results


What is Optimization?

• Find the optimal solution that minimizes/maximizes an objective function subject to constraints
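In symbols, a generic problem of this kind can be sketched as follows. The names $f$ (objective), $g_i$ (constraint functions), and $m$ (number of constraints) are placeholders chosen for illustration, not notation from the slides; maximization is covered by minimizing $-f$.

```latex
\min_{x \in \mathbb{R}^d} \; f(x)
\quad \text{subject to} \quad g_i(x) \le 0, \quad i = 1, \dots, m
```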


Why do we care?

Optimization lies at the heart of many fields, especially machine learning.

• Finance

– Portfolio selection, asset pricing, etc.

• Electrical Engineering

– Signal and image processing, control and robotics, etc.

• Industrial Engineering

– Supply chain, revenue management, transportation, etc.

• Computer Science

– Machine learning, computer vision, etc.

Example – Portfolio Selection

• Markowitz Mean-Variance Model

where

– 𝑤 is the vector of portfolio weights

– 𝑅 is the vector of expected returns

– Σ is the covariance matrix of asset returns

– 𝜆 > 0 is the risk tolerance factor
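A minimal sketch of the model, assuming the common form: minimize $w^\top \Sigma w - \lambda R^\top w$ subject to the weights summing to one. The budget constraint and the absence of a no-short-selling constraint are assumptions, and `markowitz_weights` is a hypothetical helper name. Under those assumptions the optimality conditions are linear and can be solved directly:

```python
import numpy as np

def markowitz_weights(R, Sigma, lam):
    """Solve min_w w^T Sigma w - lam * R^T w  s.t.  sum(w) = 1 (assumed form).

    Stationarity of the Lagrangian gives 2 * Sigma w = lam * R + nu * 1,
    and the multiplier nu is fixed by the budget constraint.
    """
    R = np.asarray(R, dtype=float)
    Sigma = np.asarray(Sigma, dtype=float)
    ones = np.ones_like(R)
    Sinv_R = np.linalg.solve(Sigma, R)     # Sigma^{-1} R
    Sinv_1 = np.linalg.solve(Sigma, ones)  # Sigma^{-1} 1
    nu = (2.0 - lam * (ones @ Sinv_R)) / (ones @ Sinv_1)
    return 0.5 * (lam * Sinv_R + nu * Sinv_1)

# Illustrative data (not from the slides): two uncorrelated assets.
R = [0.1, 0.2]
Sigma = np.eye(2)
w_min_var = markowitz_weights(R, Sigma, lam=0.0)  # ignores returns entirely
w_tilted = markowitz_weights(R, Sigma, lam=1.0)   # leans toward higher return
```

With `lam = 0` this returns the minimum-variance portfolio; larger values shift weight toward assets with higher expected return.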


Example – Image Denoising

• Total Variation Denoising Model

where

– 𝑥 is the image matrix

– 𝑂 is the noisy image, 𝑃 is the set of observed entries

– 𝑇𝑉(𝑥) is the total variation
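One common way to write a total variation denoising problem with these symbols is sketched below. The regularization weight $\lambda > 0$, the reading of $P$ as the operator that keeps only the observed entries, and the anisotropic definition of $TV$ are assumptions made for illustration.

```latex
\min_{x} \; \tfrac{1}{2} \, \| P(x) - P(O) \|_F^2 + \lambda \, TV(x),
\qquad
TV(x) = \sum_{i,j} \big( |x_{i+1,j} - x_{i,j}| + |x_{i,j+1} - x_{i,j}| \big)
```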


Example – Inventory

• Newsvendor Model

where

– 𝑞 is the number of newspapers to be stocked

– 𝐷 is the random demand

– 𝑐 is the unit purchase price

– 𝑝 is the selling price
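A minimal sketch, assuming the standard newsvendor objective of maximizing expected profit $\mathbb{E}[p \min(q, D)] - cq$ with $p > c > 0$, and approximating the expectation by an average over demand samples. For this objective an optimal order is the demand quantile at the critical ratio $(p - c)/p$. The function names and the sample data are hypothetical.

```python
import math
import numpy as np

def expected_profit(q, demand, c, p):
    """Sample-average profit: sell min(q, D) at price p, buy q units at cost c."""
    demand = np.asarray(demand, dtype=float)
    return float(np.mean(p * np.minimum(q, demand)) - c * q)

def newsvendor_quantity(demand, c, p):
    """Smallest sampled demand whose empirical CDF reaches (p - c) / p."""
    d = np.sort(np.asarray(demand, dtype=float))
    ratio = (p - c) / p
    k = max(1, math.ceil(ratio * len(d)))  # 1-based rank of the quantile
    return float(d[k - 1])

# Illustrative demand samples (not from the slides).
demand_samples = [10, 20, 30, 40, 50]
q_star = newsvendor_quantity(demand_samples, c=1.0, p=4.0)
best_profit = expected_profit(q_star, demand_samples, c=1.0, p=4.0)
```

The sample-average profit is concave and piecewise linear in $q$, so checking it on a grid of candidate orders is an easy way to sanity-check the quantile rule.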


Example – Regression

• Linear Regression Model

where

– 𝑥𝑖: predictor vector (feature)

– 𝑦𝑖: response vector (label)

– 𝑤: parameters to be learned

– 𝑛: number of data points
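In least-squares form, which is the usual reading of this model, the problem can be written as below. Treating each response $y_i$ as a scalar and omitting an intercept term are assumptions.

```latex
\min_{w} \; \sum_{i=1}^{n} \left( y_i - w^\top x_i \right)^2
```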


Example – Regularization

• Ridge Regression Model

▪ $\|w\|_2^2 = \sum_{j=1}^{d} w_j^2$ is the $L_2$-regularization

• LASSO (Least Absolute Shrinkage and Selection Operator)

▪ $\|w\|_1 = \sum_{j=1}^{d} |w_j|$ is the $L_1$-regularization
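A minimal sketch of both penalties, assuming the objectives $\sum_i (y_i - w^\top x_i)^2 + \lambda \|w\|_2^2$ for ridge and $\sum_i (y_i - w^\top x_i)^2 + \lambda \|w\|_1$ for LASSO, where the weight $\lambda > 0$ is an assumption. Ridge has a closed-form solution; for LASSO the sketch uses proximal gradient descent (ISTA), which is one of several possible solvers. Function names and data are hypothetical.

```python
import numpy as np

def ridge(X, y, lam):
    """Closed form: w = (X^T X + lam * I)^{-1} X^T y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1: shrink each entry toward zero by t."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def lasso_ista(X, y, lam, n_iter=500):
    """Proximal gradient (ISTA) for ||Xw - y||^2 + lam * ||w||_1."""
    L = 2.0 * np.linalg.norm(X, 2) ** 2  # Lipschitz constant of the gradient
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = 2.0 * X.T @ (X @ w - y)   # gradient of the squared-error term
        w = soft_threshold(w - grad / L, lam / L)
    return w

# Illustrative data: an identity design makes the effect easy to read off.
X = np.eye(3)
y = np.array([3.0, -0.2, 1.0])
w_ridge = ridge(X, y, lam=1.0)       # shrinks every coefficient
w_lasso = lasso_ista(X, y, lam=1.0)  # sets the small coefficient to zero
```

On this toy problem ridge scales every coefficient down, while LASSO removes the smallest one entirely, which is the sparsity effect that motivates the $L_1$ penalty.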


Example – Classification

• Maximum Margin Classifier Model

where

– 𝑥𝑖: predictor vector (feature)

– 𝑦𝑖 ∈ {1,−1}: label/class

– 𝑤: parameters to be learned

– 𝑛: number of data points
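For linearly separable data, the hard-margin problem is usually written as below. Omitting a bias (intercept) term is an assumption, made because only $w$ is listed as a parameter.

```latex
\min_{w} \; \tfrac{1}{2} \|w\|_2^2
\quad \text{subject to} \quad y_i \, w^\top x_i \ge 1, \quad i = 1, \dots, n
```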


Example – More Classification

• Soft Margin SVM (support vector machine)

• Logistic Regression
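Both models can be viewed as minimizing a sum of per-example losses of the margin $y_i w^\top x_i$, with labels $y_i \in \{1, -1\}$. The forms below are common textbook versions; the trade-off constant $C > 0$ and the omission of a bias term are assumptions.

```latex
% Soft-margin SVM (hinge loss)
\min_{w} \; \tfrac{1}{2} \|w\|_2^2 + C \sum_{i=1}^{n} \max\{0, \, 1 - y_i \, w^\top x_i\}

% Logistic regression (logistic loss)
\min_{w} \; \sum_{i=1}^{n} \log\left(1 + \exp(-y_i \, w^\top x_i)\right)
```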


Example – Maximum Likelihood Estimation

• Assume data points 𝑥1, . . . , 𝑥𝑛 are drawn i.i.d. from some distribution, and we want to fit the data with a model 𝑝(𝑥|𝑤) with parameter 𝑤. Maximum likelihood estimation chooses the 𝑤 that maximizes the likelihood of the observed data.

– Least-squares regression as a special case

– Logistic regression as a special case
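Written out, the i.i.d. assumption turns the likelihood into a product, and taking logarithms gives an equivalent sum:

```latex
\max_{w} \; \prod_{i=1}^{n} p(x_i \mid w)
\quad \Longleftrightarrow \quad
\min_{w} \; -\sum_{i=1}^{n} \log p(x_i \mid w)
```

For example, under a Gaussian noise model $y_i = w^\top x_i + \varepsilon_i$ with fixed variance, the negative log-likelihood reduces to the least-squares objective up to constants.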


Example – Clustering

• K-Means Model

where

– 𝑥1, … , 𝑥𝑛: data

– 𝜇1, … , 𝜇𝑘: cluster centers to be learned

– 𝐶1, … , 𝐶𝑘: clusters to be assigned to
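A minimal sketch, assuming the usual objective of minimizing the total squared distance from each point to its assigned center, $\sum_{j=1}^{k} \sum_{x_i \in C_j} \|x_i - \mu_j\|_2^2$. The code uses Lloyd's algorithm, a standard heuristic that alternates between assigning points and updating centers; it finds a local solution and is not guaranteed to reach the global optimum. Function names and the toy data are hypothetical.

```python
import numpy as np

def assign(X, centers):
    """Index of the nearest center for every point (squared Euclidean distance)."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)  # shape (n, k)
    return d2.argmin(axis=1)

def kmeans_objective(X, centers, labels):
    """Total squared distance from each point to its assigned center."""
    return float(np.sum((X - centers[labels]) ** 2))

def lloyd(X, k, n_iter=50, seed=0):
    """Lloyd's algorithm: alternate assignment and center-update steps."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        labels = assign(X, centers)
        new_centers = centers.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # keep the old center if a cluster is empty
                new_centers[j] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, assign(X, centers)

# Illustrative data: two well-separated groups of two points each.
X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
centers, labels = lloyd(X, k=2)
```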


Many More Examples in ML

• Supervised learning (predictive models)

– Regression

– Classification

– Neural networks

– Boosting

• Unsupervised learning (data exploration)

– Clustering (K-means)

– Dimension reduction (PCA)

– Density estimation

• Reinforcement learning

• Collaborative filtering

• Graphical models

• Probabilistic inference

• …

Theme of This Course

How to solve optimization problems efficiently in the Big Data environment?


Structure of Optimization

• Linear vs. Nonlinear

• Deterministic vs. Stochastic

• Continuous vs. Combinatorial

• Smooth vs. Nonsmooth

• Convex vs. Nonconvex

• Low-dimensional vs. High-dimensional

• Static vs. Online

• Single vs. Sequential Decision Making


Easy or Hard?

What makes an optimization problem easy or hard?

– Find minimum volume ellipsoid: polynomial-time solvable

– Find maximum volume ellipsoid: NP-hard

Example from L. Xiao, CS286 seminar

Easy or Hard?

What makes an optimization problem easy or hard?

– Linear optimization: polynomial-time solvable

– Polynomial optimization: ranges from P to NP-hard

Complexity and Convexity

“The great watershed in optimization isn’t between linearity and nonlinearity, but convexity and nonconvexity.”

— R. Rockafellar, SIAM Review 1993

[Figure panels: Non-Convex Optimization, Convex Optimization]
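For reference, the standard definition: a function $f$ is convex if its domain is a convex set and the inequality below holds for all $x, y$ in the domain and all $\theta \in [0, 1]$.

```latex
f\big(\theta x + (1 - \theta) y\big) \;\le\; \theta f(x) + (1 - \theta) f(y)
```

For a convex problem every local minimum is also a global minimum, which is a large part of why this divide matters for complexity.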

Types of Algorithms

• Polynomial-time algorithms (date back to the 1970s or so)

– E.g., ellipsoid method, interior point method (IPM)

• First-order algorithms (date back to the 1900s, revived since 1980)

– E.g., accelerated gradient descent method (AGD)

• Second-order algorithms

– E.g., Newton's method, L-BFGS

• Stochastic and randomized algorithms (date back to the 1950s, revived since 2004)

– E.g., stochastic approximation (SA)
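To make the first-order family concrete, here is a minimal sketch comparing plain gradient descent with an accelerated variant on a small least-squares problem. The test problem, the step size $1/L$, and the momentum schedule (the one popularized by FISTA) are illustrative choices, not taken from the slides; function names are hypothetical.

```python
import numpy as np

def grad(w, X, y):
    """Gradient of f(w) = 0.5 * ||Xw - y||^2."""
    return X.T @ (X @ w - y)

def gradient_descent(X, y, n_iter=500):
    L = np.linalg.norm(X, 2) ** 2  # smoothness constant of f
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        w = w - grad(w, X, y) / L
    return w

def accelerated_gradient_descent(X, y, n_iter=500):
    L = np.linalg.norm(X, 2) ** 2
    w = np.zeros(X.shape[1])
    v = w.copy()  # extrapolated point where the gradient is evaluated
    t = 1.0       # momentum parameter
    for _ in range(n_iter):
        w_next = v - grad(v, X, y) / L
        t_next = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        v = w_next + ((t - 1.0) / t_next) * (w_next - w)  # extrapolation step
        w, t = w_next, t_next
    return w

# Illustrative problem (not from the slides).
X = np.array([[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]])
y = np.array([1.0, 2.0, 3.0])
w_star = np.linalg.lstsq(X, y, rcond=None)[0]  # reference solution
w_gd = gradient_descent(X, y)
w_agd = accelerated_gradient_descent(X, y)
```

Both methods use only gradients; the accelerated method adds an extrapolation step, which improves the worst-case rate for smooth convex problems from $O(1/k)$ to $O(1/k^2)$.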


Central Topics

[Diagram: Large-Scale Optimization: Algorithms, Complexity, Applications]

Courtesy Warning

• If you are looking for software/tools for big data analytics:

– IE 529 Stats of Big Data & Clustering

– Check CS/CSE related data mining courses

• If you are looking for introduction-level optimization:

– ECE 490 Introduction to Optimization

– IE 510 Advanced Nonlinear Programming

• If you are looking for combinatorial optimization:

– IE 598 Advanced Integer Programming

• Otherwise, welcome to the class and see you next week!
