

Page 1: Modeling Rich Structured Data via Kernel Distribution Embeddingslsong/teaching/CSE6740fall15/lecture0.pdf · 2015-08-18 · 13 million wikipedia pages 800 million users 6 billion

Introduction

Machine Learning CS 7641, CSE/ISYE 6740, Fall 2015

Le Song


What is machine learning (ML)

Study of algorithms that improve their performance at some task with experience


Common to industrial scale problems


13 million Wikipedia pages

800 million users

6 billion photos

24 hours of video uploaded per minute

> 1 trillion webpages

340 million tweets per day

Data from 2014


Increasingly relevant to science problems


Genome

Cell

Protein

Brain

Galaxy

Weather

Self-driving car

Robot

Finance

Sustainability

Design

Music


Syllabus

Cover a number of the most commonly used machine learning algorithms, in sufficient detail to understand their mechanisms.

Organization

Unsupervised learning (data exploration)

Learning without labels or without optimizing for a predictive task

Supervised learning (predictive models)

Learning with labels, focusing on predictive performance

Complex models (dealing with nonlinearity, combining models, etc.)

Nonlinearity, complex dependency, real-world applications

Basics and Breadth (guest lectures and seminars)


Syllabus: unsupervised learning

Learning without labels or without optimizing for a predictive task

Clustering vectorial data

K-means

Hierarchical clustering

Clustering networks

Spectral algorithm

Dimensionality reduction

Principal component analysis

Dimensionality reduction for manifolds

Locally linear embedding

Density estimation

Feature selection

Novelty/abnormality detection


Organizing Images


Image Databases

What are the desired outcomes? What are the inputs (data)? What are the learning paradigms?

Find communities in social networks

What are the desired outcomes? What are the inputs (data)? What are the learning paradigms?


Visualize Image Relations


Each image has thousands or millions of pixels.

What are the desired outcomes? What are the inputs (data)? What are the learning paradigms?


Shape of data


What are the desired outcomes? What are the inputs (data)? What are the learning paradigms?


Feature selection


What are the desired outcomes? What are the inputs (data)? What are the learning paradigms?


Novelty/abnormality detection


What are the desired outcomes? What are the inputs (data)? What are the learning paradigms?

Find abnormal objects

Syllabus: supervised learning

Learning with labels, focusing on predictive performance

Classifications

Nearest neighbor classifier

Naïve Bayes classifier

Logistic regression

Support vector machine

Combined classifiers

Boosting

Regressions

Ridge regression

Cross-validation


Image classification


What are the desired outcomes? What are the inputs (data)? What are the learning paradigms?


Face Detection


What are the desired outcomes? What are the inputs (data)? What are the learning paradigms?


Weather Prediction

Predict

Numeric values: 40 °F; wind: NE at 14 km/h; humidity: 83%

What are the desired outcomes? What are the inputs (data)? What are the learning paradigms?


Understanding brain activity


What are the desired outcomes? What are the inputs (data)? What are the learning paradigms?


Nonlinear classifier


Nonlinear Decision Boundaries

Linear SVM Decision Boundaries

What are the desired outcomes? What are the inputs (data)? What are the learning paradigms?

Kernel methods Neural Networks


Syllabus: advanced topics, complex models

Nonlinearity, complex dependency, real world applications

Kernel methods

Hidden Markov models

Graphical models

Topic modeling

Social network analysis

Collaborative filtering


Handwritten digit recognition/text annotation


Inter-character dependency

Inter-word dependency

Aoccdrnig to a sudty at Cmabrigde Uinervtisy, it deosn’t mttaer in waht oredr the ltteers in a wrod are, the olny iprmoetnt tihng is taht the frist and lsat ltteer be at the rghit pclae. The rset can be a ttoal mses and you can sitll raed it wouthit a porbelm. Tihs is bcuseae the huamn mnid deos not raed ervey lteter by istlef, but the wrod as a wlohe.
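The scrambling rule described in the passage above (keep the first and last letter of each word, shuffle the letters in between) can be sketched as follows. This is only an illustration: the function names and the fixed seed are assumptions, and punctuation attached to a word is treated as part of the word.

```python
import random


def scramble_word(word, rng):
    """Shuffle the interior letters of a word, keeping the first and last in place."""
    if len(word) <= 3:
        return word  # nothing to shuffle when there are fewer than two interior letters
    middle = list(word[1:-1])
    rng.shuffle(middle)
    return word[0] + "".join(middle) + word[-1]


def scramble_text(text, seed=0):
    """Apply scramble_word to every whitespace-separated token (punctuation is not handled)."""
    rng = random.Random(seed)  # hypothetical fixed seed, only to make runs repeatable
    return " ".join(scramble_word(token, rng) for token in text.split())


print(scramble_text("the human mind does not read every letter by itself"))
```

A recognizer that models dependencies between neighboring characters, rather than classifying each character in isolation, is one way to exploit the redundancy this example points at.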

What are the desired outcomes? What are the inputs (data)? What are the learning paradigms?


Speech recognition


“Machine Learning is the preferred method for speech recognition …”

Audio signals

Text

Models: Hidden Markov Models

tgtatcccagcatattttacgtaaaaacaaaacggtaatgcgaacataacttatttattggggcccggaccgcaaaccggccaaacgcgtttgcacccataaaaacataagggcaacaaaaaaattgttaagctgttgtttatttttgcaatcgaaaatc

gaaagcgctcaaatagctgcgatcactcgggagcagggtaaagtcgcctcgaaacaggaagctgaagcatcttctataaatacactcaaagcgatcattccgaggcgagtctggttagaaatttacatggactgcaaaaaggtatagccccacaaactgc

gtctccacatcgctgcgtttcggcagctaattgccttttagaaattattttcccatttcgagaaactcgtgtgggatgccggatgcggctttcaatcacttctggcccgggatcggattgggtcacattgtctgcgggctctattgtctcgatccgcgga

tttaaggcgcagttcgcgtgcttagcggtcagaaaggcagagattcggttcggattgatgcgctggcagcagggcacaaagatctaatgactggcaaatcgctacaaataaattaaagtccggcggctaattaatgagcggactgaagccactttggcgg

agttcattaaccaaaaaacagcagataaacaaaaacggcaaagaaaattgccacagagttgtcacgctttgttgcacaaacatttgtgcagaaaagtgaaaagcttttagccattattaagtttttcctcagctcgctggcagcacttgcgaatgtaaat

tgaaactgatgttcctcataaatgaaaattaatgtttgctctacgctccaccgaactcgcttgtttgggggattggctggctaatcgcggctagatcccaggcggtataaccttttcgcttcatcagttgtgaaaccagatggctggtgttttggcaacg

acagccagcggactcccctcgaacgctctcgaaatcaagtggctttccagccggcccgctgggccgctcgcccactggaccggtattcccaggccaggccacactgtaccgcaccgcataatcctcgccagactcggcgctgataaggcccaatgtcaag

accgcactccgcaggcgtctatttatgccaaggaccgttcttcttcagctttcggctcgagtatttgttgtgccatgttggttacgatgccaatcgcggtacagttatgcaaatgagcagcgaataccgctcactgacaatgaacggcgtcttgtcaacc

attcatattcatgctgacattcatattcattcctttggttttttgtcttcgacggactgaaaagtgcggagagaaacccaaaaacagaagcgcgcaaagcgccgttaatatgcgaactcagcgaactcattgaagttatcacaacaccatatccataacc

atacccatatccatatcaatatcaatatcgctattattaacgatcatgctctgctgatcaagtattcagcgctgcgctagattcgacagattgaatcgagctcaatagactcaacagactccactcgacagatgcgcaatgccaaggacaattgccgtgt

gtaagtggagtaaacgaggcgtatgcgcaacctgcacctggcggacgcggcgtatgcgcaatgtgcaattcgcttaccttctcgttgcgggtcaggaactcccagatgggaatggccgatgacgagctgatctgaatgtggaaggcgcccagcaggcagc

aggccaagattactttcgccgcagtcgtcatggtgtcgttgctgcttttatgttgcgtactccgcactacacggagagttcaggggattcgtgctccgtgatctgtgatccgtgttccgtgggtcaattgcacggttcggttgtgtaaccttcgtgtggc

gactttctttttttttagggcccaataaaagcgcttttgtggcggcttgatagattatcacttggtttcggtggctagccaagtggctttcttctgtccgacgcacttaattgaattaaccaaacaacgagcgtggccaattcgtattatcgctgttttc

gggtgtacgtgtgtctcagcttgaaacgcaaaagcttgtttcacacatcggtttctcggcaagatgggggagtcagtcggtctagggagaggggcgcccaccagtcgatcacgaaaacggcgaattccaagcgaaacggaaacggagcgagcactataat

cttgcagtactatgtcgaacaaccgatcgcggcgatgtcagtgagtcgtcttcggacagcgctggcgctccacacgtatttaagctctgagatcggctttgggagagcgcagagagcgccatcgcacggcagagcgaaagcggcagtgagcgaaagcgag

agggagagcggcagcgggtgggggatcgggagccccccgaaaaaaacagaggcgcacgtcgatgccatcggggaattggaacctcaatgtgtgggaatgtttaaatattctgtgttaggtagtgtagtttcatagactatagattctcatacagattgta

tctgagagtccttcgagccgattatacacgacagcaaaatatttcagtcgcgcttgggcaaaaggcttaagcacgactcccagtccccccttacatttgtcttcctaagcccctggagccactatcaaacttgttctacgcttgcactgaaaatagaaat

taaaaaccaaagtaaacaatcaaaaagaccaaaaacaataacaaccagcaccgagtcgaacatcagtgaggcattgcaaaaatttcaaagtcaagtttgcgtcgtcatcgcgtctgagtccgatcaagccgggcttgtaattgaagttgttgatgagccg

cgggattactggattgtggcgaattctggtcagcatacttaacagcagcccgctaattaagcaaaataaacatatcaaattccagaatgcgacggcgccatcatcctgtttgggaattcaattcgcgggcagatcgtttaattcaattaaaaggtagaag

tcggtaaaagggagcagaagaatgcgatcgctggaatttcctaacatcacggaccccataaatttgataagcccgagctcgctgcgttgagtcagccaccccacatccccaaatccccgccaaaagaagacagctgggttgttgactcgccagattgagg

tgcggattgcagtggagtggacctggtcaaagaagcaccgttaatgtgctgattccattcgattccatccgggaatgcgataaagaaaggctctgatccaagcaactgcaatccggatttcgattttctctttccatttggttttgtatttacgtactta

tgggaaagcattctaatgaagacttggagaagacttacgttatattcagaccatcgtgcgatagaggatgagtcatttccatatggccgaaatttattatgtttactatcgtttttagaggtgttttttggacttaccaaaagaggcatttgttttctgg

gattattcaactgaaaagatatttaaattttttcttggaccattttcaaggttccggatatatttgaaacacactagctagcagtgttggtaagttacatgtatttctataatgtcatattcctttgtccgtattcaaatcgaatactccacatctcaca

ggatgttgtacttgaggaattggcgatcgtagcgatttcccccgccgtaaagttcctgatcctcgttgtttttgtacatcataaagtccggattctgctcgtcgccgaagatgggaacgaagctgccaaagctgagagtctgcttgaggtgctggtccgg

acggcgtcccagctggataaccttgctgtacagatcggcatctgcctggagggcacgatcgaaatccttccagtggacgaacttcacctgctcgctgggaatagcgttgttgtcaagcagctcaaggagcgtattcgagttgacgggctgcaccacgtta

agttcctgctccttcgctggggattcccctgcgggtaagcgccgcttgcttggactcgtttccaaatcccatagccacgccagcagaggagtaacagagctcwhereisthegenetgattaaaaatatcctttaagaaagcccatgggtataacttacg

tgctcactgcgtcctatgcgaggaatggtctttaggttctttatggcaaagttctcgcctcgcttgcccagccgcggtacgttcttggtgatctttaggaagaatcctggactactgtcgtctgcctggcttatggccacaagacccaccaagagcgaaa

cagacaggactgttatgattctcatgctgatgcgactgaagcttcacctgactcctgctccacaattggtggcctttatatagcgagatccacccgcatcttgcgtggaatagaaatgcgggtgactccaggaattagcattatcgatcggaaagtgcaa

agcagataaaactgaactaacctgacctaaatgcctggccataattaagtgcatacatacacattacattacttacatttgtataagaactaaattttatagtacataccacttgcgtatgtaaatgcttgtcttttctcttatatacgttttataaatt


Bioinformatics


Where is the gene?


Spam Filtering


What are the desired outcomes? What are the inputs (data)? What are the learning paradigms?


Webpage classification


Company homepage vs. University homepage


Organizing documents

Reading, digesting, and categorizing a vast text database is too much for a human!

We want:

What are the desired outcomes? What are the inputs (data)? What are the learning paradigms?


Quantifying social influence


What are the desired outcomes? What are the inputs (data)? What are the learning paradigms?


Product Recommendation


What are the desired outcomes? What are the inputs (data)? What are the learning paradigms?


Robot Control

Now cars can find their own way!

What are the desired outcomes? What are the inputs (data)? What are the learning paradigms?


Basics/Prerequisites

Probabilities

Distributions, densities, marginalization, conditioning

Statistics

Mean, variance, maximum likelihood estimation

Linear algebra

Vector, matrix, multiplication, inversion, eigen-decomposition

Algorithms and Programming

MATLAB, basic data structures, computational complexity

Convex optimization

Basics will be covered during lecture


Machine learning for apartment hunting

Suppose you are to move to Atlanta

And you want to find the most reasonably priced apartment satisfying your needs:

square footage, number of bedrooms, distance to campus …

Living area (ft²)   # bedrooms   Rent ($)
230                 1            600
506                 2            1000
433                 2            1100
109                 1            500
150                 1            ?
270                 1.5          ?


Linear Regression Model

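Assume $y$ is a linear function of $x$ (features) plus noise $\epsilon$

$$y = \theta_0 + \theta_1 x_1 + \cdots + \theta_n x_n + \epsilon$$

where $\epsilon$ is an error term modeled as Gaussian $N(0, \sigma^2)$

Let $\theta = (\theta_0, \theta_1, \ldots, \theta_n)^\top$, and augment the data by one dimension

$$x \leftarrow (1, x)^\top$$

Then $y = \theta^\top x + \epsilon$

Background used: probability, linear algebra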
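A minimal sketch of the augmented linear model, using the apartment table from the earlier slide as data. The values in `theta` are made-up placeholders for illustration, not fitted parameters, and the function names are hypothetical.

```python
# Apartment data from the slide: (living area in ft^2, number of bedrooms) -> rent in $.
features = [(230, 1), (506, 2), (433, 2), (109, 1)]
rents = [600, 1000, 1100, 500]


def augment(x):
    """Prepend a constant 1 so that theta_0 acts as the intercept."""
    return (1.0,) + tuple(float(v) for v in x)


def predict(theta, x):
    """Noise-free part of the model: theta^T x on the augmented feature vector."""
    return sum(t * v for t, v in zip(theta, augment(x)))


# Hypothetical parameters (intercept, $ per ft^2, $ per bedroom); not fitted to the data.
theta = (100.0, 1.5, 150.0)

# The two rows of the table whose rent is unknown.
for x in [(150, 1), (270, 1.5)]:
    print(x, "->", predict(theta, x))
```

With the augmentation, the intercept needs no special handling: every prediction is a single inner product.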


Least mean square method

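Given $m$ data points, find $\theta$ that minimizes the mean square error

$$\hat{\theta} = \operatorname{argmin}_\theta L(\theta), \qquad L(\theta) = \frac{1}{m} \sum_{i=1}^{m} (y_i - \theta^\top x_i)^2$$

Set the gradient to 0 and solve for the parameter

$$\frac{\partial L(\theta)}{\partial \theta} = -\frac{2}{m} \sum_{i=1}^{m} (y_i - \theta^\top x_i) x_i = 0$$

$$\Leftrightarrow -\frac{2}{m} \sum_{i=1}^{m} y_i x_i + \frac{2}{m} \sum_{i=1}^{m} x_i x_i^\top \theta = 0$$

Background used: statistics, optimization, linear algebra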
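The mean square error and its gradient can be written out directly from the sums. The sketch below assumes the inputs are already augmented with a leading 1; the toy data and function names are illustrative assumptions, not part of the lecture.

```python
def mse_loss(theta, xs, ys):
    """L(theta) = (1/m) * sum_i (y_i - theta^T x_i)^2, with xs already augmented."""
    m = len(xs)
    return sum((y - sum(t * v for t, v in zip(theta, x))) ** 2 for x, y in zip(xs, ys)) / m


def mse_gradient(theta, xs, ys):
    """dL/dtheta = -(2/m) * sum_i (y_i - theta^T x_i) * x_i."""
    m = len(xs)
    grad = [0.0] * len(theta)
    for x, y in zip(xs, ys):
        residual = y - sum(t * v for t, v in zip(theta, x))
        for j, v in enumerate(x):
            grad[j] += -2.0 / m * residual * v
    return grad


# Toy data (an assumption for illustration): y = 1 + 2*x exactly, x augmented by a leading 1.
xs = [(1.0, 0.0), (1.0, 1.0), (1.0, 2.0)]
ys = [1.0, 3.0, 5.0]

print(mse_loss((1.0, 2.0), xs, ys))      # loss at the generating parameters
print(mse_gradient((1.0, 2.0), xs, ys))  # gradient at the generating parameters
print(mse_gradient((0.0, 0.0), xs, ys))  # gradient away from the minimizer
```

Because the toy data is noise-free, the generating parameters are the minimizer, so the gradient there is zero, which is the condition the derivation sets up.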


Matrix version of the gradient

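Define $X = (x_1, x_2, \ldots, x_m)$ and $y = (y_1, y_2, \ldots, y_m)^\top$; the gradient becomes

$$\frac{\partial L(\theta)}{\partial \theta} = -\frac{2}{m} X y + \frac{2}{m} X X^\top \theta$$

$$\Rightarrow \hat{\theta} = (X X^\top)^{-1} X y$$

Background used: linear algebra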
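One way to evaluate the closed-form solution is sketched below, assuming $X X^\top$ is invertible. Instead of forming the inverse explicitly, the sketch solves the linear system $(X X^\top)\theta = X y$ by Gaussian elimination, which yields the same $\theta$. The function name and the singularity threshold are assumptions made for illustration.

```python
def normal_equation_fit(xs, ys):
    """Solve (X X^T) theta = X y, where the columns of X are the augmented x_i."""
    d = len(xs[0])
    # A = X X^T = sum_i x_i x_i^T and b = X y = sum_i y_i x_i.
    A = [[sum(x[r] * x[c] for x in xs) for c in range(d)] for r in range(d)]
    b = [sum(y * x[r] for x, y in zip(xs, ys)) for r in range(d)]

    # Gaussian elimination with partial pivoting on the augmented system [A | b].
    M = [row + [rhs] for row, rhs in zip(A, b)]
    for col in range(d):
        pivot = max(range(col, d), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("X X^T is singular; the features are linearly dependent")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, d):
            factor = M[r][col] / M[col][col]
            for c in range(col, d + 1):
                M[r][c] -= factor * M[col][c]

    # Back substitution on the upper-triangular system.
    theta = [0.0] * d
    for r in reversed(range(d)):
        tail = sum(M[r][c] * theta[c] for c in range(r + 1, d))
        theta[r] = (M[r][d] - tail) / M[r][r]
    return theta


# Apartment data from the earlier slide, augmented with a leading 1.
xs = [(1.0, 230.0, 1.0), (1.0, 506.0, 2.0), (1.0, 433.0, 2.0), (1.0, 109.0, 1.0)]
ys = [600.0, 1000.0, 1100.0, 500.0]

theta = normal_equation_fit(xs, ys)
print(theta)
for x in [(1.0, 150.0, 1.0), (1.0, 270.0, 1.5)]:
    print(x[1:], "->", sum(t * v for t, v in zip(theta, x)))
```

The cost of the solve grows roughly cubically with the number of features, which is the expense that motivates an iterative alternative.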

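Matrix inversion in $\hat{\theta} = (X X^\top)^{-1} X y$ is expensive to compute

Gradient descent

$$\theta^{t+1} \leftarrow \theta^{t} + \frac{\alpha}{m} \sum_{i=1}^{m} \left(y_i - (\theta^{t})^\top x_i\right) x_i$$

Background used: algorithms, programming, optimization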
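The update rule can be sketched as a plain loop. The step size `alpha`, the number of steps, the zero initialization, and the toy data are all assumptions chosen so that this small example converges; they are not values from the lecture.

```python
def gradient_descent(xs, ys, alpha=0.1, steps=2000):
    """Iterate theta <- theta + (alpha/m) * sum_i (y_i - theta^T x_i) * x_i."""
    m, d = len(xs), len(xs[0])
    theta = [0.0] * d  # assumed starting point
    for _ in range(steps):
        update = [0.0] * d
        for x, y in zip(xs, ys):
            residual = y - sum(t * v for t, v in zip(theta, x))
            for j in range(d):
                update[j] += residual * x[j]
        theta = [t + alpha / m * u for t, u in zip(theta, update)]
    return theta


# Toy data (an assumption for illustration): y = 1 + 2*x, with x augmented by a leading 1.
xs = [(1.0, 0.0), (1.0, 1.0), (1.0, 2.0)]
ys = [1.0, 3.0, 5.0]

print(gradient_descent(xs, ys))  # approaches the generating parameters [1, 2]
```

Each step costs time proportional to the number of data points times the number of features, with no matrix inversion. On features with very different scales, such as the raw apartment data, a much smaller step size or feature rescaling would likely be needed for the iteration to converge.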


Probabilistic Interpretation of LMS

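Assume $y$ is linear in $x$ plus noise $\epsilon$: $y = \theta^\top x + \epsilon$

Assume $\epsilon$ follows a Gaussian $N(0, \sigma^2)$

$$p(y_i \mid x_i; \theta) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(y_i - \theta^\top x_i)^2}{2\sigma^2}\right)$$

By the independence assumption, the likelihood is

$$L(\theta) = \prod_{i=1}^{m} p(y_i \mid x_i; \theta) = \left(\frac{1}{\sqrt{2\pi}\,\sigma}\right)^{m} \exp\left(-\frac{\sum_{i=1}^{m} (y_i - \theta^\top x_i)^2}{2\sigma^2}\right)$$

Background used: probability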
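A short sketch of the per-point Gaussian density and the resulting likelihood under the independence assumption. The toy data, the value of `sigma`, and the function names are assumptions for illustration only.

```python
import math


def gaussian_density(y, x, theta, sigma):
    """p(y | x; theta) for y = theta^T x + eps, with eps ~ N(0, sigma^2)."""
    residual = y - sum(t * v for t, v in zip(theta, x))
    return math.exp(-residual ** 2 / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)


def likelihood(theta, xs, ys, sigma):
    """Product of the per-point densities, using the independence assumption."""
    result = 1.0
    for x, y in zip(xs, ys):
        result *= gaussian_density(y, x, theta, sigma)
    return result


# Toy data (an assumption for illustration), lying near y = 1 + 2*x.
xs = [(1.0, 0.0), (1.0, 1.0), (1.0, 2.0)]
ys = [1.1, 2.9, 5.2]

print(likelihood((1.0, 2.0), xs, ys, sigma=0.5))  # parameters close to the data
print(likelihood((0.0, 0.0), xs, ys, sigma=0.5))  # parameters far from the data
```

For larger data sets the product of many small densities underflows quickly, which is one practical reason to work with the logarithm of the likelihood instead.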


Probabilistic Interpretation of LMS, cont.

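Hence the log-likelihood is:

$$\log L(\theta) = m \log \frac{1}{\sqrt{2\pi}\,\sigma} - \frac{1}{2\sigma^2} \sum_{i=1}^{m} (y_i - \theta^\top x_i)^2$$

The first term does not depend on $\theta$, so maximizing the log-likelihood amounts to minimizing the sum of squared errors.

LMS is equivalent to MLE of $\theta$!

$$\text{LMS}: \quad \frac{1}{m} \sum_{i=1}^{m} (y_i - \theta^\top x_i)^2$$

How do we make it work on real data?

Background used: statistics, algorithms, programming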
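The equivalence can be checked numerically on a small example: among a set of candidate parameters, the one with the lowest LMS value is also the one with the highest log-likelihood. The toy data, the candidate list, and `sigma` are assumptions for illustration.

```python
import math


def lms(theta, xs, ys):
    """LMS objective: (1/m) * sum_i (y_i - theta^T x_i)^2."""
    return sum((y - sum(t * v for t, v in zip(theta, x))) ** 2 for x, y in zip(xs, ys)) / len(xs)


def log_likelihood(theta, xs, ys, sigma):
    """m*log(1/(sqrt(2*pi)*sigma)) - (1/(2*sigma^2)) * sum_i (y_i - theta^T x_i)^2."""
    m = len(xs)
    squared_error = 0.0
    for x, y in zip(xs, ys):
        squared_error += (y - sum(t * v for t, v in zip(theta, x))) ** 2
    return -m * math.log(math.sqrt(2 * math.pi) * sigma) - squared_error / (2 * sigma ** 2)


# Toy data and candidate parameters (assumptions for illustration).
xs = [(1.0, 0.0), (1.0, 1.0), (1.0, 2.0)]
ys = [1.1, 2.9, 5.2]
candidates = [(0.0, 0.0), (1.0, 1.0), (1.0, 2.0), (2.0, 2.0)]

best_by_lms = min(candidates, key=lambda th: lms(th, xs, ys))
best_by_mle = max(candidates, key=lambda th: log_likelihood(th, xs, ys, sigma=0.5))
print(best_by_lms, best_by_mle)  # the same candidate wins under both criteria
```

The agreement does not depend on the chosen `sigma`: changing it rescales and shifts the log-likelihood but leaves the ordering of the candidates unchanged.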


Textbooks

Textbooks:

No single book covers everything in machine learning

Pattern Recognition and Machine Learning, Chris Bishop

Written by a leading industrial researcher

Presented from a probabilistic and graphical model view

The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Trevor Hastie, Robert Tibshirani, Jerome Friedman

Written by leading statisticians

Presented from a statistical view

Machine Learning: A Probabilistic Perspective, Kevin Murphy

Written by a former university professor, now at Google

More detail on derivations; good for self-study


Grading

4 assignments (50%)

Approximately 1 assignment every 5 lectures

More on the next slide

Midterm exam (20%)

Final exam (20%)

Background test (5%)

More in 2 slides

Participation (5%)

guest lectures, seminar attendance, and class feedback.


Assignments

Homework should be submitted before the deadline set in T-Square.

No late submission will be accepted through email.

We strongly encourage you to use LaTeX for your submission.

Any kind of academic misconduct is subject to an F grade and will be reported to the Dean of Students.

All answers and code should be prepared by yourself.

If you refer to any material, it should be properly cited.

List on your homework anyone with whom you collaborated.

Background test

Background test in the second lecture

Checks your basic knowledge of probability and statistics, linear algebra, and MATLAB.

Aims to check your readiness for the class

< 40%: the class may be too difficult; consider taking it next time

40-79%: brush up on your basic knowledge

In both cases, you need to attend a mandatory review section.

The team

Instructor: Le Song, Klaus 1340

TA: Bo Xie, Hanjun Dai, Ashwin Shenoi, Rashit Tevali

Guest Lecturer: TBD

Links:

TA contacts: [email protected]

Discussion group:

http://piazza.com/gatech/fall2015/cse6740cs7641isye6740/home

More information: http://www.cc.gatech.edu/~lsong/teaching/CSE6740fall15.html
