Cogs 118A: Natural Computation I
Angela Yu (ajyu@ucsd)
TA: He (Crane) Huang
January 5, 2009
www.cogsci.ucsd.edu/~ajyu/Teaching/Cogs118A_wi10/cogs118A.html

TRANSCRIPT

Page 1:

Cogs 118A: Natural Computation I

Angela Yu

ajyu@ucsd

TA: He (Crane) Huang

January 5, 2009

www.cogsci.ucsd.edu/~ajyu/Teaching/Cogs118A_wi10/cogs118A.html

Page 2:

Course Website

• Policy on academic integrity

• Scribe schedule

Page 3:

118A Curriculum Overview

• Regression (3)
• Graphical models (8)
• Approximate inference & sampling (10, 11)
• Sequential data (13)
• Information theory, reinforcement learning
• Applications to cognitive science

Assessment: assignments, midterm, final project due. No final!

Page 4:

118B Curriculum (Tentative) Overview

• Classification (4)
• Neural networks (5)
• Kernel methods & SVM (6, 7)
• Mixture models & EM (9)
• Continuous latent variables (12)
• Final project

Page 5:

Machine Learning: 3 Types

Imagine an agent observes a sequence of inputs: x1, x2, x3, …

Supervised learning

The agent also observes a sequence of labels or outputs y1, y2, …, and the goal is to learn the mapping f: x → y.

Unsupervised learning

The agent’s goal is to learn a statistical model for the inputs, p(x), to be used for prediction, compact representation, decisions, …

Reinforcement learning

The agent can perform actions a1, a2, a3, …, which affect the state of the world, and receives rewards r1, r2, r3, …. Its goal is to learn a policy π: {x1, x2, …, xt} → at (the next action) that maximizes the expected reward ⟨reward⟩.

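To make the three settings concrete, here is a toy NumPy sketch (not from the lecture; the generating rule, the bandit's reward means, and the exploration rate are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Supervised: observe inputs x and labels y, learn the mapping f: x -> y.
x = rng.uniform(-1, 1, size=50)
y = 2.0 * x + rng.normal(scale=0.1, size=50)  # hidden rule: y = 2x + noise
w = np.polyfit(x, y, deg=1)                   # least-squares estimate of f
print("learned slope, intercept:", w)         # close to 2.0, 0.0

# Unsupervised: observe only x, learn a statistical model p(x)
# (here a Gaussian, fit by its sufficient statistics).
print("p(x) ~ Normal(mu=%.2f, sigma=%.2f)" % (x.mean(), x.std()))

# Reinforcement: choose actions, receive rewards, learn a policy.
# Minimal two-armed bandit with epsilon-greedy action selection.
true_mean = np.array([0.3, 0.7])  # hidden mean reward of each action
q = np.zeros(2)                   # running estimate of each action's value
n = np.zeros(2)                   # times each action has been tried
for t in range(1000):
    a = rng.integers(2) if rng.random() < 0.1 else int(q.argmax())
    r = rng.normal(true_mean[a], 0.1)  # reward from the world
    n[a] += 1
    q[a] += (r - q[a]) / n[a]          # incremental mean update
print("estimated action values:", q)   # approach [0.3, 0.7]
```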

Page 6:

Supervised Learning

Classification: the outputs y1, y2, … are discrete labels

(Example images: “Dog” vs. “Cat”)

Regression: the outputs y1, y2, … are continuously valued

(Plot: continuously valued y against x)
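A toy sketch of the distinction (the numbers and the nearest-centroid rule are invented for illustration, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(1)

# Classification: discrete labels. One invented feature per animal image.
cat_sizes = rng.normal(30, 5, size=20)   # class "Cat"
dog_sizes = rng.normal(60, 10, size=20)  # class "Dog"

def classify(size):
    """Nearest-centroid rule: label by whichever class mean is closer."""
    if abs(size - dog_sizes.mean()) < abs(size - cat_sizes.mean()):
        return "Dog"
    return "Cat"

print(classify(32), classify(70))  # -> Cat Dog

# Regression: continuously valued outputs. Fit a line to (x, y) pairs.
x = rng.uniform(0, 1, size=30)
y = 4.0 * x + rng.normal(scale=0.3, size=30)
slope, intercept = np.polyfit(x, y, deg=1)
print("slope ~ %.1f" % slope)      # approximately 4.0
```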

Page 7:

Applications: Cognitive Science

• Object categorization
• Speech recognition (e.g., “WET” vs. “VET”)
• Face recognition
• Motor learning

Page 8:

Challenges

• Noisy sensory inputs
• Incomplete information
• Excess inputs (irrelevant)
• Prior knowledge/bias
• Changing environment
• No (one) right answer, inductive inference (open-ended)
• Stochasticity (uncertainty), not deterministic
• …

(Illustrated with a face-recognition example)

Page 9:

An Example: Curve-Fitting

(Plot: the data, y against x)

Page 10:

Polynomial Curve-Fitting

Linear model:

y(x; \mathbf{w}) = w_0 + w_1 x + w_2 x^2 + \cdots + w_n x^n = \sum_{j=0}^{n} w_j x^j

Error function (root-mean-square):

E(\mathbf{w}) = \sqrt{ \frac{1}{N} \sum_{j=1}^{N} \left( y(x_j; \mathbf{w}) - y_j \right)^2 }
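A minimal NumPy implementation of these two formulas (the quadratic ground truth and noise level in the demo data are assumptions for illustration); note that the model is “linear” because it is linear in the weights w, even though it is polynomial in x:

```python
import numpy as np

def poly_fit(x, y, order):
    """Least-squares fit of y(x; w) = sum_j w_j x^j; returns w."""
    X = np.vander(x, order + 1, increasing=True)  # columns x^0, x^1, ..., x^order
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w

def rms_error(x, y, w):
    """E(w) = sqrt(mean over the N points of (y(x_j; w) - y_j)^2)."""
    X = np.vander(x, len(w), increasing=True)
    return np.sqrt(np.mean((X @ w - y) ** 2))

rng = np.random.default_rng(0)
x = np.linspace(-1, 1, 15)
y = 1.0 - 2.0 * x + 3.0 * x**2 + rng.normal(scale=0.2, size=x.size)  # quadratic + noise

w = poly_fit(x, y, order=2)
print("fitted weights:", w)                       # near [1, -2, 3]
print("training RMS error:", rms_error(x, y, w))  # near the noise level 0.2
```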

Page 11:

Linear Fit?

(Plot: linear fit to the data, y against x)

Page 12:

Quadratic Fit?

(Plot: quadratic fit to the data, y against x)

Page 13:

5th-Order Polynomial Fit?

(Plot: 5th-order polynomial fit to the data, y against x)

Page 14:

10th-Order Polynomial Fit?

(Plot: 10th-order polynomial fit to the data, y against x)

Page 15:

And the Answer is…

(Plot: the data with the true quadratic curve, y against x)

Quadratic

Page 16:

Training Error vs. Test (Generalization) Error

(Plots: 1st-, 2nd-, 5th-, and 10th-order fits, each shown on training data and on new test data, y against x)
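The gap is easy to reproduce; in this sketch (sample sizes, noise level, and the quadratic ground truth are assumptions), each order is fit on a small training set and scored on held-out data, so the 10th-order fit shows the smallest training error but a large test error:

```python
import numpy as np

rng = np.random.default_rng(1)

def make_data(n):
    """Samples from an assumed quadratic ground truth plus noise."""
    x = rng.uniform(-1, 1, size=n)
    y = 1.0 - 2.0 * x + 3.0 * x**2 + rng.normal(scale=0.2, size=n)
    return x, y

x_train, y_train = make_data(15)  # small training set, as in the example
x_test, y_test = make_data(200)   # held-out data for generalization error

for order in (1, 2, 5, 10):
    X = np.vander(x_train, order + 1, increasing=True)
    w, *_ = np.linalg.lstsq(X, y_train, rcond=None)

    def rms(x, y):
        return np.sqrt(np.mean((np.vander(x, order + 1, increasing=True) @ w - y) ** 2))

    print(f"order {order:2d}: train RMS {rms(x_train, y_train):.3f}, "
          f"test RMS {rms(x_test, y_test):.3f}")
```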

Page 17:

What Did We Learn?

• Model complexity is important
• Minimizing training error → overtraining
• A better error metric: test (generalization) error
• But measuring test error costs extra data (which is precious)
• Fix 1: regularization (3.1)
• Fix 2: Bayesian model comparison (3.3)
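As a preview of Fix 1, here is a sketch using ridge (L2-penalized) least squares, one standard form of regularization (the data and the lambda values are invented for illustration); increasing lambda shrinks the weights of an otherwise wild 10th-order fit:

```python
import numpy as np

def ridge_poly_fit(x, y, order, lam):
    """Minimize ||Xw - y||^2 + lam * ||w||^2 (all weights penalized, for simplicity)."""
    X = np.vander(x, order + 1, increasing=True)
    A = X.T @ X + lam * np.eye(order + 1)  # closed form: w = (X^T X + lam I)^{-1} X^T y
    return np.linalg.solve(A, X.T @ y)

rng = np.random.default_rng(2)
x = np.linspace(-1, 1, 15)
y = 1.0 - 2.0 * x + 3.0 * x**2 + rng.normal(scale=0.2, size=x.size)

for lam in (0.0, 1e-3, 1.0):
    w = ridge_poly_fit(x, y, order=10, lam=lam)
    print(f"lambda={lam:g}: largest |w_j| = {np.abs(w).max():.2f}")
```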

Page 18:

Math Review