feature engineering week 3 video 3. feature engineering
TRANSCRIPT
![Page 1: Feature Engineering Week 3 Video 3. Feature Engineering](https://reader035.vdocuments.us/reader035/viewer/2022062221/56649e025503460f94aeda4f/html5/thumbnails/1.jpg)
Feature Engineering
Week 3 Video 3
![Page 2: Feature Engineering Week 3 Video 3. Feature Engineering](https://reader035.vdocuments.us/reader035/viewer/2022062221/56649e025503460f94aeda4f/html5/thumbnails/2.jpg)
Feature Engineering
![Page 3: Feature Engineering Week 3 Video 3. Feature Engineering](https://reader035.vdocuments.us/reader035/viewer/2022062221/56649e025503460f94aeda4f/html5/thumbnails/3.jpg)
Feature Engineering
Up until this point in the class, we’ve talked about building and validating prediction models
Models that infer a predicted variable from predictor variables
![Page 4: Feature Engineering Week 3 Video 3. Feature Engineering](https://reader035.vdocuments.us/reader035/viewer/2022062221/56649e025503460f94aeda4f/html5/thumbnails/4.jpg)
Where the Predicted Variable Comes From
A couple lectures ago, we went into a little more detail about where the predicted variable can come from
![Page 5: Feature Engineering Week 3 Video 3. Feature Engineering](https://reader035.vdocuments.us/reader035/viewer/2022062221/56649e025503460f94aeda4f/html5/thumbnails/5.jpg)
Where the Predictor Variables Come From
Where do the predictor variables come from?
Do they fall out of the sky?
Do they come from the Office for Predictor Variables in Washington, DC?
![Page 6: Feature Engineering Week 3 Video 3. Feature Engineering](https://reader035.vdocuments.us/reader035/viewer/2022062221/56649e025503460f94aeda4f/html5/thumbnails/6.jpg)
Feature Engineering
The art of creating predictor variables
A major topic in its own right
At Teachers College, I teach a semester-long design studio in Feature Engineering
![Page 7: Feature Engineering Week 3 Video 3. Feature Engineering](https://reader035.vdocuments.us/reader035/viewer/2022062221/56649e025503460f94aeda4f/html5/thumbnails/7.jpg)
Why a whole class?
Feature engineering is the least well-studied part of the process of developing prediction models But it’s arguably the most important part Your model will never be any good if your
features (predictors) aren’t very good
![Page 8: Feature Engineering Week 3 Video 3. Feature Engineering](https://reader035.vdocuments.us/reader035/viewer/2022062221/56649e025503460f94aeda4f/html5/thumbnails/8.jpg)
Why a whole class?
It is an art, it is human-driven design It involves lore rather than well-known
and validated principles It is hard!
![Page 9: Feature Engineering Week 3 Video 3. Feature Engineering](https://reader035.vdocuments.us/reader035/viewer/2022062221/56649e025503460f94aeda4f/html5/thumbnails/9.jpg)
The Big Idea
How can we take the voluminous, ill-formed, and yet under-specified data that we now have in education
And shape it into a reasonable set of variables
In an efficient, effective, and predictive way?
![Page 10: Feature Engineering Week 3 Video 3. Feature Engineering](https://reader035.vdocuments.us/reader035/viewer/2022062221/56649e025503460f94aeda4f/html5/thumbnails/10.jpg)
A process in its own right
1. Brainstorming features 2. Deciding what features to create3. Creating the features4. Studying the impact of features on
model goodness5. Iterating on features if useful6. Go to 3 (or 1)
![Page 11: Feature Engineering Week 3 Video 3. Feature Engineering](https://reader035.vdocuments.us/reader035/viewer/2022062221/56649e025503460f94aeda4f/html5/thumbnails/11.jpg)
Brainstorming Features
Can be more or less formal
![Page 12: Feature Engineering Week 3 Video 3. Feature Engineering](https://reader035.vdocuments.us/reader035/viewer/2022062221/56649e025503460f94aeda4f/html5/thumbnails/12.jpg)
IDEO tips for Brainstorming
1. Defer judgment2. Encourage wild ideas3. Build on the ideas of others4. Stay focused on the topic5. One conversation at a time6. Be visual7. Go for quantity
http://www.openideo.com/fieldnotes/openideo-team-notes/seven-tips-on-better-brainstorming
![Page 13: Feature Engineering Week 3 Video 3. Feature Engineering](https://reader035.vdocuments.us/reader035/viewer/2022062221/56649e025503460f94aeda4f/html5/thumbnails/13.jpg)
Building on the Ideas of Others Doesn’t just have to be people nearby
There’s a huge literature out there of features people have tried and what has worked, or failed to work, for a range of problems
Read papers from researchers working on similar problems, and see what you can use
![Page 14: Feature Engineering Week 3 Video 3. Feature Engineering](https://reader035.vdocuments.us/reader035/viewer/2022062221/56649e025503460f94aeda4f/html5/thumbnails/14.jpg)
Brainstorming Features
On hard projects, my research group often meets as a team over pizza and beer to brainstorm
On easier projects, one person brainstorms solo And then often discusses their features with
another person, who offers further suggestions
![Page 15: Feature Engineering Week 3 Video 3. Feature Engineering](https://reader035.vdocuments.us/reader035/viewer/2022062221/56649e025503460f94aeda4f/html5/thumbnails/15.jpg)
Deciding what features to create
There is never infinite time A trade-off between the effort to create a
feature and how likely it is to be useful “How likely it is to be useful” – the best you can do
is to Look at whether similar features have been useful for
similar problems Use your best intuition
Worth biasing in favor of features that are different than anything else you’ve tried before Explores a different part of the space
![Page 16: Feature Engineering Week 3 Video 3. Feature Engineering](https://reader035.vdocuments.us/reader035/viewer/2022062221/56649e025503460f94aeda4f/html5/thumbnails/16.jpg)
Creating features
Excel – Really good for prototyping features
Google Refine/OpenRefine – Some alternate features that are nice
Distillation Code – The scalable solution… but harder to check yourself or explore
![Page 17: Feature Engineering Week 3 Video 3. Feature Engineering](https://reader035.vdocuments.us/reader035/viewer/2022062221/56649e025503460f94aeda4f/html5/thumbnails/17.jpg)
Some useful tools in Excel
Pivot Tables – great for aggregating data, and getting the average, min, max, stdev
Vlookup – great for translating from aggregations (student-level data, for instance) back to action-level data
Example: How much faster is the current student action on the average student attempt on the current skill?
![Page 18: Feature Engineering Week 3 Video 3. Feature Engineering](https://reader035.vdocuments.us/reader035/viewer/2022062221/56649e025503460f94aeda4f/html5/thumbnails/18.jpg)
![Page 19: Feature Engineering Week 3 Video 3. Feature Engineering](https://reader035.vdocuments.us/reader035/viewer/2022062221/56649e025503460f94aeda4f/html5/thumbnails/19.jpg)
![Page 20: Feature Engineering Week 3 Video 3. Feature Engineering](https://reader035.vdocuments.us/reader035/viewer/2022062221/56649e025503460f94aeda4f/html5/thumbnails/20.jpg)
![Page 21: Feature Engineering Week 3 Video 3. Feature Engineering](https://reader035.vdocuments.us/reader035/viewer/2022062221/56649e025503460f94aeda4f/html5/thumbnails/21.jpg)
![Page 22: Feature Engineering Week 3 Video 3. Feature Engineering](https://reader035.vdocuments.us/reader035/viewer/2022062221/56649e025503460f94aeda4f/html5/thumbnails/22.jpg)
![Page 23: Feature Engineering Week 3 Video 3. Feature Engineering](https://reader035.vdocuments.us/reader035/viewer/2022062221/56649e025503460f94aeda4f/html5/thumbnails/23.jpg)
![Page 24: Feature Engineering Week 3 Video 3. Feature Engineering](https://reader035.vdocuments.us/reader035/viewer/2022062221/56649e025503460f94aeda4f/html5/thumbnails/24.jpg)
![Page 25: Feature Engineering Week 3 Video 3. Feature Engineering](https://reader035.vdocuments.us/reader035/viewer/2022062221/56649e025503460f94aeda4f/html5/thumbnails/25.jpg)
![Page 26: Feature Engineering Week 3 Video 3. Feature Engineering](https://reader035.vdocuments.us/reader035/viewer/2022062221/56649e025503460f94aeda4f/html5/thumbnails/26.jpg)
![Page 27: Feature Engineering Week 3 Video 3. Feature Engineering](https://reader035.vdocuments.us/reader035/viewer/2022062221/56649e025503460f94aeda4f/html5/thumbnails/27.jpg)
![Page 28: Feature Engineering Week 3 Video 3. Feature Engineering](https://reader035.vdocuments.us/reader035/viewer/2022062221/56649e025503460f94aeda4f/html5/thumbnails/28.jpg)
![Page 29: Feature Engineering Week 3 Video 3. Feature Engineering](https://reader035.vdocuments.us/reader035/viewer/2022062221/56649e025503460f94aeda4f/html5/thumbnails/29.jpg)
![Page 30: Feature Engineering Week 3 Video 3. Feature Engineering](https://reader035.vdocuments.us/reader035/viewer/2022062221/56649e025503460f94aeda4f/html5/thumbnails/30.jpg)
![Page 31: Feature Engineering Week 3 Video 3. Feature Engineering](https://reader035.vdocuments.us/reader035/viewer/2022062221/56649e025503460f94aeda4f/html5/thumbnails/31.jpg)
![Page 32: Feature Engineering Week 3 Video 3. Feature Engineering](https://reader035.vdocuments.us/reader035/viewer/2022062221/56649e025503460f94aeda4f/html5/thumbnails/32.jpg)
Further resources
http://www.howtogeek.com/howto/13780/using-vlookup-in-excel/
http://www.excel-easy.com/data-analysis/pivot-tables.html
http://spreadsheets.about.com/od/datamanagementinexcel/ss/8912pivot_table.htm
![Page 33: Feature Engineering Week 3 Video 3. Feature Engineering](https://reader035.vdocuments.us/reader035/viewer/2022062221/56649e025503460f94aeda4f/html5/thumbnails/33.jpg)
Other useful things you can do in Excel
Counts-so-far Counts-last-n-actions Differentiating first and subsequent
attempts Ratios between events of interest Cut-off based features
![Page 34: Feature Engineering Week 3 Video 3. Feature Engineering](https://reader035.vdocuments.us/reader035/viewer/2022062221/56649e025503460f94aeda4f/html5/thumbnails/34.jpg)
GoogleRefine(now OpenRefine)
Functionality to make it easy to regroup and transform data Find similar names Connect names Bin numerical data Mathematical transforms showing resultant
graphs Text transforms and column creation
![Page 35: Feature Engineering Week 3 Video 3. Feature Engineering](https://reader035.vdocuments.us/reader035/viewer/2022062221/56649e025503460f94aeda4f/html5/thumbnails/35.jpg)
GoogleRefine(now OpenRefine)
Functionality for finding anomalies/outliers
![Page 36: Feature Engineering Week 3 Video 3. Feature Engineering](https://reader035.vdocuments.us/reader035/viewer/2022062221/56649e025503460f94aeda4f/html5/thumbnails/36.jpg)
GoogleRefine(now OpenRefine)
Functionality for automatically repeating the same process on a new data set
*Really* nice for cases where you complete a complex process and want to repeat it
![Page 37: Feature Engineering Week 3 Video 3. Feature Engineering](https://reader035.vdocuments.us/reader035/viewer/2022062221/56649e025503460f94aeda4f/html5/thumbnails/37.jpg)
GoogleRefine(now OpenRefine)
Some videos you may want to watch later
http://www.youtube.com/watch?v=B70J_H_zAWM
http://www.youtube.com/watch?v=cO8NVCs_Ba0
http://www.youtube.com/watch?v=5tsyz3ibYzk
![Page 38: Feature Engineering Week 3 Video 3. Feature Engineering](https://reader035.vdocuments.us/reader035/viewer/2022062221/56649e025503460f94aeda4f/html5/thumbnails/38.jpg)
Feature Iteration
Sometimes when a feature looks like it might be good
It’s worth iterating on that feature, trying close variants to see if they do better
![Page 39: Feature Engineering Week 3 Video 3. Feature Engineering](https://reader035.vdocuments.us/reader035/viewer/2022062221/56649e025503460f94aeda4f/html5/thumbnails/39.jpg)
Example
You have a feature “slow actions after hints”(cf. Shih, Koedinger, & Scheines, 2008)
You define “slow action” as an action taking over 20 seconds
What if 30 seconds is a better cut-off?
![Page 40: Feature Engineering Week 3 Video 3. Feature Engineering](https://reader035.vdocuments.us/reader035/viewer/2022062221/56649e025503460f94aeda4f/html5/thumbnails/40.jpg)
Ways to accomplish this…
By hand Programming (Java? Matlab?) Excel Equation Solver
![Page 41: Feature Engineering Week 3 Video 3. Feature Engineering](https://reader035.vdocuments.us/reader035/viewer/2022062221/56649e025503460f94aeda4f/html5/thumbnails/41.jpg)
Excel Equation Solver Tutorials http://office.microsoft.com/en-us/excel-
help/define-and-solve-a-problem-by-using-solver-HP010072691.aspx
http://www.youtube.com/watch?v=K4QkLA3sT1o
One tip: multistart option avoids local minima (that can sometimes block the solver from even getting started)
![Page 42: Feature Engineering Week 3 Video 3. Feature Engineering](https://reader035.vdocuments.us/reader035/viewer/2022062221/56649e025503460f94aeda4f/html5/thumbnails/42.jpg)
A few thoughts
![Page 43: Feature Engineering Week 3 Video 3. Feature Engineering](https://reader035.vdocuments.us/reader035/viewer/2022062221/56649e025503460f94aeda4f/html5/thumbnails/43.jpg)
Does feature engineering over-fit? It can Which is why it’s useful to remember The true test of a model is whether it
works on entirely unseen data
If you iterate a lot and use cross-validated goodness
Then the true test of your model will be either a held-out data set or newly-collected data later on
![Page 44: Feature Engineering Week 3 Video 3. Feature Engineering](https://reader035.vdocuments.us/reader035/viewer/2022062221/56649e025503460f94aeda4f/html5/thumbnails/44.jpg)
Feature Engineering
Your features come from somewhere
You can take a standard set of variables or pre-existing variables No question it’s faster
But thinking about your variables is likely to lead to better models
![Page 45: Feature Engineering Week 3 Video 3. Feature Engineering](https://reader035.vdocuments.us/reader035/viewer/2022062221/56649e025503460f94aeda4f/html5/thumbnails/45.jpg)
Next Lecture
Automated feature generation and selection