Using R to Win Kaggle Data Mining Competitions, Chris Raimondi, November 1, 2012

TRANSCRIPT

  • Slide 1
  • Using R to Win Kaggle Data Mining Competitions, Chris Raimondi, November 1, 2012
  • Slide 2
  • Overview of talk
      • What I hope you get out of this talk
      • Life before R
      • Simple model example
      • R programming language
      • Background/Stats/Info
      • How to get started
      • Kaggle
  • Slide 3
  • Overview of talk: individual Kaggle competitions
      • HIV Progression
      • Chess
      • Mapping Dark Matter
      • Dunnhumby's Shopper Challenge
      • Online Product Sales
  • Slide 4
  • What I want you to leave with
      • Belief that you don't need to be a statistician to use R, NOR do you need to fully understand machine learning in order to use it
      • Motivation to use Kaggle competitions to learn R
      • Knowledge of how to start
  • Slide 5
  • My life before R
      • Lots of Excel
      • Had tried programming in the past and got frustrated
      • Read NY Times article in January 2009 about R & Google
      • Installed R, but gave up after a couple of minutes
      • Months later
  • Slide 6
  • My life before R
      • Was using Excel to run PageRank calculations, which took hours and was very messy
      • Was experimenting with Pajek, a Windows-based network/link analysis program
      • Was looking for a similar program that did PageRank calculations
      • Revisited R as a possibility
  • Slide 7
  • My life before R
      • Came across R Graph Gallery
      • Saw this graph
  • Slide 8
  • Slide 9
  • Addicted to R in one line of code
      pairs(iris[1:4], main = "Edgar Anderson's Iris Data", pch = 21,
            bg = c("red", "green3", "blue")[unclass(iris$Species)])
      • pairs = function
      • iris = data frame
  • Slide 10
  • What do we want to do with R?
      • Machine learning, or more specifically, making models
      • We want to TRAIN a model on a set of data with KNOWN answers/outcomes
      • In order to PREDICT the answer/outcome for similar data where the answer is not known
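The train-then-predict idea on this slide can be sketched in a few lines of base R. This is only an illustration, not code from the talk: the choice of `lm`, the split size of 100 rows, the seed, and the variable names `known`, `unknown`, and `fit` are all assumptions made for the example.

```r
# Assumption: a simple linear model stands in for "a model"; the talk does not
# specify one at this point.
set.seed(42)                                  # make the random split reproducible
train_rows <- sample(nrow(iris), 100)         # 100 of the 150 flowers have "known" answers
known   <- iris[train_rows, ]
unknown <- iris[-train_rows, ]                # pretend Petal.Width is not known here

# TRAIN: learn how petal width relates to petal length from the known rows
fit <- lm(Petal.Width ~ Petal.Length, data = known)

# PREDICT: estimate petal width for the rows the model has not seen
predicted <- predict(fit, newdata = unknown)

# In this toy setup the true answers exist, so the error can be checked
rmse <- sqrt(mean((predicted - unknown$Petal.Width)^2))
print(rmse)
```

In a Kaggle competition the `unknown` rows would be the test set supplied by the organizers, and the final error check would happen on their side.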
  • Slide 11
  • Slide 12
  • How to train a model
      • R allows for the training of models using probably over 100 different machine learning methods
      • To train a model you need to provide:
          1. Name of the function (which machine learning method)
          2. Name of the dataset
          3. What your response variable is and what features you are going to use
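As a rough sketch of those three ingredients, the same dataset and formula can be handed to two different modelling functions. The choice of `rpart` and `MASS::lda` is an assumption made for illustration (both packages ship with standard R distributions); the slide does not name specific functions here.

```r
library(rpart)  # recursive partitioning (decision trees)
library(MASS)   # provides lda(), linear discriminant analysis

# 1. Function name = the machine learning method (rpart vs. lda)
# 2. data = the dataset
# 3. Formula = response on the left of "~", features on the right;
#    "." is shorthand for "every other column in the dataset"
tree_fit <- rpart(Species ~ ., data = iris)
lda_fit  <- lda(Species ~ ., data = iris)

# Predictions on the training rows, only to show that both fits are usable
tree_pred <- predict(tree_fit, newdata = iris, type = "class")
lda_pred  <- predict(lda_fit, newdata = iris)$class

# Fraction of training rows each model labels correctly
print(mean(tree_pred == iris$Species))
print(mean(lda_pred == iris$Species))
```

Swapping methods mostly means swapping the function name, although each package has its own conventions for `predict` (note `type = "class"` for `rpart` and `$class` for `lda`).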
  • Slide 13
  • Example machine learning methods available in R
      • Bagging
      • Boosted Trees
      • Elastic Net
      • Gaussian Processes
      • Generalized additive model
      • Generalized linear model
      • K Nearest Neighbor
      • Linear Regression
      • Nearest Shrunken Centroids
      • Neural Networks
      • Partial Least Squares
      • Principal Component Regression
      • Projection Pursuit Regression
      • Quadratic Discriminant Analysis
      • Random Forests
      • Recursive Partitioning
      • Rule-Based Models
      • Self-Organizing Maps
      • Sparse Linear Discriminant Analysis
      • Support Vector Machines
  • Slide 14
  • Code used to train decision tree
      library(party)
      irisct
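The transcript cuts off after `irisct`, so the slide's exact code is not recoverable here. One plausible way to train a decision tree on the iris data with the `party` package is sketched below; the formula `Species ~ .`, the use of `ctree`, and everything after the assignment are assumptions, not the speaker's code. It requires the `party` package to be installed.

```r
library(party)  # conditional inference trees; install.packages("party") if missing

# Assumption: predict Species from all four measurements with ctree()
irisct <- ctree(Species ~ ., data = iris)

# Print the fitted tree's splits
print(irisct)

# Predicted class for every training row
pred <- predict(irisct)

# Confusion table: predicted species (rows) vs. actual species (columns)
print(table(pred, iris$Species))
```

Calling `plot(irisct)` in an interactive session draws the tree with the class distribution in each terminal node.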