Session 26 TS, Predictive Analytics: Moving Out of Square One
Moderator:
Jean-Marc Fix, FSA, MAAA
Presenters: Jean-Marc Fix, FSA, MAAA
Jeffery Robert Huddleston, ASA, CERA, MAAA
Predictive Modeling:
Getting out of
Jean-Marc Fix, FSA, MAAA, Vice President, R&D, Optimum Re Insurance
Jeff Huddleston, ASA, MAAA, CERA, Senior Consultant, Deloitte Consulting LLP
Life and Annuity Symposium, Nashville, May 2016

On to Square 2
Today’s goal
Things to have before you start
Things to know before you start
Starting
The view from square 2

Today’s Goal
You have heard a lot about predictive modeling
Time to get your feet wet
Moving to square 2

Things to Have
R (free!)
Basic understanding of key concepts
Data
A question (for this lesson we reverse the logical order!)
Patience
Willingness to ask questions

What is R?
For oldies: similarities to APL
Don’t think of it as a programming language, at least to start
Collection of functions extracted from useful packages
Easy to dabble
Lots of online resources (www.rseek.org, Coursera)

Useful Packages and Functions
Get R
Finding functions: word of the net
R-seek
Quick-R
R-bloggers
Loading the package that has the functions you want:
install.packages("packIreallywant")
library(packIreallywant)

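A minimal sketch of the install-then-load step above. The helper name load_or_install is hypothetical (it is not part of R or of the slides); it only wraps install.packages() and library() so a package is downloaded once and attached every time.

```r
# Hypothetical helper: install a package only if it is missing, then attach it.
load_or_install <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)  # needs an internet connection and a CRAN mirror
  }
  library(pkg, character.only = TRUE)
}

# "stats" ships with R, so this call never triggers a download.
load_or_install("stats")
```

Replace "stats" with the package you actually want, for example "dplyr".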
Data Wrangling
Get data
Basic cleaning in Excel
First row: headers
Variable names
Avoid blank spaces in names
First column: ID
dplyr package (also by Wickham)

Clean Data
Does the data look as expected?
Remove quotation marks
Use a consistent date format
Clean trailing spaces
Replace blank or missing values with NA
Save as a CSV file
Check the CSV file in Notepad

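The same cleaning steps can also be done in R rather than Excel. The sketch below uses a made-up three-row data frame (the column names and values are assumptions for illustration) and base R functions only.

```r
# Toy data standing in for a freshly exported spreadsheet (values are made up).
raw <- data.frame(
  id     = c(1, 2, 3),
  name   = c("\"Smith\"  ", "Jones ", ""),
  issued = c("2015-01-31", "2015-02-28", "2015-03-31"),
  stringsAsFactors = FALSE
)

clean <- raw
clean$name <- gsub("\"", "", clean$name)   # remove quotation marks
clean$name <- trimws(clean$name)           # clean leading and trailing spaces
clean$name[clean$name == ""] <- NA         # blank values become NA
clean$issued <- as.Date(clean$issued, format = "%Y-%m-%d")  # one date format

out_file <- file.path(tempdir(), "clean.csv")
write.csv(clean, out_file, row.names = FALSE)  # save as CSV
readLines(out_file)                            # same check as opening it in Notepad
```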
Load Data
Open R
library(libraryname)   # load the libraries you will need
Set the working directory to where your working file is: getwd(), setwd()
Use the read.csv() or read.csv2() functions
Can also read directly from Excel

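A short sketch of the loading steps, assuming a small CSV file named policies.csv (a made-up name) that the snippet writes itself so it can run anywhere. read.csv2() works the same way but expects semicolon separators and comma decimals.

```r
old_dir <- getwd()     # remember where we started
setwd(tempdir())       # assumption: the CSV lives in this folder

# Write a tiny example file so the snippet is self-contained.
write.csv(data.frame(ID = 1:3, Lapse = c(0, 1, 0)),
          "policies.csv", row.names = FALSE)

# Treat both empty cells and the text "NA" as missing values.
policies <- read.csv("policies.csv", header = TRUE, na.strings = c("", "NA"))
str(policies)

setwd(old_dir)         # restore the original working directory
```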
Useful Basic R Commands
c(v1,v2,v3)                  concatenate
x:y, seq(min, max, by=5)     sequence
?fn, ??fn                    help
x<-5                         assignment

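These commands can be tried directly at the R prompt. The values below are arbitrary.

```r
x <- 5                    # assignment
v <- c(1, 2, 3)           # concatenate values into a vector
r <- 1:x                  # integer sequence from 1 to x
s <- seq(0, 20, by = 5)   # sequence from 0 to 20 in steps of 5
# ?seq                    # help page for a function you know by name
# ??"sequence"            # keyword search across installed help pages
```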
Useful Basic R Commands
ls()               lists objects in workspace
rm(object)         removes an object
rm(list=ls())      empties workspace
cbind(v1,v2,v3)    binds vectors together as columns
rbind(v1,v2,v3)    binds vectors together as rows
rep(x, times)      repeats x

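A quick illustration of the difference between cbind() and rbind(), with two made-up vectors:

```r
v1 <- c(1, 2, 3)
v2 <- c(4, 5, 6)

by_col <- cbind(v1, v2)      # 3 x 2 matrix: each vector is a column
by_row <- rbind(v1, v2)      # 2 x 3 matrix: each vector is a row
zeros  <- rep(0, times = 4)  # the value 0 repeated four times

ls()                         # names of the objects currently in the workspace
rm(zeros)                    # remove one object
# rm(list = ls())            # would empty the whole workspace; use with care
```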
Useful Basic R Commands
unique()
runif(num, min, max)
rnorm(num, mean, sd)
as.Date(as.character(textdate), format)
as.factor(data$var1)
as.data.frame(matrix that looks like a data frame)

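A small sketch of these functions in use. The seed, the text dates, and the column names are all made up for illustration; note that the date function is spelled as.Date() with a capital D.

```r
set.seed(2016)                     # arbitrary seed, for reproducibility only

u <- runif(5, min = 0, max = 1)    # 5 uniform draws on [0, 1]
z <- rnorm(5, mean = 0, sd = 1)    # 5 standard normal draws
unique(c("a", "b", "a"))           # keeps each distinct value once

textdate <- c("03/19/2018", "05/16/2016")   # made-up text dates
d <- as.Date(as.character(textdate), format = "%m/%d/%Y")

m  <- cbind(age = c(35, 42), lapses = c(0, 1))  # a matrix with named columns
df <- as.data.frame(m)
df$age_band <- as.factor(c("30-39", "40-49"))   # categorical variable
```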
Basic Useful Packages
See script
Ask around
See r-bloggers community
For development look into RStudio

Explore Data
class()
names()
head()
tail()
dataset$varname
dataset[,varnumber]
dataset[obsnumber,]

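The same exploration commands applied to the built-in iris dataset, which stands in here for your own data:

```r
data(iris)                   # built-in example dataset

class(iris)                  # type of the object
names(iris)                  # column names
head(iris, 3)                # first 3 rows
tail(iris, 3)                # last 3 rows

sepal <- iris$Sepal.Length   # one column, selected by name
col2  <- iris[, 2]           # one column, selected by position
first <- iris[1, ]           # one observation (row)
```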
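Basic Graphic Exploration
Histogram and charts: hist(), qplot(), boxplot(), ggplot()
Correlations: summarize(), aggregate(), cor(), pairs.panels()
(qplot() and ggplot() come from the ggplot2 package, summarize() from dplyr, pairs.panels() from psych)
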
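A minimal sketch that sticks to the base R functions from this slide, so no extra package is needed. It uses the built-in iris dataset and writes the plots to a temporary PDF (an assumption made so the code also runs without a display).

```r
data(iris)

# Send plots to a temporary PDF so the sketch also runs without a display.
pdf(file.path(tempdir(), "explore.pdf"))
hist(iris$Sepal.Length, main = "Sepal length", xlab = "cm")
boxplot(Sepal.Length ~ Species, data = iris)
dev.off()

# Mean of each numeric column by species.
by_species <- aggregate(. ~ Species, data = iris, FUN = mean)

# Correlation matrix of the four numeric columns.
cors <- cor(iris[, 1:4])
```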
Split Data in Two
Train set
Test set
Sizes vary from 50% train / 50% test to 80/20
Want a decent size in the test set
Can be purely random, random by groups, or by time

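A purely random 80/20 split can be sketched in base R as follows. The iris dataset and the seed are stand-ins; splitting by group or by time would need a different rule for choosing the rows.

```r
data(iris)
set.seed(26)                 # arbitrary seed, so the split can be reproduced

n         <- nrow(iris)
train_idx <- sample(seq_len(n), size = round(0.8 * n))  # 80% of row numbers

train <- iris[train_idx, ]   # rows drawn for training
test  <- iris[-train_idx, ]  # every remaining row
```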
Easy splitting in R
iris[iris$Species %in% c("versicolor","virginica"), ]
iris[iris[,5]=="versicolor", ]
"!" is not and "==" is equal
Split
Use the dplyr library
Look at the sample_n() and sample_frac() functions in dplyr

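The subsetting expressions need a comma before the closing bracket so that they select rows. The sketch below runs them on iris and then, only if dplyr happens to be installed, draws an 80% training sample with sample_frac(). The ID column is added here purely so the test rows can be found with anti_join(); it is not part of iris. Recent dplyr versions mark sample_n() and sample_frac() as superseded by slice_sample(), though both still work.

```r
data(iris)

two_species <- iris[iris$Species %in% c("versicolor", "virginica"), ]
versicolor  <- iris[iris[, 5] == "versicolor", ]
not_setosa  <- iris[iris$Species != "setosa", ]   # "!" negates the test

if (requireNamespace("dplyr", quietly = TRUE)) {
  set.seed(26)                                     # arbitrary seed
  iris_id <- cbind(ID = seq_len(nrow(iris)), iris) # ID added for the join below
  train <- dplyr::sample_frac(iris_id, 0.8)        # random 80% of the rows
  test  <- dplyr::anti_join(iris_id, train, by = "ID")  # rows not in train
}
```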
Components of GLM
$E(Y) = g^{-1}(\beta X)$
or $g(E(Y)) = \beta X$
or, informally, $g(Y) = \beta X + \varepsilon$
$X$: independent variables
$Y$: dependent variable, generated by an exponential family distribution
$g$: link function
$\varepsilon$: random error

Choose Model Parameters
Distribution for lapses: the Poisson distribution is commonly used
Link function: the default link for Poisson is log

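R's family objects make the default link easy to check. This sketch inspects poisson() and applies the link and its inverse to one arbitrary value of the linear predictor.

```r
fam <- poisson()            # Poisson family with its default link

fam$family                  # name of the distribution
fam$link                    # name of the link function

eta <- 0.5                  # a linear predictor value, i.e. beta * X
mu  <- fam$linkinv(eta)     # inverse link: exp(eta), the expected count
fam$linkfun(mu)             # link: log(mu), which recovers eta
```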
Setting up the Model
glm(y ~ offset(log(Exposure)) + var1 + var2, family = poisson(), data = yourdataframe)
The link function is implied as log when family is poisson()

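To see the call run end to end, the sketch below simulates a small data set and fits the model. Every number in it (sample size, rates, coefficients, seed) is made up for illustration; only the glm() call itself follows the slide.

```r
set.seed(33)                                 # arbitrary seed
n <- 500

# Simulated policy data; every value here is made up for illustration.
yourdataframe <- data.frame(
  Exposure = runif(n, min = 0.25, max = 1),  # policy-years exposed
  var1     = rnorm(n),
  var2     = factor(sample(c("A", "B"), n, replace = TRUE))
)
true_rate <- exp(-2 + 0.3 * yourdataframe$var1 +
                 0.5 * (yourdataframe$var2 == "B"))
yourdataframe$y <- rpois(n, lambda = yourdataframe$Exposure * true_rate)

fit <- glm(y ~ offset(log(Exposure)) + var1 + var2,
           family = poisson(), data = yourdataframe)

summary(fit)$coefficients   # estimates on the log scale
exp(coef(fit))              # multiplicative effects on the lapse rate
```

The offset makes the model describe lapses per unit of exposure rather than raw counts.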
Run First Iteration on Train
Add a new variable
Evaluate AIC: lower is better
Is the complexity worth it?
Repeat
Variables can be interactions between variables, lagged variables, or powers of variables

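One iteration of this loop might look like the sketch below. The training data is simulated (all values are assumptions), and var2 is pure noise by construction, so this is only a way to see the mechanics of comparing AIC between two fits.

```r
set.seed(37)                   # arbitrary seed
n <- 500
train <- data.frame(
  Exposure = runif(n, 0.25, 1),
  var1     = rnorm(n),
  var2     = rnorm(n)          # pure noise: unrelated to y by construction
)
train$y <- rpois(n, train$Exposure * exp(-1 + 0.4 * train$var1))

base_fit <- glm(y ~ offset(log(Exposure)) + var1,
                family = poisson(), data = train)

# Add one candidate variable and an interaction, keeping the rest unchanged.
new_fit <- update(base_fit, . ~ . + var2 + var1:var2)

AIC(base_fit, new_fit)         # lower is better; compare the two rows
```

If the larger model does not lower the AIC, the extra complexity is probably not worth it.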