Big Data Competition: Maximizing Your Potential (Exampled with the 2014 Higgs Boson Machine Learning Challenge)
DESCRIPTION
The Higgs Boson Machine Learning Challenge was one of the biggest data-analysis competitions in the world. To succeed in it, Cheng applied his knowledge of computer science, mathematics, statistics, and physics, along with the problem-solving discipline developed during his training in civil engineering. In this presentation, Cheng uses his experience in the competition to illustrate some important elements of big data analytics and why they matter. The content touches several disciplines, including physics, statistics, and mathematics, but no background in these areas is required to understand its essence. In brief, the presentation covers: an effective framework for general data mining projects; an introduction to the competition and its physics background; various techniques in data exploration and some traps to avoid; various ways of feature enhancement; model building and selection; and optimization of model performance.
TRANSCRIPT
BIG DATA COMPETITION: MAXIMIZING YOUR POTENTIAL
EXAMPLED WITH THE 2014 HIGGS BOSON MACHINE LEARNING CHALLENGE
Dr. Cheng CHEN email: [email protected]
twitter: @cheng_chen_us
Development Consulting International LLC
goDCI.com
1
This presentation is copyright protected ©
Ohio State University, Tongji University
Ph.D. Civil Engineering
M.S. Applied Statistics
Minor Computer Science
Advanced trainings:
City and Regional Planning
Industrial and Systems Engineering
Mathematics
Passion: (this) machine learning
PRESENTER
2
• Goal: improve the procedure that produces the selection region of the Higgs boson
• 4-month duration
• 1,785 teams
• Many machine learning experts, statisticians, and physicists
• Top 5 teams from 5 different countries
HIGGS BOSON MACHINE LEARNING CHALLENGE
3
Netherlands
Hungary
France
Russia
U.S.A/China
http://www.kaggle.com/c/higgs-boson/leaderboard
Background
4
(Framework diagram) Background: read, discuss. Data: understand, explore, enhance (visualize, find, reduce, generate, innovate). Model: train, select, optimize (apply, cross validate, fine-tune). Validate. ©
Background
5
READ AND DISCUSS
6
• a.k.a the God Particle (explains some mass)
• A fundamental particle theorized in 1964 in the Standard Model of Particle Physics
• “Considered” discovered in 2011 – 2013 at the LHC by CERN
• A number of prestigious awards in 2013, including a Nobel prize
HIGGS BOSON
7
http://upload.wikimedia.org/wikipedia/commons/0/00/Standard_Model_of_Elementary_Particles.svg
A "definitive" answer might require "another few years" after the collider's 2015 restart. (deputy chair of physics at Brookhaven National Laboratory)
http://en.wikipedia.org/wiki/Higgs_boson
• Established in 1954
• Birth of World Wide Web (1989)
CERN: THE EUROPEAN ORGANIZATION FOR NUCLEAR RESEARCH
8
maps.google.com
• 27 km (17 mi) in circumference
• 175 meters (574 ft) beneath ground
• Built from 1998 to 2008
• Over 10,000 scientists and engineers
• Over 100 countries
• Seven particle detectors
LARGE HADRON COLLIDER (LHC)
9
https://www.llnl.gov/news/llnl-set-host-international-lattice-physics-conference
http://en.wikipedia.org/wiki/Large_Hadron_Collider
• 46 meters long
• 25 meters in diameter
• Weighs about 7,000 tonnes
• Contains some 3000 km of cable
• Involves roughly 3,000 physicists from over 175 institutions in 38 countries.
ATLAS
10
http://en.wikipedia.org/wiki/Large_Hadron_Collider
http://higgsml.lal.in2p3.fr/documentation/
• The Higgs boson cannot be measured directly (it decays immediately into lighter particles)
• Other particles can decay into the same set of lighter particles
• PRODUCTION and DECAY of the Higgs boson depend on its mass, which was not predicted by theory (now we know it is close to 125 GeV)
CHALLENGES IN DETECTION OF HIGGS BOSON
13
https://www2.physics.ox.ac.uk/sites/default/files/2012-03-27/sinead_farrington_pdf_17376.pdf
Seeing a circular shadow does not mean the real object is a sphere
• Raw data collected from LHC
• Hundreds of millions of proton-proton collisions (events) per second
• 400 events of interest are selected per second
– Signal events (i.e., the Higgs boson)
– Background events (i.e., other particles)
• Events in an ad hoc selection region (in certain channels) exceeding background noise
CURRENT DETECTION MECHANISM
14
Needs improvement in significance and robustness in selection criteria
• Simulated Data
• Fixed mass (125 GeV)
• Simplified decay channel
–Next Slide
• Simplified background events (three representative types only)
– Decay of the Z boson (91.2 GeV) into tau-tau
– Decay of a pair of top quarks into a lepton and a hadronic tau
– "Decay" of the W boson into a lepton and a hadronic tau due to imperfections in the particle identification procedure
• Simplified objective function (significance score)
SIMPLIFICATIONS FOR COMPETITION
15
• Decay of the tau-tau channel only
• One tau decays into a lepton and two neutrinos
• The other tau decays into a hadronic tau and a neutrino
• (Note: neutrinos cannot be detected)
SIMPLIFIED DECAY CHANNEL
16
hadronic tau: a bunch of hadrons
SIMPLIFIED DECAY CHANNEL
18
Jets and MET: vectorized momenta are given
hadronic tau: a bunch of hadrons
Background
19
• 250,000 training
• 550,000 testing
• 30 variables
– 17 primitive (momenta, directions)
– 13 derived
DATA DIMENSION
20
4 rows in training data

EventId  DER_mass_MMC  DER_mass_transverse_met_lep  DER_mass_vis  DER_pt_h  DER_deltaeta_jet_jet  DER_mass_jet_jet  DER_prodeta_jet_jet  DER_deltar_tau_lep  DER_pt_tot  DER_sum_pt
100000   138.47        51.655                       97.827        27.98     0.91                  124.711           2.666                3.064               41.928      197.76
100001   160.937       68.768                       103.235       48.146    NA                    NA                NA                   3.473               2.078       125.157
100002   NA            162.172                      125.953       35.635    NA                    NA                NA                   3.148               9.336       197.814
100003   143.905       81.417                       80.943        0.414     NA                    NA                NA                   3.31                0.414       75.968

EventId  DER_pt_ratio_lep_tau  DER_met_phi_centrality  DER_lep_eta_centrality  PRI_tau_pt  PRI_tau_eta  PRI_tau_phi  PRI_lep_pt  PRI_lep_eta  PRI_lep_phi  PRI_met
100000   1.582                 1.396                   0.2                     32.638      1.017        0.381        51.626      2.273        -2.414       16.824
100001   0.879                 1.414                   NA                      42.014      2.039        -3.011       36.918      0.501        0.103        44.704
100002   3.776                 1.414                   NA                      32.154      -0.705       -2.093       121.409     -0.953       1.052        54.283
100003   2.354                 -1.285                  NA                      22.647      -1.655       0.01         53.321      -0.522       -3.1         31.082

EventId  PRI_met_phi  PRI_met_sumet  PRI_jet_num  PRI_jet_leading_pt  PRI_jet_leading_eta  PRI_jet_leading_phi  PRI_jet_subleading_pt  PRI_jet_subleading_eta  PRI_jet_subleading_phi  PRI_jet_all_pt
100000   -0.277       258.733        2            67.435              2.15                 0.444                46.062                 1.24                    -2.475                  113.497
100001   -1.916       164.546        1            46.226              0.725                1.158                NA                     NA                      NA                      46.226
100002   -2.186       260.414        1            44.251              2.053                -2.028               NA                     NA                      NA                      44.251
100003   0.06         86.062         0            NA                  NA                   NA                   NA                     NA                      NA                      0

EventId  Weight            Label
100000   0.00265331133733  s
100001   2.23358448717     b
100002   2.34738894364     b
100003   5.44637821192     b

Data loaded correctly. Notice the NA values.
MISSING VALUES
21
col_name                 NA_count  NA_pct
DER_mass_MMC               38,114     15%
DER_deltaeta_jet_jet      177,457     71%
DER_mass_jet_jet          177,457     71%
DER_prodeta_jet_jet       177,457     71%
DER_lep_eta_centrality    177,457     71%
PRI_jet_leading_pt         99,913     40%
PRI_jet_leading_eta        99,913     40%
PRI_jet_leading_phi        99,913     40%
PRI_jet_subleading_pt     177,457     71%
PRI_jet_subleading_eta    177,457     71%
PRI_jet_subleading_phi    177,457     71%
(all other columns, including EventId, Weight, and Label, have no missing values)
MISSING VALUES
22
Notice the consistency in missing values
• Assign a value
–Generate a random value
– Fit a value (mean, median, nearest neighbor, etc.)
– Fix a value (domain knowledge)
• Remove the record
• Leave as is
HOW TO HANDLE MISSING VALUES
23
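The strategies above can be sketched in a few lines. A minimal Python sketch on a made-up mini-column; in the actual challenge CSVs missing entries are encoded with the sentinel -999.0, but the values below are illustrative only:

```python
# Hypothetical mini-column using the challenge's NA sentinel (-999.0);
# the numbers themselves are made up for illustration.
NA = -999.0
col = [138.47, NA, 160.9, NA, 143.9]

observed = [v for v in col if v != NA]

# Strategy 1: assign a fitted value (here, the mean of the observed entries)
mean_val = sum(observed) / len(observed)
imputed = [mean_val if v == NA else v for v in col]

# Strategy 2: remove the record entirely
kept = [v for v in col if v != NA]

# Strategy 3: leave as is, but add an indicator column so the model can use
# "missingness" itself (useful here, since the NAs are structural, e.g.
# jet columns are undefined when PRI_jet_num is too small)
is_na = [1 if v == NA else 0 for v in col]
```

Which strategy wins depends on why the value is missing; the consistency noted on the previous slide suggests the missingness here carries information rather than noise.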
HISTOGRAM
25
(Histograms of PRI_jet_leading_pt: raw counts, log transformation, inverse transformation)
Density is more meaningful in the range of x; no fuzzy jump at the edge
HISTOGRAM (CONT’D)
26
(Histograms of DER_pt_h: raw counts, log transformation, inverse transformation)
Bi-modality is revealed
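The effect of a transformation on a histogram is easy to reproduce. A sketch with synthetic heavy-tailed data standing in for a momentum-like column (not the challenge data): the raw histogram piles everything into the first bins, while the log-scale histogram spreads the same counts out and puts the peak near the middle.

```python
import math
import random

random.seed(0)
# Lognormal values as a stand-in for a skewed, positive momentum column
values = [math.exp(random.gauss(3.5, 0.8)) for _ in range(10_000)]

def hist(xs, nbins):
    """Equal-width bin counts over the range of xs."""
    lo, hi = min(xs), max(xs)
    w = (hi - lo) / nbins or 1.0
    counts = [0] * nbins
    for x in xs:
        counts[min(int((x - lo) / w), nbins - 1)] += 1
    return counts

raw = hist(values, 20)                          # mass crammed into bin 0
logged = hist([math.log(v) for v in values], 20)  # roughly symmetric
```

With the raw scale, one bin dominates and the rest look empty; after the log transform the underlying bell shape is visible, which is exactly why the slides try several transformations per variable.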
INTERACTIVE VISUALIZATION R SHINY
27
http://chencheng.shinyapps.io/demo_higgsDEMO
INTERACTIVE VISUALIZATION R SHINY
28
http://chencheng.shinyapps.io/demo_higgsDEMO
INTERACTIVE VISUALIZATION R SHINY
29
Use a reasonable number of bins to display the underlying distribution
http://chencheng.shinyapps.io/demo_higgsDEMO
INTERACTIVE VISUALIZATION R SHINY
30
Use a reasonable transformation to display the underlying distribution
http://chencheng.shinyapps.io/demo_higgsDEMO
HISTOGRAM (CONT’D)
31
(Histogram of PRI_tau_eta)
Transformations are sometimes not necessary
32
Do that for all 30 variables
PAIRWISE CORRELATIONS
33
(Scatter plot and histograms of PRI_lep_phi & PRI_met_phi, background vs. signal)
PAIRWISE CORRELATIONS
34
(PRI_lep_phi & PRI_met_phi, background vs. signal)
Set the transparency parameter appropriately to reveal important patterns
PAIRWISE CORRELATIONS
35
(PRI_lep_phi & PRI_met_phi, background vs. signal)
Correlation coefficient == 0 does not mean no correlation
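The warning that a zero correlation coefficient does not mean no correlation has a classic demonstration: points on a circle, where x and y are completely dependent yet the Pearson coefficient is essentially zero.

```python
import math

# Points on the unit circle: x and y satisfy x**2 + y**2 == 1 exactly,
# yet their linear (Pearson) correlation is ~0.
n = 360
xs = [math.cos(math.radians(t)) for t in range(n)]
ys = [math.sin(math.radians(t)) for t in range(n)]

def pearson(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb)

r = pearson(xs, ys)   # ~0 despite perfect (nonlinear) dependence
```

This is why the slides lean on scatter plots rather than correlation matrices: structure like the phi-phi pattern above is invisible to a single linear coefficient.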
FEATURE ENHANCEMENT: ROTATION
37
(Rotated PRI_lep_phi & PRI_met_phi, background vs. signal)
Validate visual "evidence" from various perspectives
PAIRWISE VARIABLES — LOW RES.
39
(DER_pt_h & DER_deltar_tau_lep, background vs. signal)
PAIRWISE VARIABLES — HIGH RES.
40
Try high resolution
(DER_pt_h & DER_deltar_tau_lep, background vs. signal)
PAIRWISE VARIABLES — HIGH RES.
41
Curve fitting
(DER_pt_h & DER_deltar_tau_lep, background vs. signal)
FEATURE ENHANCEMENT: CURVE FITTING
42
Enhance a variable based on its correlation with another variable
(DER_pt_h & DER_deltar_tau_lep, background vs. signal)
FEATURE ENHANCEMENT: ROTATION BY PRI_TAU_PHI
43
Domain knowledge
(DER_pt_h & PRI_lep_phi, background vs. signal)
FEATURE ENHANCEMENT: ROTATION BY PRI_TAU_PHI
44
Feature enhancement by applying domain knowledge
(DER_pt_h & PRI_lep_phi, background vs. signal)
FEATURE ENHANCEMENT: ROTATION
45
(PRI_jet_leading_eta & PRI_jet_subleading_eta, background vs. signal)
• Select variable(s): One var. for histogram, two var. for scatter plot
DATA DRILL DOWN
46
http://chencheng.shinyapps.io/demo_higgsDEMO
• Dynamically select a subset of data — PRI_jet_num = 2
DATA DRILL DOWN
47
http://chencheng.shinyapps.io/demo_higgsDEMO
• Patterns in the subset data — PRI_jet_leading_eta & PRI_jet_subleading_eta
DATA DRILL DOWN
48
http://chencheng.shinyapps.io/demo_higgsDEMO
• Dynamically select a subset of data — PRI_jet_num = 3
DATA DRILL DOWN
49
http://chencheng.shinyapps.io/demo_higgsDEMO
• Patterns in the subset data — PRI_jet_leading_eta & PRI_jet_subleading_eta
DATA DRILL DOWN
50
http://chencheng.shinyapps.io/demo_higgsDEMO
• Patterns in the subset data — PRI_jet_leading_eta & PRI_jet_subleading_eta
DATA DRILL DOWN
51
PRI_jet_num = 2 PRI_jet_num = 3
Interactive data visualization techniques are helpful
http://chencheng.shinyapps.io/demo_higgsDEMO
52
Do that for all 30 * 29 ~= 900 pairs
PARTICLE LOCATION — (0, S)
53
Convert numerical data back into actual objects with meaning
Animation
PARTICLE LOCATION — (0, B)
54
Animation
• Distance ratio between MET-Lep and Tau-Lep
d(MET, Lep)/d(Tau, Lep)
INSPIRATION FROM ANIMATION
55
Inspiration from meaningful visualization can be helpful
(Histogram of dist_ratio_met_lep_tau, background vs. signal)
• Distance ratio between MET-Lep and Tau-Lep
d(MET, Lep)/d(Tau, Lep)
INSPIRATION FROM ANIMATION
56
Adjust visualization for better efficiency
(Histograms of dist_ratio_met_lep_tau, background vs. signal)
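The slide gives only the formula d(MET, Lep)/d(Tau, Lep), so the sketch below has to assume a distance: azimuthal separation delta-phi (wrapped into [0, pi]), since MET carries no eta. The column names follow the challenge CSV; the sample values are taken from the first training row shown earlier.

```python
import math

def delta_phi(phi1, phi2):
    """Azimuthal separation, wrapped into [0, pi]."""
    d = abs(phi1 - phi2) % (2 * math.pi)
    return d if d <= math.pi else 2 * math.pi - d

def dist_ratio(met_phi, lep_phi, tau_phi):
    # Assumed reading of the slide's d(MET, Lep) / d(Tau, Lep)
    return delta_phi(met_phi, lep_phi) / delta_phi(tau_phi, lep_phi)

# Values from event 100000 (PRI_met_phi, PRI_lep_phi, PRI_tau_phi)
ratio = dist_ratio(met_phi=-0.277, lep_phi=-2.414, tau_phi=0.381)
```

Whatever the exact distance used in the talk, the point stands: a feature invented from watching the event animation, not from the column list, ended up separating signal from background.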
• Variable reduction
– Simple rotation
– Transformation
– Domain knowledge
– …
• Feature generation
– Domain knowledge
– Inspiration from various visualizations
– Statistical approaches
– …
FEATURE ENHANCEMENT
57
(Examples: principal component analysis, distance_ratio, rotation by phi, curve fitting, 45-degree rotation)
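The "simple rotation" idea can be shown in miniature. A sketch on toy points (not challenge data): rotating a correlated pair by 45 degrees turns a diagonal band into an axis-aligned one, so a single rotated coordinate carries the information that previously needed both.

```python
import math

def rotate(x, y, deg):
    """Rotate the point (x, y) by deg degrees about the origin."""
    t = math.radians(deg)
    return (x * math.cos(t) + y * math.sin(t),
            -x * math.sin(t) + y * math.cos(t))

# Toy points lying near the diagonal y = x (illustrative only)
pts = [(0.0, 0.1), (1.0, 0.9), (2.0, 2.05), (3.0, 2.9)]
rotated = [rotate(x, y, 45) for x, y in pts]

# After rotation the second coordinate is ~0 for every point: the band
# collapsed onto the first axis, and coordinate 2 now measures only the
# deviation from the band -- a candidate variable to drop or to keep as
# a new feature, depending on what the scatter plots show.
```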
Background
58
• Gradient boosting tree
• Neural network
• Bayesian network
• Support vector machine
• Generalized additive model
MODELS
59
• Decision tree
–Build many shallow trees
• Boosting
–Build trees based on residual
• Bagging
– Each tree uses a subset of the data
• Ensembling
–Combine the trees
GRADIENT BOOSTING TREE
61
• Regression tree
DECISION TREE
63
(Scatter plot of toy data: y vs. x, with x in [0, 10] and y in [-1, 1])
• Regression tree
DECISION TREE
64
Depth = 1
(Regression tree with node depth = 1: split at x < 6.614; root mean 0.19, n=100; leaves -0.08, n=64 and 0.66, n=36)
• Regression tree
DECISION TREE
65
Depth = 2
(Regression tree with node depth = 2: splits at x < 6.614, x >= 3.049, x >= 8.953; leaves -0.53 n=40, 0.67 n=24, 0.086 n=7, 0.8 n=29)
• Regression tree
DECISION TREE
66
Depth = 3
(Regression tree with node depth = 3: additional splits at x < 5.862 and x < 7.207; leaves -0.67 n=32, 0.045 n=8, 0.67 n=24, 0.086 n=7, 0.57 n=7, 0.87 n=22)
• Regression tree
DECISION TREE
67
Depth = 4
(Regression tree with node depth = 4: additional split at x >= 3.594; leaves -0.8 n=25, -0.23 n=7, 0.045 n=8, 0.67 n=24, 0.086 n=7, 0.57 n=7, 0.87 n=22)
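What the tree-growing routine is doing at each node can be re-implemented in a few lines: try every split point and keep the one that minimises squared error when each side is predicted by its mean. A sketch on noise-free y = sin(x) (so the chosen cut will differ from the cut in the slides, which came from a noisy sample):

```python
import math

# 100 grid points on [0, 10), same toy problem shape as the slides
xs = [i * 0.1 for i in range(100)]
ys = [math.sin(x) for x in xs]

def sse(vals):
    """Sum of squared errors around the mean."""
    m = sum(vals) / len(vals)
    return sum((v - m) ** 2 for v in vals)

def best_split(xs, ys):
    """Exhaustive search for the single best split point."""
    best = (float("inf"), None)
    for i in range(1, len(xs)):
        cut = (xs[i - 1] + xs[i]) / 2
        left = [y for x, y in zip(xs, ys) if x < cut]
        right = [y for x, y in zip(xs, ys) if x >= cut]
        best = min(best, (sse(left) + sse(right), cut))
    return best[1]

cut = best_split(xs, ys)
left_mean = sum(y for x, y in zip(xs, ys) if x < cut) / sum(x < cut for x in xs)
```

Growing the tree to depth 2, 3, 4 is just this search applied recursively to each side, which is why the fitted step function on the slides hugs the sine curve more closely at every depth.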
X0 = X; Y0 = Y;
latest_model = train_tree(X, Y);
for ii = 1:NUM_ITER
Index_train = random(1:NUM_REC, FRAC_TRAIN * NUM_REC)
X = X0[Index_train]; Y = Y0[Index_train];
v_resid = Y - latest_model(X);
tree_add = train_tree(X, v_resid);
latest_model += LEARNING_RATE * tree_add
DECISION TREE
68
base model
X0 = X; Y0 = Y;
latest_model = train_tree(X, Y);
for ii = 1:NUM_ITER
Index_train = random(1:NUM_REC, FRAC_TRAIN * NUM_REC)
X = X0[Index_train]; Y = Y0[Index_train];
v_resid = Y - latest_model(X);
tree_add = train_tree(X, v_resid);
latest_model += LEARNING_RATE * tree_add
GRADIENT BOOSTING TREE (V. 1)
69
get the residuals
fit a tree for residuals
additive model
X0 = X; Y0 = Y;
latest_model = train_tree(X, Y);
for ii = 1:NUM_ITER
Index_train = random(1:NUM_REC, FRAC_TRAIN * NUM_REC)
X = X0[Index_train]; Y = Y0[Index_train];
v_resid = Y - latest_model(X);
tree_add = train_tree(X, v_resid);
latest_model += LEARNING_RATE * tree_add
(STOCHASTIC) GRADIENT BOOSTING TREE
70
get sampled index
sampled records as input
store input
X0 = X; Y0 = Y;
latest_model = train_tree(X, Y, wts);
for ii = 1:NUM_ITER
Index_train = random(1:NUM_REC, FRAC_TRAIN * NUM_REC)
X = X0[Index_train]; Y = Y0[Index_train];
v_resid = Y - wts * latest_model(X);
tree_add = train_tree(X, v_resid, wts);
latest_model += LEARNING_RATE * tree_add
(STOCHASTIC) GRADIENT BOOSTING TREE WITH WEIGHT
71
X0 = X; Y0 = Y;
latest_model = train_base_model(X, Y, wts);
for ii = 1:NUM_ITER
Index_train = random(1:NUM_REC, FRAC_TRAIN * NUM_REC)
X = X0[Index_train]; Y = Y0[Index_train];
v_pseudo_resid = get_pseudo_residual(X, Y, wts, latest_model, LOSS_FUNCTION_TYPE);
model_add_base = train_base_model(X, v_pseudo_resid, wts);
alpha = linear_search(cost_function, model_add_base, X, Y, wts);
latest_model += LEARNING_RATE * (alpha * model_add_base)
(GENERAL) GRADIENT BOOSTING
72
[Stochastic Gradient Boosting] Jerome H. Friedman, 1999
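The pseudocode above translates directly to runnable Python. A compact sketch with squared loss (so the pseudo-residual is just the ordinary residual), one-split stumps standing in for train_tree, and a toy sine target in place of the challenge data; constant names mirror the slides:

```python
import math
import random

LEARNING_RATE, NUM_ITER, FRAC_TRAIN = 0.3, 50, 0.7

def fit_stump(xs, ys):
    """Depth-1 regression tree: one split, predict the side means."""
    n = len(xs)
    order = sorted(range(n), key=lambda i: xs[i])
    best = (float("inf"), 0.0, 0.0, 0.0)
    for k in range(1, n):
        cut = (xs[order[k - 1]] + xs[order[k]]) / 2
        L = [ys[i] for i in order[:k]]
        R = [ys[i] for i in order[k:]]
        ml, mr = sum(L) / len(L), sum(R) / len(R)
        err = sum((v - ml) ** 2 for v in L) + sum((v - mr) ** 2 for v in R)
        if err < best[0]:
            best = (err, cut, ml, mr)
    _, cut, ml, mr = best
    return lambda x: ml if x < cut else mr

random.seed(1)
X0 = [random.uniform(0, 10) for _ in range(200)]
Y0 = [math.sin(x) for x in X0]

trees = [fit_stump(X0, Y0)]                     # base model

def predict(x):
    # base tree at full weight, boosted trees shrunk by the learning rate
    return trees[0](x) + sum(LEARNING_RATE * t(x) for t in trees[1:])

base_mse = sum((y - predict(x)) ** 2 for x, y in zip(X0, Y0)) / len(X0)

for _ in range(NUM_ITER):
    idx = random.sample(range(len(X0)), int(FRAC_TRAIN * len(X0)))  # bagging
    X = [X0[i] for i in idx]
    Y = [Y0[i] for i in idx]
    resid = [y - predict(x) for x, y in zip(X, Y)]  # residuals of current model
    trees.append(fit_stump(X, resid))               # fit next tree to residuals

mse = sum((y - predict(x)) ** 2 for x, y in zip(X0, Y0)) / len(X0)
```

The training error drops far below that of the single base stump, which is the whole argument for the additive model: each shallow tree only has to fix what the ensemble so far gets wrong.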
Background
73
gbm_model = gbm.fit(
x=train[,x_vars, with = FALSE],
y=train$Label,
distribution = char_distr,
w = w,
n.trees = n_trees,
interaction.depth = num_inter,
n.minobsinnode = min_obs_node,
shrinkage = shrinkage_rate,
bag.fraction = frac_bag)
APPLYING GBM IN R
74
VARIABLE IMPORTANCE
75
(Bar chart of relative importance by variable)
APPLY MODEL ON TEST DATA
76
EventId Score RankOrder Class
1 0.98 501 s
2 0.42 259,579 b
3 0.46 264,125 b
. . . .
. . . .
449,998 0.86 31,154 s
449,999 0.12 489,251 b
550,000 0.79 110,154 b
Background
77
• Number of iterations
• Minimum observations per node
• Fraction of bagging (0.5 ~ 0.8)
• Learning rate (<0.1)
• Depth of tree (4 ~ 8)
GRADIENT BOOSTING PARAMETERS
78
Background
79
• Split training data
– 70% for training
– 30% for cross validation
• Train model (70%)
• Measure performance (30%)
CROSS VALIDATION
80
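The split described above, sketched on stand-in record IDs (250,000 events, as in the training set):

```python
import random

random.seed(42)                      # seed chosen here for reproducibility
records = list(range(250_000))       # stand-in event IDs
random.shuffle(records)

cut = int(0.7 * len(records))
train_idx = records[:cut]            # 70%: train the model
valid_idx = records[cut:]            # 30%: measure performance
```

Shuffling before splitting matters: if the file has any ordering (by channel, by weight), a naive head/tail split would put different populations on the two sides.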
PERFORMANCE BASED ON AMS
81
Trade-off between:
– Ratio of signal/background events
– Number of records in the selection region

EventId   Score  RankOrder  Class  truth
1         0.98   501        S      S
2         0.42   259,579    B
3         0.46   264,125    B
...
449,998   0.86   31,154     S      B
449,999   0.12   489,251    B
550,000   0.79   110,154    B

Selection Region: s = sum(S), b = sum(B)
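The score itself is the approximate median significance (AMS) from the challenge documentation, a function of the weighted sums s (signal) and b (background) inside the selection region, with regularisation constant b_reg = 10:

```python
import math

def ams(s, b, b_reg=10.0):
    """Approximate median significance, as defined by the challenge."""
    return math.sqrt(2.0 * ((s + b + b_reg)
                            * math.log(1.0 + s / (b + b_reg)) - s))

# The trade-off on this slide: widening the selection region raises s but
# also b, and extra background at fixed signal always lowers the score.
```

For s small relative to b, AMS behaves like the familiar s / sqrt(b + b_reg), which makes the significance interpretation concrete.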
PERFORMANCE BASED ON AMS
82
(AMS vs. percentile and AMS vs. percentage of signal)
COMPARE TWO MODEL RESULTS
83
(AMS vs. percentile and vs. percentage of signal, training and cross validation)
COMPARE TWO MODEL RESULTS
84
(AMS vs. percentile and vs. percentage of signal, training and cross validation)
AMS BY NUM. ITERATION
85
Animation
(AMS vs. percentile as the number of iterations grows)
Background
86
HEAT MAP OF AMS ON B-S PLAN
87
(axes: s and b; annotation: >> 4)
OPTIMIZATION BASED ON OBJECTIVE FUNCTION
88
(AMS vs. percentile, with operating points A, B, C)
HEAT MAP OF AMS ON B-S PLAN
89
(points A, B, C on the b-s plane)
HEAT MAP OF AMS ON B-S PLAN
90
(points A, B, C on the b-s plane)
Inspiration from the Lagrangian method: weight signal and background events by the partial derivatives of the AMS function
AMS CURVE ON B-S PLAN
91
(points A, B, C; partial derivative of AMS against s; partial derivative of AMS against b)
Inspiration from the Lagrangian method: weight signal and background events by the partial derivatives of the AMS function
Ratio of the derivatives ==> relative weight
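The weighting trick can be checked numerically: differentiate the AMS objective (definition from the challenge documentation, b_reg = 10) and take the ratio of the partials as the relative event weight. The operating point below is made up for illustration:

```python
import math

def ams(s, b, b_reg=10.0):
    return math.sqrt(2.0 * ((s + b + b_reg)
                            * math.log(1.0 + s / (b + b_reg)) - s))

def ams_partials(s, b, h=1e-6):
    """Central-difference partial derivatives of AMS w.r.t. s and b."""
    d_s = (ams(s + h, b) - ams(s - h, b)) / (2 * h)
    d_b = (ams(s, b + h) - ams(s, b - h)) / (2 * h)
    return d_s, d_b

s, b = 200.0, 1500.0            # illustrative operating point, not real totals
d_s, d_b = ams_partials(s, b)
rel_weight = d_s / -d_b         # ratio of the derivatives -> relative weight
```

Since d_s is positive and d_b negative, one extra signal event is worth many background events near a typical operating point, which is exactly the asymmetry the reweighted training exploits.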
IMPROVEMENT DUE TO WEIGHTING
92
(AMS and AMS* vs. Num_Iterations)
IMPROVEMENT DUE TO WEIGHTING (CONT’D)
93
(AMS and AMS* vs. Num_Iterations)
AUGMENTED GRADIENT BOOSTING
94
(Loop: Apply GBM -> Weight Adjustment) ©
AUGMENTED GRADIENT BOOSTING
95
(Loop: Apply GBM -> Weight Adjustment -> Remove very high and very low score records from train and test) ©
IMPROVEMENT DUE TO ELIMINATION
96
(AMS and AMS* vs. Num_Iterations)
IMPROVEMENT DUE TO ELIMINATION (CONT’D)
97
(AMS and AMS* vs. Num_Iterations)
AUGMENTED GRADIENT BOOSTING
98
(Loop: Apply ML Model -> Weight Adjustment -> Remove very high and very low score records from train and test) ©
Background
99
• Version control (Git, SourceTree)
– Effectively implement many different ideas
• File organization
– Efficiently pull out the file needed
• Effective code (R, Python)
– It matters greatly when dealing with big data
OTHER TOPICS
100
Thank you for your participation!
Any Questions?
goDCI.com