Satnam Singh Talk Feb5 Indian Analytics and Big Data Summit
![Page 1: Satnam Singh Talk Feb5 Indian Analytics and Big Data Summit](https://reader033.vdocuments.us/reader033/viewer/2022052913/55b76cf6bb61eb32248b4643/html5/thumbnails/1.jpg)
Short Stories of Building
Data Science Products
February 5, 2015
Satnam Singh, PhD Data Scientist/Director, CA Technologies
![Page 2: Satnam Singh Talk Feb5 Indian Analytics and Big Data Summit](https://reader033.vdocuments.us/reader033/viewer/2022052913/55b76cf6bb61eb32248b4643/html5/thumbnails/2.jpg)
Rocket Singh
Image credit: Movie Makers of Rocket Singh
- Salesman of the Year
- Wanted to become a Data Scientist
![Page 3: Satnam Singh Talk Feb5 Indian Analytics and Big Data Summit](https://reader033.vdocuments.us/reader033/viewer/2022052913/55b76cf6bb61eb32248b4643/html5/thumbnails/3.jpg)
Dialog with Rocket Singh
Who is a Data Scientist?
![Page 4: Satnam Singh Talk Feb5 Indian Analytics and Big Data Summit](https://reader033.vdocuments.us/reader033/viewer/2022052913/55b76cf6bb61eb32248b4643/html5/thumbnails/4.jpg)
“A Story Teller &…
Data Scientist
![Page 5: Satnam Singh Talk Feb5 Indian Analytics and Big Data Summit](https://reader033.vdocuments.us/reader033/viewer/2022052913/55b76cf6bb61eb32248b4643/html5/thumbnails/5.jpg)
“An Expert Software Coder &…
Data Scientist
![Page 6: Satnam Singh Talk Feb5 Indian Analytics and Big Data Summit](https://reader033.vdocuments.us/reader033/viewer/2022052913/55b76cf6bb61eb32248b4643/html5/thumbnails/6.jpg)
“A Data lake/ocean Swimmer &…
Data Scientist
![Page 7: Satnam Singh Talk Feb5 Indian Analytics and Big Data Summit](https://reader033.vdocuments.us/reader033/viewer/2022052913/55b76cf6bb61eb32248b4643/html5/thumbnails/7.jpg)
Data Scientist
“A Statistician &…
![Page 8: Satnam Singh Talk Feb5 Indian Analytics and Big Data Summit](https://reader033.vdocuments.us/reader033/viewer/2022052913/55b76cf6bb61eb32248b4643/html5/thumbnails/8.jpg)
“A Business-Savvy Engineer…
Data Scientist
![Page 9: Satnam Singh Talk Feb5 Indian Analytics and Big Data Summit](https://reader033.vdocuments.us/reader033/viewer/2022052913/55b76cf6bb61eb32248b4643/html5/thumbnails/9.jpg)
Dialog with Rocket Singh
How do I become a
Data Scientist?
Satnam – Becoming a
Data Scientist is
a fantastic and long
Journey
![Page 10: Satnam Singh Talk Feb5 Indian Analytics and Big Data Summit](https://reader033.vdocuments.us/reader033/viewer/2022052913/55b76cf6bb61eb32248b4643/html5/thumbnails/10.jpg)
++ Big Data Processing Skills
Data Scientist Skills
![Page 11: Satnam Singh Talk Feb5 Indian Analytics and Big Data Summit](https://reader033.vdocuments.us/reader033/viewer/2022052913/55b76cf6bb61eb32248b4643/html5/thumbnails/11.jpg)
Dialog with Rocket Singh
Can you tell me about a
data science product that
you have built?
Satnam – Many stories, here are
two stories:
1. Smartphones – story
2. Automobile - story
![Page 12: Satnam Singh Talk Feb5 Indian Analytics and Big Data Summit](https://reader033.vdocuments.us/reader033/viewer/2022052913/55b76cf6bb61eb32248b4643/html5/thumbnails/12.jpg)
Smart Analytics in Smartphones
- Data sources: sensor data, user data, social data, …
- Analytics (text mining, machine learning, data mining)
- Recommendations, user-modeled activities, personalization
- 3rd-party applications, native applications
![Page 13: Satnam Singh Talk Feb5 Indian Analytics and Big Data Summit](https://reader033.vdocuments.us/reader033/viewer/2022052913/55b76cf6bb61eb32248b4643/html5/thumbnails/13.jpg)
Smart Gallery
Image Credit: Photo
Organizer Fish Bowl
Smart Grouping
![Page 14: Satnam Singh Talk Feb5 Indian Analytics and Big Data Summit](https://reader033.vdocuments.us/reader033/viewer/2022052913/55b76cf6bb61eb32248b4643/html5/thumbnails/14.jpg)
Smart Contacts
Image Credit: Contact+
Duplicate names: “Satnam Singh”, “Satnam Singh, PhD”, “Singh Sat”, “Satnam Bro”
Marie Brown
Brown Marie
Frnd
Marie B Bow
Problem: Find duplicate
contacts and compare your
algorithm with existing
technique in Android
![Page 15: Satnam Singh Talk Feb5 Indian Analytics and Big Data Summit](https://reader033.vdocuments.us/reader033/viewer/2022052913/55b76cf6bb61eb32248b4643/html5/thumbnails/15.jpg)
Algo 1 in Android: a variant of longest common substring matching
Singh Algo: Lexical similarity for in-device
implementation and locality sensitive hashing (LSH)
for cloud implementation
Solve Duplicate Contacts Problem
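The slide names the two ingredients (lexical similarity on the device, LSH in the cloud) without giving details. The sketch below is one possible reading, not the speaker's implementation: it assumes character-bigram Jaccard similarity on normalized names as the lexical measure and MinHash with banding as the LSH scheme. The function names, the `threshold` value, and the `num_hashes`/`bands` settings are all hypothetical.

```python
import hashlib
import re
from collections import defaultdict
from itertools import combinations


def normalize(name):
    """Lowercase, drop punctuation, and sort tokens so word order is ignored."""
    tokens = re.findall(r"[a-z0-9]+", name.lower())
    return " ".join(sorted(tokens))


def bigrams(text):
    """Set of character bigrams of a normalized name."""
    if len(text) < 2:
        return {text} if text else set()
    return {text[i:i + 2] for i in range(len(text) - 1)}


def similarity(a, b):
    """Jaccard similarity between the bigram sets of two contact names."""
    ga, gb = bigrams(normalize(a)), bigrams(normalize(b))
    if not ga and not gb:
        return 1.0
    return len(ga & gb) / len(ga | gb)


def find_duplicates(names, threshold=0.5):
    """In-device variant: compare all pairs (threshold is an assumed value)."""
    return [(a, b) for a, b in combinations(names, 2)
            if similarity(a, b) >= threshold]


def minhash_signature(shingles, num_hashes=32):
    """For each seeded hash function, keep the minimum hash over all shingles."""
    return tuple(
        min(int.from_bytes(hashlib.md5(f"{seed}:{s}".encode()).digest()[:8], "big")
            for s in shingles)
        for seed in range(num_hashes)
    )


def lsh_candidates(names, num_hashes=32, bands=8):
    """Cloud variant: names sharing any signature band become candidate pairs."""
    rows = num_hashes // bands
    buckets = defaultdict(list)
    for name in names:
        shingles = bigrams(normalize(name))
        if not shingles:
            continue
        sig = minhash_signature(shingles, num_hashes)
        for b in range(bands):
            buckets[(b, sig[b * rows:(b + 1) * rows])].append(name)
    pairs = set()
    for members in buckets.values():
        pairs.update(combinations(members, 2))
    return pairs


contacts = ["Marie Brown", "Satnam Singh", "Brown Marie", "Satnam Singh, PhD"]
print(find_duplicates(contacts))
print(lsh_candidates(contacts))
```

The all-pairs comparison is quadratic in the number of contacts, which is tolerable for a single address book. The banded MinHash step only compares names that land in a common bucket, which is the usual reason to prefer LSH when the candidate set is large.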
![Page 16: Satnam Singh Talk Feb5 Indian Analytics and Big Data Summit](https://reader033.vdocuments.us/reader033/viewer/2022052913/55b76cf6bb61eb32248b4643/html5/thumbnails/16.jpg)
Data Variety and Data Fusion Are the “Hidden Treasure”
Image credit: Denise Lu
Data blending
![Page 17: Satnam Singh Talk Feb5 Indian Analytics and Big Data Summit](https://reader033.vdocuments.us/reader033/viewer/2022052913/55b76cf6bb61eb32248b4643/html5/thumbnails/17.jpg)
![Page 18: Satnam Singh Talk Feb5 Indian Analytics and Big Data Summit](https://reader033.vdocuments.us/reader033/viewer/2022052913/55b76cf6bb61eb32248b4643/html5/thumbnails/18.jpg)
Motivation: Field Failure Data for QRD Enhancement
GM’s Databases
Data collection
via data link
in service shops
Data collection via
Telematics
Field failure data: Diagnostic Trouble
Codes (DTCs), Operating Parameter
Identifiers (PIDs), Warranty Claims, etc.
OEM Databases
Test fleet,
Production fleet
Advanced
Analytical
Toolset
![Page 19: Satnam Singh Talk Feb5 Indian Analytics and Big Data Summit](https://reader033.vdocuments.us/reader033/viewer/2022052913/55b76cf6bb61eb32248b4643/html5/thumbnails/19.jpg)
DTC
Design DTC
Software
Research
Parameter Identifiers (PIDs) – e.g. Engine
speed, vehicle speed, powertrain
voltage, environmental parameters, etc.
OEM’s Database
1. Data collection via diagnostic tools at dealer shops
Automobile field failure data
2. Data collection via telematics services
Data: DTCs, PIDs, Claims, etc.
PIDs data
Scan tools at
Dealer shops
400-600
PIDs
10-12 Unique
PIDs
SME
analyses and
test vehicle
validation
Business Case and Problem
DTC: Diagnostic Trouble Code (fault code)
Design and software enhancement
![Page 20: Satnam Singh Talk Feb5 Indian Analytics and Big Data Summit](https://reader033.vdocuments.us/reader033/viewer/2022052913/55b76cf6bb61eb32248b4643/html5/thumbnails/20.jpg)
- Reduce NTFs
- Characterize intermittent faults
- Enhance DTC design
1. Field failure data
2. Data transformation
3. Study PID distributions
4. Analyze PID correlations
5. PIDs selection (decision trees)
Filter
Preprocessing
Data
User Data
Selection
Tool Use for
Innovation and Business Impact:
PIDs Mining Tool
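Step 4 of the workflow, analyzing PID correlations, is not spelled out on the slide. A minimal sketch, assuming Pearson correlation is the measure and that strongly correlated PID pairs are flagged as redundant; the `threshold` value and the sample readings are made up for illustration:

```python
import math
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # a constant series carries no correlation information
    return cov / (sx * sy)


def correlated_pairs(pids, threshold=0.95):
    """Return PID name pairs whose absolute correlation meets the threshold."""
    return [(a, b) for a, b in combinations(sorted(pids), 2)
            if abs(pearson(pids[a], pids[b])) >= threshold]


# Synthetic readings, one value per vehicle (illustrative only).
pids = {
    "engine_speed": [800, 1500, 2200, 3000],
    "vehicle_speed": [0, 35, 70, 110],
    "voltage": [13.9, 12.1, 14.2, 12.8],
}
print(correlated_pairs(pids))
```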
![Page 21: Satnam Singh Talk Feb5 Indian Analytics and Big Data Summit](https://reader033.vdocuments.us/reader033/viewer/2022052913/55b76cf6bb61eb32248b4643/html5/thumbnails/21.jpg)
Problem Formulation and Solution
Feature
Selection and
Classification
Problem
400-600
Attributes
(PIDs)
10-20
Informative
Attributes
(PIDs)
$(X, Y) = \{x_1, x_2, x_3, \ldots, x_n, y\}$
$x_i$ are the attribute vectors (i.e. PIDs data) for $m$ patterns (i.e. vehicles)
$y$ is the class variable (i.e. faults): $y = 0$ (baseline), $y = 1$ (intermittent fault)
Problem: Find the informative features $X_{inf}$ that separate the intermittent class from the baseline class
![Page 22: Satnam Singh Talk Feb5 Indian Analytics and Big Data Summit](https://reader033.vdocuments.us/reader033/viewer/2022052913/55b76cf6bb61eb32248b4643/html5/thumbnails/22.jpg)
Problem Formulation and Solution
$$IG(S, A) = H(S) - \sum_{i=1}^{m} p_S(A_i)\, H(S_{A_i})$$
$$H(S) = -\sum_{j=1}^{n} p_S(j) \log_2 p_S(j)$$
Solution: Decision tree-based method (C4.5 algorithm) using the entropy criterion
to select the attribute $A_i$ at each node of the decision tree
Stopping criterion for tree growth: the minimum number of patterns (i.e. vehicles)
in a node should be greater than $N_{min}$ (e.g. 10)
H(S): Entropy of the set S
IG(S, A): Information gain, i.e. the reduction in entropy of the set S
after a split on attribute A
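The entropy H(S) and the gain IG(S, A) used on this slide can be computed directly. The sketch below assumes categorical attribute values for brevity; continuous PIDs would first need a threshold split, and C4.5 itself typically ranks attributes by gain ratio rather than raw gain. The function names are illustrative.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """H(S) = -sum_j p_S(j) * log2 p_S(j) over the class labels in S."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(values, labels):
    """IG(S, A) = H(S) - sum_i p_S(A_i) * H(S_{A_i}) for one attribute A."""
    n = len(labels)
    subsets = defaultdict(list)
    for value, label in zip(values, labels):
        subsets[value].append(label)  # S_{A_i}: labels of patterns with A = A_i
    remainder = sum(len(s) / n * entropy(s) for s in subsets.values())
    return entropy(labels) - remainder


# y = 0 is baseline, y = 1 is intermittent fault (toy data, four vehicles).
y = [0, 0, 1, 1]
print(information_gain(["low", "low", "high", "high"], y))  # attribute separates the classes
print(information_gain(["low", "high", "low", "high"], y))  # attribute is uninformative
```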
![Page 23: Satnam Singh Talk Feb5 Indian Analytics and Big Data Summit](https://reader033.vdocuments.us/reader033/viewer/2022052913/55b76cf6bb61eb32248b4643/html5/thumbnails/23.jpg)
Entropy Decision Forest
PIDs data for
Intermittent fault vs.
Baseline
Tree1
D1
Remove PIDs
that are in Tree
1 from input
data of Tree 2
Tree2
D2
Remove PIDs that
are in Tree 2 from
input data of Tree 3
Continue till
stopping
criteria met
...
Hypothesis 1 Hypothesis 2 Hypothesis n
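The forest construction in this diagram (grow a tree, remove the PIDs it used, grow the next tree on what remains) can be sketched as follows. This is an assumption-laden illustration, not the speaker's tool: it uses binary threshold splits chosen by information gain, reads the stopping rule as "only split nodes with more than `n_min` vehicles", and adds a hypothetical `max_trees` cap because the slide leaves the forest-level stopping criteria unspecified. The data is synthetic.

```python
import math
from collections import Counter


def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_split(rows, labels, features):
    """Return (gain, feature, threshold) of the best binary split, or None."""
    base, n, best = entropy(labels), len(labels), None
    for f in features:
        values = sorted({r[f] for r in rows})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            gain = base - len(left) / n * entropy(left) - len(right) / n * entropy(right)
            if gain > 1e-12 and (best is None or gain > best[0]):
                best = (gain, f, t)
    return best


def used_features(rows, labels, features, n_min):
    """Grow one entropy tree; return the set of PIDs it splits on."""
    if len(labels) <= n_min or len(set(labels)) == 1:
        return set()  # too few vehicles or already pure: stop growing
    split = best_split(rows, labels, features)
    if split is None:
        return set()
    _, f, t = split
    used = {f}
    for keep_left in (True, False):
        part = [(r, y) for r, y in zip(rows, labels) if (r[f] <= t) == keep_left]
        used |= used_features([r for r, _ in part], [y for _, y in part], features, n_min)
    return used


def entropy_decision_forest(rows, labels, features, n_min=10, max_trees=5):
    """Each tree sees only the PIDs not used by earlier trees."""
    remaining, forest = list(features), []
    while remaining and len(forest) < max_trees:
        used = used_features(rows, labels, remaining, n_min)
        if not used:
            break  # no informative PID left
        forest.append(sorted(used))
        remaining = [f for f in remaining if f not in used]
    return forest


# Synthetic fleet: 12 baseline vehicles (y=0) and 12 with an intermittent fault (y=1).
labels = [0] * 12 + [1] * 12
rows = [{"engine_speed": float(i), "voltage": float(y), "vehicle_speed": 5.0}
        for i, y in enumerate(labels)]
print(entropy_decision_forest(rows, labels, ["engine_speed", "voltage", "vehicle_speed"]))
```

Each entry in the returned list is the PID set of one tree, so the forest yields several alternative hypotheses about which PIDs separate the intermittent class from the baseline.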
![Page 24: Satnam Singh Talk Feb5 Indian Analytics and Big Data Summit](https://reader033.vdocuments.us/reader033/viewer/2022052913/55b76cf6bb61eb32248b4643/html5/thumbnails/24.jpg)
Entropy Decision Forest
Decision Tree 1
Decision Forest
![Page 26: Satnam Singh Talk Feb5 Indian Analytics and Big Data Summit](https://reader033.vdocuments.us/reader033/viewer/2022052913/55b76cf6bb61eb32248b4643/html5/thumbnails/26.jpg)
Data Science Conference in Bangalore: March 18-21