blind spots in big data erez koren @ forter
Post on 12-Apr-2017
Blind Spots in BIG DATA
Erez Koren, Forter
About myself - Erez Koren
In the computer business since 2nd grade
Love building products and hacking stuff
Currently in my 3rd startup adventure
Working at Forter from before day one
2017
About Forter
We catch Fraudsters & protect E-commerce merchants
Founded 3.5 years ago
~80 employees worldwide
Backed by
We detect fraud, give a real-time decision (approve/decline) every time and guarantee it (chargeback protection), covering the whole customer lifecycle
We collect data from browsers (JS) and mobile apps (through an SDK)
We also receive order/account data server-to-server (S2S) into our API and reply with our decision in real time
Our stack:
Forter - What & How We Do It
Compliance:
And more...
The big data infrastructure...
But you have a feeling that something is wrong
Are you sure the data contains everything you need?
How do you ensure the quality of your data?
The COVERAGE Challenge
In some cases the data you are analyzing is only partial
Today’s internet is a jungle.
There are thousands of devices, platforms, browsers and configurations.
Are you sure you are collecting data from all / most of the relevant sources?
The COVERAGE Challenge
Demo time: This is how we do it
MULTIPLE DEVICES & PLATFORMS
MULTIPLE VERSIONS, INCLUDING DEV. VERSIONS
SENDING EVENTS FROM 25 DIFFERENT CONFIGS
SELENIUM TESTS COVER THE FULL CHECKOUT EXPERIENCE
IN THE REAL WORLD, SOME OF THE TESTS ALWAYS FAIL
EXAMPLE OF UNEXPECTED DATA IN THE REAL WORLD
Chrome, Safari, Mobile Safari, Firefox, IE, Android Browser, Edge, Chrome WebView, PhantomJS, undefined, Opera, WebKit
Detect exceptions that occur on the client side: browsers (JS), mobile SDKs and any other client integrations
CLIENT SIDE CODE MONITORING
JS SCRIPT TIMEOUTS
Merchant checked the website with a browser that does not support JavaScript
Detect gaps between script request from server and script events received
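One way to look for such gaps is to count, per segment, how many script requests were served and how many of them ever reported an event back. The sketch below is only an illustration under stated assumptions: the segment labels, the `script_event_gaps` helper and the 20% threshold are hypothetical and are not taken from the talk.

```python
from collections import Counter


def script_event_gaps(script_requests, events_received, max_gap=0.2):
    """Return segments where too many served scripts never reported back.

    Both arguments are iterables of segment labels (hypothetical, e.g. a
    browser family parsed from the User-Agent): one label per script
    request served, and one label per first event received.
    """
    served = Counter(script_requests)
    reported = Counter(events_received)
    gaps = {}
    for segment, n_served in served.items():
        missing = max(n_served - reported.get(segment, 0), 0)
        gap = missing / n_served
        if gap > max_gap:  # max_gap is an assumed, tunable threshold
            gaps[segment] = round(gap, 3)
    return gaps


# Made-up counts, for illustration only.
requests = ["Chrome"] * 100 + ["Safari"] * 50 + ["Android Browser"] * 20
events = ["Chrome"] * 97 + ["Safari"] * 48 + ["Android Browser"] * 5
print(script_event_gaps(requests, events))  # {'Android Browser': 0.75}
```

A segment that requests the script but rarely sends events is a candidate blind spot, for example a browser where the script times out or fails silently.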
Compare the segment distribution of the general population with the segment distribution in your data
Test it as if you were a real user
Even if everything is working now, it will not be in the future
Takeaways
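The first takeaway, comparing segments in the general population with segments in your own data, can be sketched as below. This is a minimal illustration, not Forter's method: the reference shares are made-up numbers, and `coverage_gaps` and its `min_ratio` parameter are hypothetical names.

```python
from collections import Counter


def coverage_gaps(reference_share, observed_labels, min_ratio=0.5):
    """Flag segments that are under-represented versus a reference distribution.

    reference_share maps segment -> expected share (0..1) in the general
    population; observed_labels has one segment label per collected event.
    """
    counts = Counter(observed_labels)
    total = sum(counts.values())
    flagged = {}
    for segment, expected in reference_share.items():
        observed = counts.get(segment, 0) / total if total else 0.0
        if observed < expected * min_ratio:
            flagged[segment] = (expected, round(observed, 3))
    return flagged


reference = {"Chrome": 0.5, "Safari": 0.3, "Firefox": 0.2}  # made-up shares
observed = ["Chrome"] * 70 + ["Safari"] * 28 + ["Firefox"] * 2
print(coverage_gaps(reference, observed))  # {'Firefox': (0.2, 0.02)}
```

Here Firefox makes up 20% of the assumed reference population but only 2% of the collected events, which suggests data from that segment is being lost somewhere.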
The MONITORING Challenge
Is “measuring everything” good enough?
How often are you checking the graphs?
Do you have enough alerts or too many?
There are always technical issues that can corrupt the alerting data
Demo 2 time: This is how we do it
API AVAILABILITY CHECK
External monitoring (watch the watcher), including round-trip
Pingdom and StatusCake
DEPENDENCIES MONITORING (RSS)
e.g. AWS, GitHub
Reported to our #productionroom channel in Slack
API RESPONSES ANOMALY DETECTION
Detect an increase in the decline rate from X% to Y% within a given time window
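A simple version of this check compares the decline rate in the current window with the rate in a baseline window. The sketch below assumes decisions arrive as the strings "approve" and "decline"; the function names and the 5-point `max_increase` threshold are hypothetical, and the talk does not say how the real detector is built.

```python
def decline_rate(decisions):
    """Share of 'decline' decisions in a window; None for an empty window."""
    if not decisions:
        return None
    return sum(1 for d in decisions if d == "decline") / len(decisions)


def decline_rate_alert(baseline_window, current_window, max_increase=0.05):
    """Return an alert dict when the decline rate rose by more than
    max_increase (absolute difference), otherwise None."""
    baseline = decline_rate(baseline_window)
    current = decline_rate(current_window)
    if baseline is None or current is None:
        return None  # not enough data to compare
    if current - baseline > max_increase:
        return {"from": baseline, "to": current}
    return None


# Made-up windows: 4% declines in the baseline, 16% in the current window.
baseline = ["decline"] * 4 + ["approve"] * 96
current = ["decline"] * 8 + ["approve"] * 42
print(decline_rate_alert(baseline, current))  # {'from': 0.04, 'to': 0.16}
```

An empty window returns no alert here; in practice a window with no decisions at all is probably worth its own alert.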
1. Making sure we don’t slow the site down or impact the checkout funnel, via automated Selenium tests (with & without our script, multiple browsers)
2. Incremental deployment support for
JS SCRIPT MONITORING
ML FEATURES ANOMALY DETECTION
Monitoring system’s health by measuring our Machine Learning features’ distribution over time
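The talk does not say which measure is used to compare feature distributions. One common choice is the Population Stability Index (PSI), sketched below for a single numeric feature. The binning scheme, the `eps` floor and the sample data are assumptions made for illustration, and both samples are assumed to be non-empty.

```python
import math


def psi(baseline, current, bins=10, eps=1e-6):
    """Population Stability Index between two samples of one numeric feature.

    Bins are equal-width over the baseline range; values outside that range
    are clamped into the first or last bin.
    """
    lo, hi = min(baseline), max(baseline)
    width = (hi - lo) / bins or 1.0  # avoid zero width for constant features

    def shares(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1
        # eps keeps empty bins from producing log(0) or division by zero
        return [max(c / len(values), eps) for c in counts]

    b, c = shares(baseline), shares(current)
    return sum((ci - bi) * math.log(ci / bi) for bi, ci in zip(b, c))


last_week = [i / 100 for i in range(100)]       # made-up feature values
today = [0.5 + i / 200 for i in range(100)]     # same feature, shifted upward
print(psi(last_week, last_week))  # identical samples give 0.0
print(psi(last_week, today))      # a large value signals a distribution shift
```

A rule of thumb often quoted for PSI is that values above roughly 0.2 deserve investigation, but the right threshold depends on the feature and should be tuned.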
VULNERABILITIES MONITORING (RSS)
OS, databases, libraries etc.
ALERTS DAILY SUMMARY
Summary of alerts in the last 24h + ability to drill down into the graphs
Takeaways
Make sure every alert can be drilled down into a graph and related to the raw metric
Know how to investigate - leave breadcrumbs to raw data (even when the data is aggregated)
Differentiate between critical alerts and other alerts (that can be fixed the next morning)
Measure low values as well as the high ones - alerts for low values (e.g. CPU) are something that most systems are missing
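The last point, alerting on low values as well as high ones, can be sketched as a two-sided threshold check. The metric name and the thresholds below are hypothetical, and `check_band` is an illustrative helper rather than part of any monitoring tool.

```python
def check_band(metric, value, low, high):
    """Return an alert message when value falls outside [low, high], else None."""
    if value < low:
        return f"{metric} suspiciously low: {value} < {low}"
    if value > high:
        return f"{metric} too high: {value} > {high}"
    return None


# A nearly idle CPU on a busy service may mean traffic stopped arriving.
print(check_band("cpu_percent", 1.5, low=5, high=90))
# cpu_percent suspiciously low: 1.5 < 5
```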
Takeaways
Understand the pipes and filters: make sure there are no hidden blockages in the data pipelines
Log errors both from client side and server side when possible and analyze together
Make sure incidents that affect input data are shared with your data scientists by using a “dirty” or “partial” flag
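One possible way to apply such a flag is to keep a list of incident windows and tag every record whose timestamp falls inside one. The sketch below is only an illustration: the `Incident` class, the `quality_flag` helper and the "clean" default are hypothetical names, not something described in the talk.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Incident:
    start: datetime
    end: datetime
    flag: str  # "dirty" or "partial"


def quality_flag(event_time, incidents):
    """Return the flag of the first incident covering event_time, or "clean"."""
    for incident in incidents:
        if incident.start <= event_time <= incident.end:
            return incident.flag
    return "clean"


# Made-up incident: two hours during which only part of the data arrived.
incidents = [Incident(datetime(2017, 4, 1, 10), datetime(2017, 4, 1, 12), "partial")]
print(quality_flag(datetime(2017, 4, 1, 11), incidents))  # partial
print(quality_flag(datetime(2017, 4, 1, 13), incidents))  # clean
```

Storing the flag next to each record lets data scientists exclude or down-weight affected rows without having to know the incident history.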
Thank you!