blind spots in big data erez koren @ forter

Post on 12-Apr-2017

Category:

Technology

TRANSCRIPT

Blind Spots in BIG DATA

Erez Koren, Forter

About myself - Erez Koren

In the computer business since 2nd grade

Love building products and hacking stuff

Currently in my 3rd startup adventure

Working at Forter from before day one

2017

About Forter

We catch Fraudsters & protect E-commerce merchants

Founded 3.5 years ago

~80 employees worldwide

Backed by

Forter - What & How We Do It

We detect fraud, give a real-time decision (approve/decline) every time, and guarantee it (chargeback protection), covering the whole customer lifecycle

We collect data from browsers (JS) and mobile apps (through an SDK)

We also receive order/account data S2S into our API and reply with our decision in real time

Our stack:

Compliance:

And more...

The big data infrastructure...

But you have a feeling that something is wrong

Are you sure the data contains everything you need?

How do you ensure the quality of your data?

The COVERAGE Challenge

In some cases the data you are analyzing is only partial

Today’s internet is a jungle.

There are thousands of devices, platforms, browsers and configurations.

Are you sure you are collecting data from all / most of the relevant sources?

Demo time: this is how we do it

MULTIPLE DEVICES & PLATFORMS

MULTIPLE VERSIONS, INCLUDING DEV. VERSIONS

SENDING EVENTS FROM 25 DIFFERENT CONFIGS

SELENIUM TESTS COVER THE FULL CHECKOUT EXPERIENCE

IN THE REAL WORLD, SOME OF THE TESTS ALWAYS FAIL

EXAMPLE OF UNEXPECTED DATA IN THE REAL WORLD

Chrome, Safari, Mobile Safari, Firefox, IE, Android Browser, Edge, Chrome WebView, PhantomJS, undefined, Opera, WebKit

CLIENT SIDE CODE MONITORING

Detect exceptions that occur on the client side

Browsers (JS), mobile SDKs and any other client integrations

JS SCRIPT TIMEOUTS

Merchant checked the website with a browser that does not support JavaScript

Detect gaps between script request from server and script events received
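One way to picture the gap check is a per-segment comparison of scripts served against scripts that reported back. This is only a minimal sketch: the record shape (`session_id`, `browser`) and the function name `script_gap_rates` are assumptions made for illustration, not Forter's actual schema or code.

```python
from collections import Counter


def script_gap_rates(script_requests, script_events, key="browser"):
    """Per segment, the share of served scripts that never sent an event back."""
    # Sessions from which at least one client-side event was received.
    reported = {event["session_id"] for event in script_events}
    served = Counter()
    missing = Counter()
    for request in script_requests:
        # A missing segment value is itself worth tracking, so bucket it.
        segment = request.get(key) or "undefined"
        served[segment] += 1
        if request["session_id"] not in reported:
            missing[segment] += 1
    return {segment: missing[segment] / served[segment] for segment in served}


# Hypothetical logs: the server served four scripts, three reported back.
requests_log = [
    {"session_id": "a", "browser": "Chrome"},
    {"session_id": "b", "browser": "Chrome"},
    {"session_id": "c", "browser": "Opera"},
    {"session_id": "d", "browser": "Opera"},
]
events_log = [{"session_id": "a"}, {"session_id": "b"}, {"session_id": "c"}]

gaps = script_gap_rates(requests_log, events_log)
```

A segment whose gap rate is much higher than the others is a candidate blind spot: the script is served there but the data never arrives.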

Takeaways

Compare the data segments of the general population versus the data segment spread in your data

Test it as if you were a real user

Even if everything is working now, in the future it will not
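The comparison between the general population and your own data can be sketched as a share-by-share check. The numbers below are invented for illustration, and `coverage_gaps` with its `tolerance` parameter is a hypothetical helper, not something taken from the talk.

```python
def coverage_gaps(population_share, observed_counts, tolerance=0.5):
    """Flag segments whose share in our data is far below their share
    in the general population (below `tolerance` times the expected share)."""
    total = sum(observed_counts.values())
    flagged = {}
    for segment, expected in population_share.items():
        observed = observed_counts.get(segment, 0) / total if total else 0.0
        if observed < expected * tolerance:
            flagged[segment] = (expected, observed)
    return flagged


# Made-up market shares and made-up event counts from our own pipeline.
population_share = {"Chrome": 0.5, "Safari": 0.3, "Firefox": 0.2}
observed_counts = {"Chrome": 700, "Safari": 280, "Firefox": 20}

under_covered = coverage_gaps(population_share, observed_counts)
```

Here Firefox makes up 2% of the collected data against an assumed 20% of the population, which suggests that collection is broken for that segment rather than that those users do not exist.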

The MONITORING Challenge

Is “measuring everything” good enough?

How often are you checking the graphs?

Do you have enough alerts or too many?

There are always technical issues that can corrupt the alerting data

Demo 2 time: this is how we do it

API AVAILABILITY CHECK

External monitoring (watch the watcher), including round-trip

Pingdom and StatusCake

DEPENDENCIES MONITORING (RSS)

e.g. AWS, GitHub

Reported to our #productionroom in Slack

API RESPONSES ANOMALY DETECTION

Detect decline increase from X% to Y% in a given time window
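A decline-rate jump between two time windows could be checked roughly as follows. The slide leaves X, Y and the window size open, so the thresholds here (`max_increase`, `min_samples`) are placeholder assumptions, and the function names are hypothetical.

```python
def decline_rate(decisions):
    """Share of 'decline' decisions in a list of API responses."""
    if not decisions:
        return 0.0
    return sum(1 for d in decisions if d == "decline") / len(decisions)


def decline_rate_jumped(baseline, current, max_increase=0.05, min_samples=50):
    """True when the current window declines noticeably more than the baseline.

    Windows with too few decisions are ignored to avoid alerting on noise.
    """
    if len(current) < min_samples:
        return False
    return decline_rate(current) - decline_rate(baseline) > max_increase


# Hypothetical windows: 5% declines before, 20% declines now.
baseline_window = ["approve"] * 95 + ["decline"] * 5
current_window = ["approve"] * 80 + ["decline"] * 20

alert = decline_rate_jumped(baseline_window, current_window)
```

The same comparison in the other direction (a sudden drop in declines) is worth alerting on too, since it may mean fraud is slipping through.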

JS SCRIPT MONITORING

1. Making sure we don’t slow the site down or impact the checkout funnel, via automated Selenium tests (with & without our script, multiple browsers)

2. Incremental deployment support for

ML FEATURES ANOMALY DETECTION

Monitoring the system’s health by measuring the distribution of our machine learning features over time
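The talk does not say which statistic is used to compare feature distributions. As one possible illustration, the sketch below uses the Population Stability Index (PSI), a common drift measure, swapped in here as an assumption; the bin count and the sample data are also invented.

```python
import math


def psi(expected, actual, bins=10, eps=1e-6):
    """Population Stability Index between a baseline sample and a new sample.

    Both samples must be non-empty. Bins are equal-width over the baseline
    range; values outside that range fall into the first or last bin.
    """
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # avoid zero width for constant features

    def shares(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1
        # Floor at eps so empty bins do not produce log(0).
        return [max(c / len(values), eps) for c in counts]

    e, a = shares(expected), shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


# Hypothetical feature values: last week versus today.
baseline = [i / 100 for i in range(100)]
shifted = [0.5 + i / 200 for i in range(100)]

stable_score = psi(baseline, baseline)
drift_score = psi(baseline, shifted)
```

An identical sample scores zero, while the shifted sample scores far higher. A sudden jump in a feature's score often points to a broken data source upstream rather than a real change in user behavior.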

VULNERABILITIES MONITORING (RSS)

OS, databases, libraries etc.

ALERTS DAILY SUMMARY

Summary of alerts from the last 24h + ability to drill down into the graphs

Takeaways

Make sure every alert can be drilled down into a graph and related to the raw metric

Know how to investigate - leave breadcrumbs to raw data (even when the data is aggregated)

Differentiate between critical alerts and other alerts (that can be fixed the next morning)

Measure low values as well as the high ones - alerts for low values (e.g. CPU) are something that most systems are missing
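The last two takeaways above can be combined in a small sketch: a metric is checked against both a lower and an upper bound, and each alert carries a severity that says whether it should wake someone up. The bounds, severity labels and the `check_metric` helper are all hypothetical.

```python
def check_metric(name, value, low, high, critical=False):
    """Return an alert dict when `value` leaves the [low, high] band, else None.

    `critical` decides whether the alert pages someone now or waits
    for the next morning.
    """
    if low <= value <= high:
        return None
    return {
        "metric": name,
        "value": value,
        "direction": "low" if value < low else "high",
        "severity": "page" if critical else "next_morning",
    }


# Hypothetical readings: near-idle CPU usually means traffic stopped arriving.
idle_cpu = check_metric("cpu_percent", 1.0, low=5.0, high=90.0, critical=True)
busy_disk = check_metric("disk_percent", 93.0, low=0.0, high=85.0)
healthy = check_metric("cpu_percent", 40.0, low=5.0, high=90.0)
```

An unusually idle machine is often the first visible symptom of an upstream outage, which is why the low bound matters as much as the high one.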

Takeaways

Understand the pipes and filters: make sure there are no hidden blockages in the data pipelines

Log errors both from client side and server side when possible and analyze together

Make sure incidents that affect input data are shared with your data scientists by using a “dirty” or “partial” flag
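The flagging idea might look like the sketch below: every record that falls inside a known incident window is tagged so that data scientists can exclude or down-weight it. The field names (`timestamp`, `data_quality`) and the incident format are assumptions for illustration only.

```python
from datetime import datetime


def flag_records(records, incidents):
    """Tag each record as 'clean', or with the flag of the first incident
    whose time window [start, end) contains the record's timestamp."""
    flagged = []
    for record in records:
        quality = "clean"
        for incident in incidents:
            if incident["start"] <= record["timestamp"] < incident["end"]:
                quality = incident["flag"]  # e.g. "dirty" or "partial"
                break
        flagged.append({**record, "data_quality": quality})
    return flagged


# Hypothetical incident: client events were partially lost for one hour.
incidents = [{
    "start": datetime(2017, 4, 1, 10, 0),
    "end": datetime(2017, 4, 1, 11, 0),
    "flag": "partial",
}]
records = [
    {"order_id": 1, "timestamp": datetime(2017, 4, 1, 9, 30)},
    {"order_id": 2, "timestamp": datetime(2017, 4, 1, 10, 15)},
]

tagged = flag_records(records, incidents)
```

Storing the flag on the record itself, rather than in a separate incident log, keeps the information available after the data has been aggregated or copied elsewhere.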

Thank you!
