Machine learning for machine data
1
David Andrzejewski - @davidandrzej
Data Sciences, Sumo Logic
Strata Conference – Machine Data Track
February 13, 2014
This talk: Machine Learning + Machine Data = Awesome!
2
■ YES
  – overview of log data
  – solving log data problems with machine learning
  – specific examples
    • (mostly) Sumo Logic-related
    • customer use cases
  – general lessons learned
This talk: Machine Learning + Machine Data = Awesome!
3
■ NO (or, not much)
  – Sumo Logic deep dive
  – Tech stack talk
    • In-memory Hadoop for real-time Cassandra SQL in hybrid clouds
  – Big data “shock and awe”
    • 800 yottabytes / second ZOMG!!11!!
  – Algorithm shootout
    • Deep learning vs random forests vs SVMs vs coin flips vs ...
  – Extreme math
HyperLogLog: analysis of a near-optimal cardinality algorithm (p. 131)

Definition 1. An ideal multiset of cardinality $n$ is a sequence obtained by arbitrary replications and permutations applied to $n$ uniform identically distributed random variables over the real interval $[0, 1]$.

In the analytical part of our paper (Sections 2 and 3), we postulate that the collection of hashed values $h(\mathcal{M})$, which the algorithm processes constitutes an ideal multiset. This assumption is a natural way to model the outcome of well designed hash functions. Note that the number of distinct elements of such an ideal multiset equals $n$ with probability 1. We henceforth let $\mathbb{E}_n$ and $\mathbb{V}_n$ be the expectation and variance operators under this model.

Theorem 1. Let the algorithm HYPERLOGLOG of Figure 2 be applied to an ideal multiset of (unknown) cardinality $n$, using $m \ge 3$ registers, and let $E$ be the resulting cardinality estimate.

(i) The estimate $E$ is asymptotically almost unbiased in the sense that

$$\frac{1}{n}\,\mathbb{E}_n(E) \underset{n\to\infty}{=} 1 + \delta_1(n) + o(1), \quad \text{where } |\delta_1(n)| < 5 \cdot 10^{-5} \text{ as soon as } m \ge 16.$$

(ii) The standard error defined as $\frac{1}{n}\sqrt{\mathbb{V}_n(E)}$ satisfies as $n \to \infty$,

$$\frac{1}{n}\sqrt{\mathbb{V}_n(E)} \underset{n\to\infty}{=} \frac{\beta_m}{\sqrt{m}} + \delta_2(n) + o(1), \quad \text{where } |\delta_2(n)| < 5 \cdot 10^{-4} \text{ as soon as } m \ge 16,$$

the constants $\beta_m$ being bounded, with $\beta_{16} \doteq 1.106$, $\beta_{32} \doteq 1.070$, $\beta_{64} \doteq 1.054$, $\beta_{128} \doteq 1.046$, and $\beta_\infty = \sqrt{3\log(2) - 1} \doteq 1.03896$.

The standard error measures in relative terms the typical error to be observed (in a mean quadratic sense). The functions $\delta_1(n)$, $\delta_2(n)$ represent oscillating functions of a tiny amplitude, which are computable, and whose effect could in theory be at least partly compensated; they can anyhow be safely neglected for all practical purposes.

Plan of the paper. The bulk of the paper is devoted to the proof of Theorem 1. We determine the asymptotic behaviour of $\mathbb{E}_n(Z)$ and $\mathbb{V}_n(Z)$, where $Z$ is the indicator $1/\sum 2^{-M^{(j)}}$. The value of $\alpha_m$ in Equation (3), which makes $E$ an asymptotically almost unbiased estimator, is derived from this analysis, as is the value of the standard error. The mean value analysis forms the subject of Section 2. In fact, the exact expression of $\mathbb{E}_n(Z)$ being hard to manage, we first “poissonize” the problem and examine $\mathbb{E}_{\mathcal{P}(\lambda)}(Z)$, which represents the expected value of the indicator $Z$ when the total number of elements is not fixed, but rather obeys a Poisson law of parameter $\lambda$. We then prove that, asymptotically, the behaviours of $\mathbb{E}_n(Z)$ and $\mathbb{E}_{\mathcal{P}(\lambda)}(Z)$ are close, when one chooses $\lambda := n$: this is the depoissonization step. The variance analysis of the indicator $Z$, hence of the standard error, is sketched in Section 3 and is entirely parallel to the mean value analysis. Finally, Section 4 examines how to implement the HYPERLOGLOG algorithm in real-life contexts, presents simulations, and discusses optimality issues.

2 Mean value analysis

Our starting point is the random variable $Z$ (the “indicator”) defined in (2). We recall that $\mathbb{E}_n$ refers to expectations under the ideal multiset model, when the (unknown) cardinality $n$ is fixed. The analysis starts from the exact expression of $\mathbb{E}_n(Z)$ in Proposition 1, continues with an asymptotic analysis of the corresponding Poisson expectation summarized by Proposition 2, and concludes with the depoissonization argument of Proposition 3.
Context: me
4
■ Data sciences @ Sumo Logic
■ Co-organizer @ SF ML Meetup
■ Previous
  – Post-doc in knowledge discovery
■ Even more previous machine data research projects
  – University of Wisconsin-Madison
  – Microsoft Research
[Figure: 3-D PCA projection (axes PCA1, PCA2, PCA3) of program runs: 254 bug1 runs, 106 bug3 runs, 147 bug4 runs, 329 bug5 runs, 206 bug8 runs, 186 other runs]
Context: Sumo Logic
5
“Turning Machine Data Into IT and Business Insights”
Learn, classify, predict
Search, monitor, visualize
Anatomy of a log message: Five W’s
7
■ When? Timestamp with time zone
■ Where? Host, module, code location
■ Who? Authentication context
■ What? Log level and key-value pairs
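As a minimal sketch of how these four fields might be pulled out of a single message, the example below parses a made-up `key=value` log line. The sample line, the field names (`host`, `module`, `user`), and the `parse` helper are all assumptions for illustration; real log formats vary widely.

```python
import re
from datetime import datetime

# Hypothetical log line; the layout is an assumption, not a real product format.
LINE = ("2014-02-13 10:15:02 -0800 INFO host=web-01 module=auth.login "
        "user=alice action=login status=ok latency_ms=42")

# Timestamp with UTC offset, then a log level, then key=value pairs.
PATTERN = re.compile(r"^(?P<ts>\S+ \S+ [+-]\d{4}) (?P<level>[A-Z]+) (?P<rest>.*)$")


def parse(line):
    """Split one log line into when / where / who / what."""
    match = PATTERN.match(line)
    if match is None:
        raise ValueError("unrecognized log line")
    # Assumes every remaining token has the form key=value.
    pairs = dict(token.split("=", 1) for token in match.group("rest").split())
    return {
        "when": datetime.strptime(match.group("ts"), "%Y-%m-%d %H:%M:%S %z"),
        "where": {"host": pairs.pop("host", None), "module": pairs.pop("module", None)},
        "who": pairs.pop("user", None),
        # Whatever is left over, plus the log level, describes what happened.
        "what": {"level": match.group("level"), **pairs},
    }


record = parse(LINE)
print(record["when"].isoformat(), record["where"], record["who"], record["what"])
```

Keeping the time zone in the parsed timestamp matters once messages from hosts in different regions are compared on one timeline.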
What’s missing
12
Traversing the stack
Sumo Logic Confidential 13
Custom App Code
Server / OS
Virtualization
Databases
Network
Open Source Software
Middleware
Middleware
12/20/2011 17:23:44 PST [user=234fsf] failed transaction, sessionid:2F0A232324, [host=pay002.sjc] amount=1725.00
12/20/11 17:23:34 AMQ7163: WebSphere MQ job number 18429 started FOR client_session=2F0A232324.
12202011 17:23:27 /usr/local/build/mysql/libexec/mysqld: Abnormal shutdown [18429]
20-12-2011 17:23:19 database-host login[3866]: DEAD_PROCESS: 18429 ttys000
Dec 20, 2011 17:22:14,,, message=Created virtual machine user-3 on esxi01.office.thedomain.com
<134>Dec 20 2011 17:22:12: %PIX-6-106100: access-list inside_access_out denied tcp inside/68.162.72.163(4326) -> outside/45.200.244.124(3127) hit-cnt 1(first hit)
66.249.67.24 - - [20/Dec/2011:17:23:40 -0700] "POST /APP/Order.php HTTP/1.1" 304 146 "-" SESSION=2F0A232324
Customer ID
Session ID
Job number
Process ID
Root cause!
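The pivot chain on this slide (customer ID, then session ID, then job number / process ID) can be sketched as follows. The log lines are copied from the slide; the `lines_mentioning` and `trace` helpers and the regular expressions are assumptions made for illustration, not how any particular product correlates logs.

```python
import re

LOGS = [
    "12/20/2011 17:23:44 PST [user=234fsf] failed transaction, sessionid:2F0A232324, [host=pay002.sjc] amount=1725.00",
    "12/20/11 17:23:34 AMQ7163: WebSphere MQ job number 18429 started FOR client_session=2F0A232324.",
    "12202011 17:23:27 /usr/local/build/mysql/libexec/mysqld: Abnormal shutdown [18429]",
    "20-12-2011 17:23:19 database-host login[3866]: DEAD_PROCESS: 18429 ttys000",
    "Dec 20, 2011 17:22:14,,, message=Created virtual machine user-3 on esxi01.office.thedomain.com",
]


def lines_mentioning(token, logs):
    """Return lines containing token as a whole alphanumeric word."""
    pattern = re.compile(r"(?<![0-9A-Za-z])" + re.escape(token) + r"(?![0-9A-Za-z])")
    return [line for line in logs if pattern.search(line)]


def trace(customer_id, logs):
    """Follow customer ID -> session ID -> job/process number across layers."""
    found = lines_mentioning(customer_id, logs)
    session_ids = {s for line in found for s in re.findall(r"sessionid:(\w+)", line)}
    for session_id in sorted(session_ids):
        found += [l for l in lines_mentioning(session_id, logs) if l not in found]
    job_numbers = {j for line in found for j in re.findall(r"job number (\d+)", line)}
    for job in sorted(job_numbers):
        found += [l for l in lines_mentioning(job, logs) if l not in found]
    return found


for line in trace("234fsf", LOGS):
    print(line)
```

Starting from the failed transaction, the chain reaches the `mysqld` abnormal shutdown line and never touches the unrelated virtual machine message.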
Log use cases – “organizational perception”
14
Enhanced visibility into machine behaviors
■ Compliance
  – Operational (SLA)
  – Regulatory (audits)
  – Security
■ Availability / performance
  – Faster MTTR
■ Business insights ($$$)
Log challenges
15
■ (wildly) varying formats
  – printf, JSON, XML, Windows, X-delimited, ...
■ Specialized knowledge
■ Noise
■ Cascading failures
[2008-05-07 09:50:08.450 'App' 3560 verbose] [VpxdHeartbeat] Invalid heartbeat from 10.17.218.46
“A distributed system is one in which the failure of a computer you didn't even know existed can render your own computer unusable.” - Leslie Lamport
Complexity
16
[Figure: London Tube map (Transport for London, December 2013)]
“OMG java.lang.NullPointerException #fail”
17
■ Logs: like “computer tweets”
■ Twitter 2013*
  – Peak @ ~144k TPS
  – Avg ~6k tweets / second
■ Log data
  – Example: 1 TB / day
  – Avg ~25k logs / second
* https://blog.twitter.com/2013/new-tweets-per-second-record-and-how
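A quick back-of-the-envelope check of the figures on this slide, assuming a decimal terabyte (10^12 bytes). The derived values (average message size, daily counts, peak-to-average ratio) are not on the slide; they simply follow from its numbers.

```python
SECONDS_PER_DAY = 24 * 60 * 60

bytes_per_day = 1e12          # "1 TB / day", assuming 1 TB = 10**12 bytes
logs_per_second = 25_000      # "~25k logs / second"
tweets_per_second = 6_000     # "~6k tweets / second"
peak_tweets_per_second = 144_000

# Average log message size implied by the two log figures.
avg_message_bytes = bytes_per_day / SECONDS_PER_DAY / logs_per_second

logs_per_day = logs_per_second * SECONDS_PER_DAY
tweets_per_day = tweets_per_second * SECONDS_PER_DAY
peak_to_avg = peak_tweets_per_second / tweets_per_second

print(f"average log message: ~{avg_message_bytes:.0f} bytes")
print(f"logs per day:   {logs_per_day:,}")
print(f"tweets per day: {tweets_per_day:,}")
print(f"Twitter peak / average: {peak_to_avg:.0f}x")
```

Under these assumptions a single 1 TB/day log source produces roughly four times as many messages per second as the 2013 Twitter average.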
Systems that learn from experience
18
19
Unsupervised clustering
■ Given: set of items
■ Do: group similar items
Too many logs! “data disorientation”
~60k results: 30 minutes, one component
Distill logs down to underlying structure
LogReduce: results “compressed” ~1000x
In the beginning, there was the printf()
printf("Health status check: %s is %s\n", hostid, hoststatus);
Log generation
Health status check: zim-5 is OK
Health status check: gir-3 is OK
Health status check: gir-2 is TIMED OUT
Health status check: dib-1 is OK
Reverse engineering printf()
printf("Health status check: %s is %s\n", hostid, hoststatus);
Log generation
Health status check: zim-5 is OK
Health status check: gir-3 is OK
Health status check: gir-2 is TIMED OUT
Health status check: dib-1 is OK
“magic”
Health status check: *** is ***
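One way the “magic” step could work, as a rough sketch: align the tokens of messages that are already known to come from the same print statement and replace whatever differs with a wildcard. This uses Python's `difflib.SequenceMatcher` for the alignment and is an illustrative assumption, not a description of how LogReduce is implemented. The `merge` and `signature` helpers are hypothetical names.

```python
from difflib import SequenceMatcher

WILDCARD = "***"

LINES = [
    "Health status check: zim-5 is OK",
    "Health status check: gir-3 is OK",
    "Health status check: gir-2 is TIMED OUT",
    "Health status check: dib-1 is OK",
]


def merge(signature, tokens):
    """Keep tokens shared by both sequences; collapse differences to one wildcard."""
    merged = []
    matcher = SequenceMatcher(a=signature, b=tokens, autojunk=False)
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        if tag == "equal":
            merged.extend(signature[i1:i2])
        elif not merged or merged[-1] != WILDCARD:
            # replace / insert / delete: the variable part of the message
            merged.append(WILDCARD)
    return merged


def signature(lines):
    """Fold a group of whitespace-tokenized messages into one signature string."""
    tokens = lines[0].split()
    for line in lines[1:]:
        tokens = merge(tokens, line.split())
    return " ".join(tokens)


print(signature(LINES))
```

Aligning token sequences rather than comparing by position lets the two-word status `TIMED OUT` collapse into a single wildcard.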
26
Unsupervised clustering
■ Given: log messages
■ Do: group by “signature”
1. Define string distance function (e.g., Levenshtein)
2. Do distance-based clustering
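The two steps can be sketched as below, with token-level Levenshtein distance and a simple greedy "leader" clustering. The clustering scheme, the 0.5 threshold, and the two `Request processed` sample messages are assumptions chosen for brevity; the slide does not say which distance-based method LogReduce actually uses.

```python
def levenshtein(a, b):
    """Edit distance between two sequences (insert, delete, substitute cost 1)."""
    previous = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        current = [i]
        for j, item_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # delete item_a
                current[j - 1] + 1,                    # insert item_b
                previous[j - 1] + (item_a != item_b),  # substitute or match
            ))
        previous = current
    return previous[-1]


def cluster(messages, threshold=0.5):
    """Greedy clustering: join the first cluster whose leader is close enough."""
    clusters = []  # each cluster is a list; its first message is the leader
    for message in messages:
        tokens = message.split()
        for members in clusters:
            leader = members[0].split()
            # Normalize by length so long and short messages are comparable.
            distance = levenshtein(tokens, leader) / max(len(tokens), len(leader), 1)
            if distance <= threshold:
                members.append(message)
                break
        else:
            clusters.append([message])
    return clusters


MESSAGES = [
    "Health status check: zim-5 is OK",
    "Health status check: gir-3 is OK",
    "Health status check: gir-2 is TIMED OUT",
    "Health status check: dib-1 is OK",
    "Request processed in 42 ms",   # hypothetical messages from a second
    "Request processed in 17 ms",   # print statement
]

for members in cluster(MESSAGES):
    print(len(members), "x", members[0])
```

Working on tokens rather than characters keeps the distance small when only a host name or a number changes, regardless of how long that value is.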
Drill-down into the original raw logs
28
Partially supervised clustering
■ Given: set of items + side info
■ Do: group similar items
Too many wildcards!
30
“Hint” from human user
31
Not enough wildcards!
32
“Hint” from human user
33
34
Learning to rank
■ Given: set of items, historical data
■ Do: rank by “relevance”
Two pages is still too many!
35
36
Learning to rank
■ Given: signatures, user activity
■ Do: rank by “relevance”
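As a deliberately simple illustration of ranking signatures from user activity, the sketch below scores each signature by a smoothed fraction of the times users marked it as interesting. The feedback counts, the signatures, and the smoothing prior are all hypothetical, and this pointwise scoring is only one of many possible approaches; the slide does not describe the actual ranking model.

```python
def relevance(promoted, shown, prior_hits=1.0, prior_shown=2.0):
    """Smoothed promote rate; the prior keeps unseen signatures near 0.5."""
    return (promoted + prior_hits) / (shown + prior_shown)


# Hypothetical history: signature -> (times promoted by users, times shown)
HISTORY = {
    "Health status check: *** is ***": (1, 200),
    "Txn timeout, retry ***": (9, 12),
    "Request processed in *** ms": (0, 150),
    "Cache rebuilt for ***": (0, 0),
}

ranked = sorted(HISTORY, key=lambda s: relevance(*HISTORY[s]), reverse=True)

for sig in ranked:
    print(f"{relevance(*HISTORY[sig]):.3f}  {sig}")
```

The smoothing means a never-before-seen signature ranks above routine messages users have repeatedly ignored, which fits the goal of surfacing unusual logs.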
True story: troubleshooting @ digital media co.
37
■ Site “acting weird”
■ Investigation with LogReduce
  – error logs → issue with content push/publish workflow
    • root cause
  – 50 minutes later: “object missing” errors serving content
    • user-visible outage
■ Benefits
  – rapidly “skim” the logs
  – create an alert
More true stories
38
■ Domains
  – financial services
  – SaaS vendors
■ Use cases
  – Availability / performance
    • Mean time to investigation (MTTI) = “hours to minutes”
  – Security
    • Quickly bubble up unusual logs
General lessons
39
■ Combat “data disorientation”
■ Surface latent structure
■ Link to underlying raw data
■ Empower user to improve results
unknown unknowns
44
![Page 45: Machine learning for machine data - info.sumologic.cominfo.sumologic.com/rs/sumologic/images/... · Machine learning for machine data! 1 David!Andrzejewski!1!@davidandrzej! DataSciences,!Sumo!Logic!](https://reader033.vdocuments.us/reader033/viewer/2022042206/5ea94484f0542030302ac35b/html5/thumbnails/45.jpg)
45
Outlier detection
! Given: data points
! Do: identify outliers
![Page 46: Machine learning for machine data - info.sumologic.cominfo.sumologic.com/rs/sumologic/images/... · Machine learning for machine data! 1 David!Andrzejewski!1!@davidandrzej! DataSciences,!Sumo!Logic!](https://reader033.vdocuments.us/reader033/viewer/2022042206/5ea94484f0542030302ac35b/html5/thumbnails/46.jpg)
46
Outlier detection
! Given: data points
! Do: identify outliers
![Page 47: Machine learning for machine data - info.sumologic.cominfo.sumologic.com/rs/sumologic/images/... · Machine learning for machine data! 1 David!Andrzejewski!1!@davidandrzej! DataSciences,!Sumo!Logic!](https://reader033.vdocuments.us/reader033/viewer/2022042206/5ea94484f0542030302ac35b/html5/thumbnails/47.jpg)
47
Health check OK
Request processed
Txn timeout, retry
Anomaly detection
! Given: log data
! Do: flag anomalies
![Page 48: Machine learning for machine data - info.sumologic.cominfo.sumologic.com/rs/sumologic/images/... · Machine learning for machine data! 1 David!Andrzejewski!1!@davidandrzej! DataSciences,!Sumo!Logic!](https://reader033.vdocuments.us/reader033/viewer/2022042206/5ea94484f0542030302ac35b/html5/thumbnails/48.jpg)
48
Health check OK
Request processed
Txn timeout, retry
Anomaly detection
! Given: log data
! Do: flag anomalies
![Page 49: Machine learning for machine data - info.sumologic.cominfo.sumologic.com/rs/sumologic/images/... · Machine learning for machine data! 1 David!Andrzejewski!1!@davidandrzej! DataSciences,!Sumo!Logic!](https://reader033.vdocuments.us/reader033/viewer/2022042206/5ea94484f0542030302ac35b/html5/thumbnails/49.jpg)
Investigate and annotate events
49
logs
signatures
RAW DATA
HUMAN
![Page 50: Machine learning for machine data - info.sumologic.cominfo.sumologic.com/rs/sumologic/images/... · Machine learning for machine data! 1 David!Andrzejewski!1!@davidandrzej! DataSciences,!Sumo!Logic!](https://reader033.vdocuments.us/reader033/viewer/2022042206/5ea94484f0542030302ac35b/html5/thumbnails/50.jpg)
Investigate and annotate events
50
logs
signatures
RAW DATA
HUMAN
![Page 51: Machine learning for machine data - info.sumologic.cominfo.sumologic.com/rs/sumologic/images/... · Machine learning for machine data! 1 David!Andrzejewski!1!@davidandrzej! DataSciences,!Sumo!Logic!](https://reader033.vdocuments.us/reader033/viewer/2022042206/5ea94484f0542030302ac35b/html5/thumbnails/51.jpg)
Investigate and annotate events
51
logs
signatures
event
RAW DATA
HUMAN
![Page 52: Machine learning for machine data - info.sumologic.cominfo.sumologic.com/rs/sumologic/images/... · Machine learning for machine data! 1 David!Andrzejewski!1!@davidandrzej! DataSciences,!Sumo!Logic!](https://reader033.vdocuments.us/reader033/viewer/2022042206/5ea94484f0542030302ac35b/html5/thumbnails/52.jpg)
Investigate and annotate events
52
logs
signatures
event
RAW DATA
HUMAN
timeline / alerts
![Page 53: Machine learning for machine data - info.sumologic.cominfo.sumologic.com/rs/sumologic/images/... · Machine learning for machine data! 1 David!Andrzejewski!1!@davidandrzej! DataSciences,!Sumo!Logic!](https://reader033.vdocuments.us/reader033/viewer/2022042206/5ea94484f0542030302ac35b/html5/thumbnails/53.jpg)
53
Supervised classification
! Given: labeled data points
! Do: predict future labels
![Page 54: Machine learning for machine data - info.sumologic.cominfo.sumologic.com/rs/sumologic/images/... · Machine learning for machine data! 1 David!Andrzejewski!1!@davidandrzej! DataSciences,!Sumo!Logic!](https://reader033.vdocuments.us/reader033/viewer/2022042206/5ea94484f0542030302ac35b/html5/thumbnails/54.jpg)
54
Supervised classification
! Given: labeled data points
! Do: predict future labels
![Page 55: Machine learning for machine data - info.sumologic.cominfo.sumologic.com/rs/sumologic/images/... · Machine learning for machine data! 1 David!Andrzejewski!1!@davidandrzej! DataSciences,!Sumo!Logic!](https://reader033.vdocuments.us/reader033/viewer/2022042206/5ea94484f0542030302ac35b/html5/thumbnails/55.jpg)
55
Supervised classification
! Given: log data, annotated events
! Do: classify new occurrences
event
timeline / alerts
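A minimal sketch of the "classify new occurrences" step, assuming each time window of logs has already been reduced to a list of signatures and that a human has annotated a few windows. The nearest-centroid approach, the function names, and the sample labels are illustrative assumptions; the slides do not say which classifier is used.

```python
from collections import Counter
import math

def to_vector(signatures):
    """Count how often each signature appears in one window of logs."""
    return Counter(signatures)

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def train(labeled_windows):
    """Sum the count vectors of all windows sharing a label.

    Cosine similarity ignores scale, so the summed vector serves as a centroid.
    """
    centroids = {}
    for signatures, label in labeled_windows:
        centroids.setdefault(label, Counter()).update(to_vector(signatures))
    return centroids

def classify(centroids, signatures):
    """Return the label whose centroid is most similar to the new window."""
    vec = to_vector(signatures)
    return max(centroids, key=lambda label: cosine(centroids[label], vec))

# Hypothetical annotated windows (labels invented for illustration).
labeled = [
    (["Health check OK", "Request processed", "Request processed"], "normal"),
    (["Health check OK", "Request processed"], "normal"),
    (["Txn timeout, retry", "Txn timeout, retry", "Request processed"], "timeout storm"),
]
centroids = train(labeled)
new_window = ["Txn timeout, retry", "Txn timeout, retry", "Health check OK"]
print(classify(centroids, new_window))
```

Each new annotation simply adds to a centroid, which is one way a user could improve results over time.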
![Page 56: Machine learning for machine data - info.sumologic.cominfo.sumologic.com/rs/sumologic/images/... · Machine learning for machine data! 1 David!Andrzejewski!1!@davidandrzej! DataSciences,!Sumo!Logic!](https://reader033.vdocuments.us/reader033/viewer/2022042206/5ea94484f0542030302ac35b/html5/thumbnails/56.jpg)
True stories
56
! SSH problems – configuration errors – script user auth failures
! Potential security events – surge of failed logins
! Unhappy infrastructure – Oracle – VMware
![Page 57: Machine learning for machine data - info.sumologic.cominfo.sumologic.com/rs/sumologic/images/... · Machine learning for machine data! 1 David!Andrzejewski!1!@davidandrzej! DataSciences,!Sumo!Logic!](https://reader033.vdocuments.us/reader033/viewer/2022042206/5ea94484f0542030302ac35b/html5/thumbnails/57.jpg)
! Explain algorithm “decisions”
– why was this flagged as anomaly?
More general lessons
57
![Page 58: Machine learning for machine data - info.sumologic.cominfo.sumologic.com/rs/sumologic/images/... · Machine learning for machine data! 1 David!Andrzejewski!1!@davidandrzej! DataSciences,!Sumo!Logic!](https://reader033.vdocuments.us/reader033/viewer/2022042206/5ea94484f0542030302ac35b/html5/thumbnails/58.jpg)
! Explain algorithm “decisions”
– why was this flagged as anomaly?
! Link to underlying raw data – (AGAIN)
More general lessons
58
![Page 59: Machine learning for machine data - info.sumologic.cominfo.sumologic.com/rs/sumologic/images/... · Machine learning for machine data! 1 David!Andrzejewski!1!@davidandrzej! DataSciences,!Sumo!Logic!](https://reader033.vdocuments.us/reader033/viewer/2022042206/5ea94484f0542030302ac35b/html5/thumbnails/59.jpg)
! Explain algorithm “decisions”
– why was this flagged as anomaly?
! Link to underlying raw data – (AGAIN)
! Empower user to improve results – (AGAIN)
More general lessons
59
![Page 60: Machine learning for machine data - info.sumologic.cominfo.sumologic.com/rs/sumologic/images/... · Machine learning for machine data! 1 David!Andrzejewski!1!@davidandrzej! DataSciences,!Sumo!Logic!](https://reader033.vdocuments.us/reader033/viewer/2022042206/5ea94484f0542030302ac35b/html5/thumbnails/60.jpg)
Numerical time-series data
60
![Page 61: Machine learning for machine data - info.sumologic.cominfo.sumologic.com/rs/sumologic/images/... · Machine learning for machine data! 1 David!Andrzejewski!1!@davidandrzej! DataSciences,!Sumo!Logic!](https://reader033.vdocuments.us/reader033/viewer/2022042206/5ea94484f0542030302ac35b/html5/thumbnails/61.jpg)
61
Signal decomposition
! Given: time-series
! Do: extract model components
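One common way to "extract model components" is a classical additive decomposition into trend, seasonal, and residual parts. The sketch below assumes that model and an odd, known period; the function names and the sample series are hypothetical, and the talk does not state which decomposition is used.

```python
def moving_average(series, window):
    """Centered moving average spanning 2 * (window // 2) + 1 points.

    Near the edges it averages over whatever neighbors exist.
    """
    half = window // 2
    out = []
    for i in range(len(series)):
        lo, hi = max(0, i - half), min(len(series), i + half + 1)
        out.append(sum(series[lo:hi]) / (hi - lo))
    return out

def decompose(series, period):
    """Split a series into trend + seasonal + residual (additive model).

    Assumes len(series) >= period.
    """
    trend = moving_average(series, period)
    detrended = [x - t for x, t in zip(series, trend)]
    # Average the detrended values observed at each phase of the cycle.
    phase_means = []
    for p in range(period):
        vals = detrended[p::period]
        phase_means.append(sum(vals) / len(vals))
    # Center the seasonal component so it sums to zero over one period.
    offset = sum(phase_means) / period
    seasonal = [phase_means[i % period] - offset for i in range(len(series))]
    residual = [x - t - s for x, t, s in zip(series, trend, seasonal)]
    return trend, seasonal, residual

# Hypothetical daily metric: linear growth plus a weekly pattern.
weekly = [5, 3, 0, -2, -4, -3, 1]
series = [100 + i + weekly[i % 7] for i in range(28)]
trend, seasonal, residual = decompose(series, period=7)
```

The residual is what remains after the trend and the repeating pattern are removed, which makes it a natural input for outlier detection.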
![Page 62: Machine learning for machine data - info.sumologic.cominfo.sumologic.com/rs/sumologic/images/... · Machine learning for machine data! 1 David!Andrzejewski!1!@davidandrzej! DataSciences,!Sumo!Logic!](https://reader033.vdocuments.us/reader033/viewer/2022042206/5ea94484f0542030302ac35b/html5/thumbnails/62.jpg)
62
Outlier detection
! Given: data points
! Do: identify outliers
![Page 63: Machine learning for machine data - info.sumologic.cominfo.sumologic.com/rs/sumologic/images/... · Machine learning for machine data! 1 David!Andrzejewski!1!@davidandrzej! DataSciences,!Sumo!Logic!](https://reader033.vdocuments.us/reader033/viewer/2022042206/5ea94484f0542030302ac35b/html5/thumbnails/63.jpg)
63
Outlier detection
! Given: data points
! Do: identify outliers
![Page 64: Machine learning for machine data - info.sumologic.cominfo.sumologic.com/rs/sumologic/images/... · Machine learning for machine data! 1 David!Andrzejewski!1!@davidandrzej! DataSciences,!Sumo!Logic!](https://reader033.vdocuments.us/reader033/viewer/2022042206/5ea94484f0542030302ac35b/html5/thumbnails/64.jpg)
Raw
Smoothed
Windowed model
![Page 65: Machine learning for machine data - info.sumologic.cominfo.sumologic.com/rs/sumologic/images/... · Machine learning for machine data! 1 David!Andrzejewski!1!@davidandrzej! DataSciences,!Sumo!Logic!](https://reader033.vdocuments.us/reader033/viewer/2022042206/5ea94484f0542030302ac35b/html5/thumbnails/65.jpg)
65
Raw
Smoothed
Smoothed vs +3σ
Windowed model
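A rough sketch of how the raw, smoothed, and windowed-model pieces could fit together, assuming the smoothing is a short moving average and the windowed model is the mean and standard deviation of a trailing window of smoothed values. The window sizes, the function name, and the sample series are assumptions for illustration, not details from the slides.

```python
from collections import deque
import statistics

def flag_outliers(series, window=20, smooth=3, n_sigma=3.0):
    """Return (index, smoothed_value) pairs above mean + n_sigma * std.

    The mean and std come from the trailing `window` smoothed values,
    so the model adapts as the series drifts.
    """
    recent = deque(maxlen=smooth)    # raw points used for smoothing
    history = deque(maxlen=window)   # past smoothed points (the windowed model)
    flagged = []
    for i, x in enumerate(series):
        recent.append(x)
        smoothed = sum(recent) / len(recent)
        # Only score a point once the model has a full window behind it.
        if len(history) == window:
            mean = statistics.mean(history)
            std = statistics.pstdev(history)
            if smoothed > mean + n_sigma * std:
                flagged.append((i, smoothed))
        history.append(smoothed)
    return flagged

# Hypothetical metric: a steady pattern, then a spike starting at index 30.
series = [10, 11, 9, 10, 12, 8] * 5 + [40, 40, 40] + [10, 11, 9]
flagged = flag_outliers(series)
print([i for i, _ in flagged])
```

Because flagged values also enter the history, a sustained spike inflates the window's standard deviation and stops being flagged after a while; whether that is desirable depends on the alerting use case.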
![Page 66: Machine learning for machine data - info.sumologic.cominfo.sumologic.com/rs/sumologic/images/... · Machine learning for machine data! 1 David!Andrzejewski!1!@davidandrzej! DataSciences,!Sumo!Logic!](https://reader033.vdocuments.us/reader033/viewer/2022042206/5ea94484f0542030302ac35b/html5/thumbnails/66.jpg)
True stories
66
! Financial services: bad data points
! Security: misbehavior
! Operations: alerting
![Page 67: Machine learning for machine data - info.sumologic.cominfo.sumologic.com/rs/sumologic/images/... · Machine learning for machine data! 1 David!Andrzejewski!1!@davidandrzej! DataSciences,!Sumo!Logic!](https://reader033.vdocuments.us/reader033/viewer/2022042206/5ea94484f0542030302ac35b/html5/thumbnails/67.jpg)
Yet more general lessons
67
! Time-series data analysis well-studied
! Read the literature(s)!
![Page 68: Machine learning for machine data - info.sumologic.cominfo.sumologic.com/rs/sumologic/images/... · Machine learning for machine data! 1 David!Andrzejewski!1!@davidandrzej! DataSciences,!Sumo!Logic!](https://reader033.vdocuments.us/reader033/viewer/2022042206/5ea94484f0542030302ac35b/html5/thumbnails/68.jpg)
< OBLIGATORY PLUGS >
68
freesumo.com
![Page 69: Machine learning for machine data - info.sumologic.cominfo.sumologic.com/rs/sumologic/images/... · Machine learning for machine data! 1 David!Andrzejewski!1!@davidandrzej! DataSciences,!Sumo!Logic!](https://reader033.vdocuments.us/reader033/viewer/2022042206/5ea94484f0542030302ac35b/html5/thumbnails/69.jpg)
BONUS: scale-out, streaming architecture
69
logs
logs
logs
![Page 70: Machine learning for machine data - info.sumologic.cominfo.sumologic.com/rs/sumologic/images/... · Machine learning for machine data! 1 David!Andrzejewski!1!@davidandrzej! DataSciences,!Sumo!Logic!](https://reader033.vdocuments.us/reader033/viewer/2022042206/5ea94484f0542030302ac35b/html5/thumbnails/70.jpg)
BONUS: approximating with Count-Min Sketch
70
![Page 71: Machine learning for machine data - info.sumologic.cominfo.sumologic.com/rs/sumologic/images/... · Machine learning for machine data! 1 David!Andrzejewski!1!@davidandrzej! DataSciences,!Sumo!Logic!](https://reader033.vdocuments.us/reader033/viewer/2022042206/5ea94484f0542030302ac35b/html5/thumbnails/71.jpg)
BONUS: approximating with Count-Min Sketch
71
![Page 72: Machine learning for machine data - info.sumologic.cominfo.sumologic.com/rs/sumologic/images/... · Machine learning for machine data! 1 David!Andrzejewski!1!@davidandrzej! DataSciences,!Sumo!Logic!](https://reader033.vdocuments.us/reader033/viewer/2022042206/5ea94484f0542030302ac35b/html5/thumbnails/72.jpg)
Approximate counting with Count-Min Sketch
72
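A small sketch of the Count-Min Sketch idea: several rows of counters, one hash per row, and an estimate taken as the minimum across rows. The class name, the default sizes, and the choice of salted BLAKE2b as the hash family are assumptions made for this example.

```python
import hashlib

class CountMinSketch:
    def __init__(self, width=1000, depth=5):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _columns(self, item):
        """Yield (row, column) pairs, using a differently salted hash per row."""
        data = item.encode("utf-8")
        for row in range(self.depth):
            digest = hashlib.blake2b(
                data, digest_size=8, salt=row.to_bytes(8, "little")
            ).digest()
            yield row, int.from_bytes(digest, "little") % self.width

    def add(self, item, count=1):
        """Increment one counter in every row."""
        for row, col in self._columns(item):
            self.table[row][col] += count

    def estimate(self, item):
        """Minimum across rows: never below the true count, but collisions
        with other items can push it above."""
        return min(self.table[row][col] for row, col in self._columns(item))

sketch = CountMinSketch()
for _ in range(100):
    sketch.add("Health check OK")
for _ in range(3):
    sketch.add("Txn timeout, retry")
print(sketch.estimate("Health check OK"), sketch.estimate("Txn timeout, retry"))
```

Memory stays fixed at width × depth counters no matter how many distinct signatures arrive, which is what makes the structure attractive in a streaming setting.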