temporal operators for spark streaming and its application for office365 service monitoring

15
TEMPORAL OPERATORS FOR SPARK STREAMING AND ITS APPLICATION FOR OFFICE365 SERVICE MONITORING Jin Li, Wesley Miao Microsoft

Upload: jen-aman

Post on 13-Apr-2017

543 views

Category:

Data & Analytics


1 download

TRANSCRIPT

Page 1: Temporal Operators For Spark Streaming And Its Application For Office365 Service Monitoring

TEMPORAL OPERATORS FOR SPARK STREAMING AND ITS APPLICATION FOR OFFICE365 SERVICE MONITORING

Jin Li, Wesley MiaoMicrosoft

Page 2: Temporal Operators For Spark Streaming And Its Application For Office365 Service Monitoring

Monitoring Target Service

Kafka

Spark Streaming OD

ata API

Collectors

aggregate monitor

DataCenterManagement

ServiceRe

cove

ry

Act

ions

Aggregated Events

Ale

rts

Reco

very

A

ctio

ns

Raw Signals

Topology D

ata

DashboardsCassandra

Temporal operators

Temporal operators

normalizeTemporal operators

Raw Signals

Topology D

ata

Normalized Events

Ale

rts

Paging Alerts

Oncall Engineer

Page 3: Temporal Operators For Spark Streaming And Its Application For Office365 Service Monitoring

Office365 Service Monitoring• Service Monitoring Signals

– Local active monitoring– Schema: <TopologyScopeValue, IsSuccess, …>– TopologyScopeValue: grouping of hosts

• e.g. server, rack, site, region

• Data may arrive out of order• Application logic defined on data time

Page 4: Temporal Operators For Spark Streaming And Its Application For Office365 Service Monitoring

Spark Streaming

aggregate monitor

Rec

ove

ry

Act

ions

Temporal operators

Temporal operators

normalize

Raw

Signals

Topology D

ata Ale

rts

reorderTemporal operators

val rawSignals : DStream[T]val topologyData: DStream[T1]

val normalized = rawSignals.join(topologyData, …)

Page 5: Temporal Operators For Spark Streaming And Its Application For Office365 Service Monitoring

Spark Streaming

aggregate monitor

Rec

ove

ry

Act

ions

Temporal operators

Temporal operators

normalize

Raw

Signals

Topology D

ata Ale

rts

reorderTemporal operators

val reordered = normalized.reorder(Policy.ReorderAdjustAndDrop(Seconds(60), Seconds(120)))

Page 6: Temporal Operators For Spark Streaming And Its Application For Office365 Service Monitoring

Spark Streaming

aggregate monitor

Rec

ove

ry

Act

ions

Temporal operators

Temporal operators

normalize

Raw

Signals

Topology D

ata Ale

rts

reorderTemporal operators

val aggregated = reordered.aggregate(TumblingWindow(5 minutes),groupBy(event.rack, event.site, event.region),Sum(event => event.isSuccess), Count(), Avg(event => event.latency)

)val availabilityStats = aggregated.map{…}

Page 7: Temporal Operators For Spark Streaming And Its Application For Office365 Service Monitoring

Spark Streaming

aggregate monitor

Rec

ove

ry

Act

ions

Temporal operators

Temporal operators

normalize

Raw

Signals

Topology D

ata Ale

rts

reorderTemporal operators

val monitoringStats = availabilityStats.join(availabilityStats, left => JoinKey(left.topologyScopeValue),right => JoinKey(right.topologyScopeValue),(left, right) => left.availability < 0.99 &&

right.availability >= 0.99TimeDiff(Seconds(-300), Seconds(0))

Page 8: Temporal Operators For Spark Streaming And Its Application For Office365 Service Monitoring

Temporal operators in depth

Page 9: Temporal Operators For Spark Streaming And Its Application For Office365 Service Monitoring

Temporal Operators – Reorder • Reorder(policy)

– inputStream.map { e => StreamEvent(e.timestamp, e) }.reorder(Policy.Reorder(Seconds(60)))

10:00:0010:00:0110:00:03

10:00:0209:59:01

10:00:0010:00:0110:00:03 10:00:02… …

10:01:01

10:00:0010:00:01

10:00:03 10:00:02… …10:01:01

HWM: 10:00:03HWM: 10:00:03HWM: 10:01:01

Page 10: Temporal Operators For Spark Streaming And Its Application For Office365 Service Monitoring

Temporal Operators – Event-time window aggregate

• Basic Availability Metric– SuccessEvents / TotalEvents for the every 5 minutes

• Tumbling window sum• Event-time window aggregate

• val availabilityMetrics = input.map { e => StreamEvent(e.timestamp, e) }.reorder(Policy.ReorderAdjustAndDrop(Seconds(60), Seconds(120))).aggregate(TumblingWindow(Seconds(300)), event => GroupByKey((event.topologyScopeValue, “topologyScopeValue”)), Sum(event => event.isSuccess, “sumOfSuccessEvent”),Count(event => event, “sumOfTotalEvent”)

).map (…)

Page 11: Temporal Operators For Spark Streaming And Its Application For Office365 Service Monitoring

Temporal Operators – Event-time window aggregate

• Implementation

(rack1) (SumState(150), CountState(170))window: 10:00:00 -10:05:00

(10:03:00, rack1, 1)(10:04:49, rack2, 0)

(rack2) (SumState(230), CountState(260))(10:05:01, rack2, 1)

… …

(10:05:00, rack1, 151, 171)

(10:05:00, rack2, 230, 261)

window: 10:00:05 -10:05:10

( … … )

(rack2) (SumState(1), CountState(1))(SumState(151), CountState(171))

(SumState(230), CountState(261))

Page 12: Temporal Operators For Spark Streaming And Its Application For Office365 Service Monitoring

Temporal Operators – Join• Alarm

– SuccessEvents/TotalEvents > threshold for the current 5 minutes, but not the previous 5 minutes

• Temporal self join

• Temporal JoinavailabilityMetrics.join(availabilityMetrics,left => JoinKey(x.topologyScopeValue, “leftTopologyScopeValue”),right => JoinKey(y.topologyScopeValue, “rightTopologyScopeValue”),(left, right) => left.availability < 0.99 && right.availability >= 0.99TimeDiff(Seconds(-300), Seconds(0))

)

Page 13: Temporal Operators For Spark Streaming And Its Application For Office365 Service Monitoring

Temporal Operators – Join• Temporal condition for Join

– TimeDiff(Seconds(-300), Seconds(0))

Left Right

-5 minutes

10:00:00

09:55:00

10:00:00

Page 14: Temporal Operators For Spark Streaming And Its Application For Office365 Service Monitoring

THANK YOU.Jin Li ([email protected] )Wesley Miao ([email protected] )

Page 15: Temporal Operators For Spark Streaming And Its Application For Office365 Service Monitoring

Questions?