
Page 1: Research Faculty Summit 2018...Machine Learning in Azure Networking (a few sample problems) David A. Maltz Distinguished Engineer Azure Physical Networking Team dmaltz@microsoft.com

Systems | Fueling future disruptions

Research Faculty Summit 2018


Machine Learning in Azure Networking (a few sample problems)

David A. Maltz

Distinguished Engineer

Azure Physical Networking Team

dmaltz@microsoft.com


Large Scale Creates Large Problems

• 100,000s of links in each datacenter

• 10,000s of links in each MAN

• 1,000s of links in the WAN

→ High availability is job #1 for the network

At scale, the law of large numbers is not your friend

• Instead of “Occam’s Razor” – the simplest explanation is most likely

• “Murphy’s Law” applies – whatever can go wrong, will

Finding the cause of perceived network problems is hard
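To make the law-of-large-numbers point concrete, here is a minimal sketch assuming every link fails independently with the same small probability over some time window. The per-link probability `P_LINK = 1e-4` is a hypothetical illustrative value, not an Azure measurement, and the link counts are the lower bounds of the ranges on the slide.

```python
import math


def expected_failures(n_links: int, p_fail: float) -> float:
    """Expected number of failed links, assuming independent failures."""
    return n_links * p_fail


def prob_any_failure(n_links: int, p_fail: float) -> float:
    """Probability that at least one of n_links fails: 1 - (1 - p)^n.

    Computed with log1p/expm1 so it stays accurate when p is tiny.
    """
    return -math.expm1(n_links * math.log1p(-p_fail))


# Hypothetical per-link failure probability for one time window.
P_LINK = 1e-4

# Lower bounds of the link counts quoted on the slide.
for name, links in [("datacenter", 100_000), ("MAN", 10_000), ("WAN", 1_000)]:
    print(f"{name}: expected failures = {expected_failures(links, P_LINK):.1f}, "
          f"P(at least one) = {prob_any_failure(links, P_LINK):.5f}")
```

Under this assumed rate, a one-in-ten-thousand event becomes a near certainty somewhere in a single datacenter, which is the sense in which Murphy's Law replaces Occam's Razor at scale.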



Machine Learning in Azure Network: A Few Sample Problems


“I don't understand the underlying physics that causes this; however, I see outcomes, I know good vs bad, and I want to try and understand the outcome”

“I have a good physical model and understanding of causes”

[Figure: problems placed on a spectrum from Machine Learning to Rules-Based System]


Topology: Which cables would you choose?



Region and Path Availability

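As a hedged illustration of how availability composes along and across paths, here is a minimal sketch assuming independent component failures: a path is up only if every hop on it is up (series), and a region pair stays connected if any of its diverse paths is up (parallel). The per-hop availabilities are hypothetical, and real paths can share physical risks, which breaks the independence assumption.

```python
from math import prod


def series_availability(hops):
    """A path is up only if every hop on it is up."""
    return prod(hops)


def parallel_availability(paths):
    """At least one of several independent paths is up."""
    return 1 - prod(1 - a for a in paths)


# Hypothetical per-hop availabilities for two diverse paths between regions.
path_a = [0.999, 0.9995, 0.999]
path_b = [0.998, 0.999]

a_path_a = series_availability(path_a)
a_path_b = series_availability(path_b)
print(f"path A: {a_path_a:.6f}, path B: {a_path_b:.6f}, "
      f"either: {parallel_availability([a_path_a, a_path_b]):.8f}")
```

Adding hops lowers a single path's availability, while adding independent paths raises the pair's availability; correlated failures erode that second effect.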


Machine Learning in Azure Network: Layer-1 Sample Problems




Wavelength & Performance Optimization


• Gaussian Noise (GN) model code implemented as gnpy on GitHub
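The sketch below illustrates the idea behind the GN model without using gnpy's API. It is a simplified stand-alone example assuming identical amplified spans, an amplifier noise figure that accounts for all ASE noise, and a single lumped nonlinear coefficient `ETA` (W⁻²) whose value is hypothetical. In the GN model, nonlinear interference (NLI) grows with the cube of the channel launch power, so the generalized SNR (GSNR) peaks at an optimal launch power.

```python
import math

H_PLANCK = 6.62607015e-34  # Planck constant, J*s
C_LIGHT = 299_792_458.0    # speed of light, m/s


def ase_power_w(n_spans, span_loss_db, nf_db, wavelength_m, bandwidth_hz):
    """Accumulated ASE noise power (W) in the signal bandwidth.

    Approximation: each amplifier's gain exactly offsets its span loss and
    adds roughly NF * h * nu * G * B of noise.
    """
    nu = C_LIGHT / wavelength_m
    gain = 10 ** (span_loss_db / 10)
    nf = 10 ** (nf_db / 10)
    return n_spans * nf * H_PLANCK * nu * gain * bandwidth_hz


def gsnr_db(p_ch_w, p_ase_w, eta):
    """Generalized SNR: signal over ASE plus GN-model NLI (eta * P^3)."""
    p_nli = eta * p_ch_w ** 3
    return 10 * math.log10(p_ch_w / (p_ase_w + p_nli))


def optimal_launch_power_w(p_ase_w, eta):
    """Maximizer of P / (P_ASE + eta * P^3), i.e. where P_NLI = P_ASE / 2."""
    return (p_ase_w / (2 * eta)) ** (1 / 3)


# Hypothetical link: 10 spans x 80 km at 0.2 dB/km, 5 dB NF, 32 GHz bandwidth.
p_ase = ase_power_w(n_spans=10, span_loss_db=16.0, nf_db=5.0,
                    wavelength_m=1550e-9, bandwidth_hz=32e9)
ETA = 2.5e3  # W^-2, hypothetical lumped NLI coefficient for the whole link
p_opt = optimal_launch_power_w(p_ase, ETA)
print(f"optimal launch: {10 * math.log10(p_opt / 1e-3):.2f} dBm, "
      f"GSNR: {gsnr_db(p_opt, p_ase, ETA):.2f} dB")
```

Setting the derivative of P / (P_ASE + ηP³) to zero gives P_ASE = 2ηP³, so at the optimum the NLI power is half the ASE power. Launching below it leaves the channel noise-limited; launching above it makes it nonlinearity-limited.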


Machine Learning in Azure Network: Layer-1 Sample Problems




Network Availability

“Gray” switch failures are the worst

• Switch stays in service

• Drops some fraction of packets

Find the needle in the haystack:

• Pingmesh

• Targeted probe packets

• Error messages sent from switches

• Service-level health metrics

Combine them all to localize the problem to the most likely switch
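One simple way to combine such signals is sketched below. It is a hypothetical path-voting heuristic, not the Azure team's actual method: each dropped probe (for example from Pingmesh or targeted probes) spreads one unit of blame across the switches on its path, blame is normalized by how many probes crossed each switch, and switch error messages add a weighted bonus. The function name, inputs, and `error_weight` are assumptions for illustration, and service-level health metrics are left out.

```python
from collections import defaultdict


def rank_suspect_switches(probes, switch_errors=None, error_weight=0.5):
    """Rank switches from most to least likely culprit of a gray failure.

    probes: iterable of (path, dropped), where path is a list of switch names
            the probe traversed and dropped is True if the probe was lost.
    switch_errors: optional dict of switch name -> count of error messages.
    """
    blame = defaultdict(float)
    traversals = defaultdict(int)
    for path, dropped in probes:
        for sw in path:
            traversals[sw] += 1
            if dropped:
                # A lost probe implicates every switch on its path equally.
                blame[sw] += 1.0 / len(path)

    # Normalize by traffic so busy switches are not blamed just for volume.
    scores = {sw: blame[sw] / n for sw, n in traversals.items()}

    # Add switch-reported errors, scaled to [0, error_weight].
    if switch_errors:
        max_err = max(switch_errors.values())
        if max_err > 0:
            for sw, errs in switch_errors.items():
                scores[sw] = scores.get(sw, 0.0) + error_weight * errs / max_err

    return sorted(scores, key=scores.get, reverse=True)


# Toy example: agg2 silently drops some of the traffic that crosses it.
probes = [
    (["tor1", "agg1", "tor2"], False),
    (["tor1", "agg1", "tor2"], False),
    (["tor1", "agg2", "tor2"], True),
    (["tor1", "agg2", "tor2"], False),
    (["tor3", "agg2", "tor4"], True),
    (["tor3", "agg1", "tor4"], False),
]
print(rank_suspect_switches(probes))
print(rank_suspect_switches(probes, switch_errors={"agg2": 3, "tor3": 1}))
```

Because a gray failure drops only some packets, the faulty switch also appears on successful probes; normalizing by traversal count keeps the ranking from simply favoring whichever switch carries the most probes.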


Machine Learning in the Azure Network Team: Practitioners’ Guide



Thank you!
