DeepLog: Anomaly Detection and Diagnosis from System Logs through Deep Learning

Min Du, Feifei Li, Guineng Zheng, Vivek Srikumar
University of Utah


Background

System Event Log

Available on practically every computer system!

Automatic analysis?

[Figure: automatically detected anomaly]

Example log entries:

Started service A on port 80
Executor updated: app-1 is now LOADING
……

LOG PARSING: System Event Log -> Structured Data (message type / log key, ……)

Print statement that produced the first entry:

printf("Started service %s on port %d", x, y);

Resulting log keys:

Started service * on port * (log key ID: 1)
Executor updated: * is now LOADING (log key ID: 2)
……
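The log parsing step can be sketched in a few lines of Python. This is only an illustration that assumes the print-statement templates are already known and matches entries against them with regular expressions; it is not the parser used by DeepLog, and the names TEMPLATES and parse_entry are hypothetical.

```python
import re

# Hypothetical templates, one per log key ID; '*' marks a parameter position.
TEMPLATES = {
    1: "Started service * on port *",
    2: "Executor updated: * is now LOADING",
}

def _template_to_regex(template):
    # Escape the literal text and turn each '*' into a capturing group.
    parts = [re.escape(part) for part in template.split("*")]
    return re.compile("^" + r"(\S+)".join(parts) + "$")

COMPILED = {key_id: _template_to_regex(t) for key_id, t in TEMPLATES.items()}

def parse_entry(message):
    """Return (log key ID, parameter list), or (None, []) if nothing matches."""
    for key_id, pattern in COMPILED.items():
        match = pattern.match(message)
        if match:
            return key_id, list(match.groups())
    return None, []
```

With these templates, "Started service A on port 80" maps to log key 1 with parameters A and 80.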

LOG ANALYSIS: Structured Data -> Anomaly Detection

Message count vector: Xu’SOSP09, Lou’ATC10, etc.
Problem: Offline batched processing

Build workflow model: Lou’KDD10, Beschastnikh’ICSE14, Yu’ASPLOS16, etc.
Problem: Only for simple execution path anomalies

Common problem: Only log keys (message types) are considered.

DeepLog

Spell: a streaming log parser published in ICDM’16

log message                    log key                    parameters
Deletion of file1 complete.    Deletion of * complete.    [file1]
Deletion of file2 complete.    Deletion of * complete.    [file2]

Parsed log entries:

time  log message                                  log key  parameter value vector
t1    Deletion of file1 complete                   k1       [t1 - t0, file1]
t2    Took 0.61 seconds to deallocate network …    k2       [t2 - t1, 0.61]
t3    VM Stopped (Lifecycle Event)                 k3       [t3 - t2]
…     …                                            …        …

Parsed log entries -> DeepLog -> Anomaly Detection, Diagnosis
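The parameter value vectors in the table can be built as sketched below. This is a minimal illustration: the function name to_vectors, the tuple layout of the input, and the 0.0 placeholder for the first entry are assumptions of the sketch, not details from the slides.

```python
def to_vectors(entries):
    """entries: list of (timestamp, log_key, params) in arrival order.

    Returns a list of (log_key, vector) where the vector starts with the
    time elapsed since the previous entry, followed by the parameters.
    """
    rows = []
    previous_time = None
    for timestamp, log_key, params in entries:
        # The first entry has no predecessor in this sketch, so 0.0 is used
        # as a placeholder (the slides write this term as t1 - t0).
        elapsed = 0.0 if previous_time is None else timestamp - previous_time
        rows.append((log_key, [elapsed] + list(params)))
        previous_time = timestamp
    return rows

# Hypothetical timestamps, chosen only for illustration.
entries = [
    (10.0, "k1", ["file1"]),
    (10.5, "k2", [0.61]),
    (12.0, "k3", []),
]
rows = to_vectors(entries)
```

Each row pairs a log key with its vector, so the log key sequence and the per-key parameter sequences can be analyzed separately.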

DeepLog Architecture

[Figure: DeepLog architecture, built up over several slides. Recoverable labels: Training Stage, Detection Stage, Models.]

Log Key Anomaly Detection model

Example log key sequence:

25 18 54 57 18 56 … 25 18 54 57 56 18 …

➢ a rigorous set of logic and control flows
➢ a (more structured) natural language

natural language modeling

multi-class classifier: history sequence => next key to appear

A log key is detected to be abnormal if it does not follow the prediction.

Use long short-term memory (LSTM) architecture

Training:

log key sequence:

25 18 54 57 18 56 … 25 18 54 57 56 18 …

h=3

[Figure: training windows over the log key sequence, shown across several slides.]
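One way to read the training slides is that a window of h keys slides along the sequence and each window is paired with the key that follows it. The sketch below builds such pairs; the function name training_pairs is hypothetical, and the LSTM that would consume the pairs is not shown.

```python
def training_pairs(sequence, h=3):
    """Return (history, next_key) pairs from a log key sequence, window size h."""
    return [
        (tuple(sequence[i:i + h]), sequence[i + h])
        for i in range(len(sequence) - h)
    ]

# First six keys of the example sequence from the slides.
pairs = training_pairs([25, 18, 54, 57, 18, 56], h=3)
```

For these six keys the pairs are (25, 18, 54) => 57, (18, 54, 57) => 18 and (54, 57, 18) => 56, which is the "history sequence => next key to appear" form of the classifier.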

Detection:

In the detection stage, DeepLog checks whether the actual next log key is among its top g most probable predictions.
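The top-g check can be sketched as follows. The probabilities here are made-up values standing in for the model output, and the function name is_anomalous is hypothetical.

```python
def is_anomalous(probabilities, actual_key, g):
    """probabilities: dict mapping each candidate log key to its predicted probability.

    Returns True when the actual next key is not among the g most probable keys.
    """
    ranked = sorted(probabilities, key=probabilities.get, reverse=True)
    return actual_key not in ranked[:g]

# Hypothetical prediction for the history (25, 18, 54).
predicted = {57: 0.6, 56: 0.3, 18: 0.08, 25: 0.02}
```

A larger g accepts more candidate keys as normal, so it trades fewer false alarms for a higher chance of missing an anomaly.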


Workflow Construction

Input: log key sequence

25 18 54 57 18 56 … 25 18 54 57 56 18 …

Output: [Figure: constructed workflow, not recovered.]

Method 1: Using Log Key Anomaly Detection model (LSTM prediction probabilities)

An example of concurrency detection:

[Figure: concurrency detection example, built up over several slides.]

Method 2: A density-based clustering approach

Co-occurrence matrix of log keys (k_i, k_j) within distance d

f_d(k_i, k_j): the frequency of (k_i, k_j) appearing together within distance d
f(k_i): the frequency of k_i in the input sequence
p_d(i, j): the probability of (k_i, k_j) appearing together within distance d
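The co-occurrence matrix can be computed as sketched below. The slides give only the definitions, so two choices are assumptions of this sketch: "within distance d" is counted for k_j appearing after k_i, and the probability is normalized as p_d(i, j) = f_d(k_i, k_j) / (d * f(k_i)). The clustering step itself is not shown.

```python
from collections import Counter, defaultdict

def cooccurrence(sequence, d):
    """Return {(ki, kj): p_d} with p_d = f_d(ki, kj) / (d * f(ki))."""
    f = Counter(sequence)       # f(ki): frequency of each key
    f_d = defaultdict(int)      # f_d(ki, kj): co-occurrences within distance d
    for i, ki in enumerate(sequence):
        # Count every key found in the d positions after position i.
        for kj in sequence[i + 1:i + 1 + d]:
            f_d[(ki, kj)] += 1
    return {pair: count / (d * f[pair[0]]) for pair, count in f_d.items()}

# Hypothetical sequence of log key IDs.
matrix = cooccurrence([1, 2, 3, 1, 2, 3], d=1)
```

Pairs with a high value, such as (1, 2) here, are candidates for belonging to the same task.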

Parameter Value Anomaly Detection model

Example:

Log messages of a particular log key:

t2: Took 0.61 seconds to deallocate network …
t'2: Took 1.1 seconds to deallocate network …
….

Parameter value vectors over time:

[t2 - t1, 0.61], [t'2 - t'1, 1.1], ….

Multi-variate time series data anomaly detection problem!

✓ Leverage LSTM-based approach;
✓ A parameter value vector is given as input at each time step;
✓ An anomaly is detected if the mean-square-error (MSE) between prediction and actual data is too big.

[Figure: value over time, showing history, prediction and actual value; the check is MSE > Threshold?]
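The MSE check can be written out as below. The function names and the threshold values are hypothetical, and the predicted vector would come from the LSTM, which is not shown here.

```python
def mse(predicted, actual):
    """Mean squared error between two equal-length numeric vectors."""
    if len(predicted) != len(actual):
        raise ValueError("vectors must have the same length")
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(predicted)

def is_value_anomaly(predicted, actual, threshold):
    """Flag the entry when the prediction error exceeds the threshold."""
    return mse(predicted, actual) > threshold

# Hypothetical vectors of the form [elapsed time, parameter value].
error = mse([1.0, 0.5], [1.0, 1.5])
```

Here the error is 0.5, so the entry is flagged with a threshold of 0.25 and accepted with a threshold of 1.0.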

LSTM model online update

Q: How to handle false positives?

Log sequence: history | current

history -> model -> prediction

prediction vs. current: Anomaly? -> Yes -> False positive? -> Yes -> update model using this case: “history -> current”
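The update loop on this slide can be sketched as follows. To keep the example self-contained, a simple frequency-count predictor stands in for the LSTM; that substitution, the `top_g` parameter, the class and function names, the log key values, and the `is_false_positive` callback (a placeholder for whatever feedback marks a detection as a false positive) are all assumptions of this sketch.

```python
from collections import Counter, defaultdict


class CountingModel:
    """Stand-in for the LSTM: counts which log key follows each history window."""

    def __init__(self, top_g=1):
        self.top_g = top_g
        self.counts = defaultdict(Counter)

    def update(self, history, current):
        """Learn from one "history -> current" case."""
        self.counts[tuple(history)][current] += 1

    def predict(self, history):
        """Return the top_g most frequent next keys seen after this history."""
        ranked = self.counts[tuple(history)].most_common(self.top_g)
        return [key for key, _ in ranked]


def check(model, history, current, is_false_positive):
    """Detect an anomaly, then fold false-positive feedback back into the model."""
    if current in model.predict(history):
        return "normal"
    if is_false_positive(history, current):
        model.update(history, current)
        return "false positive (model updated)"
    return "anomaly"


# Hypothetical log keys; the feedback callback here accepts everything.
model = CountingModel(top_g=2)
model.update([5, 22], 11)
print(check(model, [5, 22], 9, lambda h, c: True))  # flagged, then learned
print(check(model, [5, 22], 9, lambda h, c: True))  # now among the predictions
```

With an actual LSTM, the `update` step would presumably be an incremental training pass on the "history -> current" pair rather than a counter increment.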

Evaluation – log key anomaly detection

Evaluation results on HDFS log data [1] (over a million log entries with labeled anomalies).

[1] PCA (SOSP’09), IM (UsenixATC’10), N-gram (baseline language model)

[Figure annotation: Up is good]

Evaluation – parameter value anomaly detection

Evaluation results on OpenStack cloud log with different confidence intervals (CIs).

OpenStack cloud log: generated on CloudLab; VM creation/deletion operations; injected performance anomalies.

[Figure: MSE (mean square error) plot, annotated with thresholds, ANOMALY, and False Positive]
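One way to turn a confidence interval into an MSE threshold is to fit a Gaussian to the errors observed on normal validation data and take the upper bound of the interval. The sketch below assumes exactly that; the error values and the confidence levels are invented for illustration, and the function names are hypothetical.

```python
from statistics import NormalDist


def ci_threshold(validation_errors, confidence):
    """Upper bound of a two-sided Gaussian confidence interval on the MSE."""
    dist = NormalDist.from_samples(validation_errors)
    return dist.inv_cdf(0.5 + confidence / 2)


def flag(errors, threshold):
    """Mark each MSE value that exceeds the threshold."""
    return [e > threshold for e in errors]


# Made-up MSE values from normal runs, used only to fit the Gaussian.
validation_errors = [0.010, 0.012, 0.009, 0.011, 0.013, 0.008, 0.010, 0.012]

for confidence in (0.98, 0.99, 0.999):
    threshold = ci_threshold(validation_errors, confidence)
    print(confidence, round(threshold, 4), flag([0.011, 0.016, 0.050], threshold))
```

A wider confidence interval raises the threshold, so it tends to trade false positives for missed detections.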

Evaluation – LSTM model online update

Evaluation on Blue Gene/L log, with and without online model update.

Blue Gene/L log: HPC log with labeled anomalies; available at https://www.usenix.org/cfdr-data

[Figure annotation: Up is good]

Evaluation – case study: network security log

Dataset: IEEE VAST Challenge 2011 (Mini Challenge 2 – Computer Networking Operations)

The dataset contains firewall log, IDS log, etc.

Detection results.

[Annotation on the detection results: Could be fixed with prior knowledge of “documented IP”]

Evaluation – workflow construction

Constructed workflow of VM Creation (previously generated OpenStack cloud log).

How does it help to diagnose anomalies?

[Figure annotations: Parameter value anomaly; Time difference (performance) anomaly]

Identified anomaly: Instance took too long to build because of the transition from 52 -> 53.

Injected anomaly: During VM creation, network speed from controller to compute node is throttled.
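The diagnosis step on this slide, locating the workflow transition responsible for a slow build, can be illustrated with a small sketch. It assumes each log entry carries a log key and a timestamp and that a typical duration is known for each transition in the workflow. The function name, the timestamps, the expected durations, and log keys 51 and 54 are invented for the example; only the 52 -> 53 transition comes from the slide.

```python
def slowest_transition(events, expected_seconds):
    """Return the transition that most exceeds its expected duration.

    events: list of (log_key, timestamp_seconds) in execution order.
    expected_seconds: dict mapping (from_key, to_key) to a typical duration.
    Transitions absent from expected_seconds are skipped.
    """
    worst, worst_excess = None, 0.0
    for (k1, t1), (k2, t2) in zip(events, events[1:]):
        if (k1, k2) not in expected_seconds:
            continue
        excess = (t2 - t1) - expected_seconds[(k1, k2)]
        if excess > worst_excess:
            worst, worst_excess = (k1, k2), excess
    return worst, worst_excess


# Hypothetical VM creation run in which the 52 -> 53 step is slow.
events = [(51, 0.0), (52, 1.0), (53, 31.0), (54, 32.0)]
expected = {(51, 52): 1.0, (52, 53): 2.0, (53, 54): 1.0}
print(slowest_transition(events, expected))
```

Mapping the flagged transition back to the constructed workflow is what points to the stage of VM creation that was delayed.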


Summary

DeepLog

➢ A real-time system log anomaly detection framework.

➢ LSTM is used to model system execution paths and log parameter values.

➢ Workflow models are built to help anomaly diagnosis.

➢ It supports online model updates.

Min Du


Feifei Li


Thank you!