ICDM 2011 - Efficient Mining of Closed Sequential Patterns on Stream Sliding Window

Uploaded by chuancong-gao on 13-Apr-2017.


TRANSCRIPT

Page 1: ICDM 2011 - Efficient Mining of Closed Sequential Patterns on Stream Sliding Window

Efficient Mining of Closed Sequential Patterns on Stream Sliding Window

Chuancong Gao, Jianyong Wang, Qingyan Yang

Database Laboratory, Department of Computer Science and Technology

Tsinghua University, Beijing, China

C. Gao, etc. (Tsinghua Univ.) Efficient Mining of Closed Sequential Patterns on Stream Sliding Window 1 / 13


What Is a Closed Sequence

Sequence: A sequence $s$ is an ordered list of items, where each item can appear multiple times, denoted $s = e_1 e_2 \ldots e_m$. If $s_a$ is contained in $s_b$ (the items of $s_a$ appear in $s_b$ in the same order, not necessarily contiguously), this is denoted $s_a \sqsubseteq s_b$.

Support (frequency) of a sequence $s$:

- $db_s = \{ s' \in db \mid s \sqsubseteq s' \}$ denotes the subset of the input sequence database $db$ whose sequences contain $s$.

- $|db_s|$ is defined as the absolute support of $s$ in $db$, denoted $sup^{db}_s$ (or simply $sup_s$ when $db$ is clear from context).

- Given a specified minimum support threshold $sup_{min}$, a sequence $s$ is frequent iff $sup_s \ge sup_{min}$.

Closed sequence: A non-empty sequence $s_a$ is closed iff there is no sequence $s_b$ such that $sup_{s_a} = sup_{s_b}$ and $s_a \sqsubset s_b$ (i.e., $s_a$ is a proper subsequence of $s_b$).
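These definitions can be checked by brute force with a short sketch, assuming items are single characters so that a sequence can be a Python string (as in the example dataset). It relies on one observation not stated on the slide: if $s_a \sqsubset s_b$ with equal support, then some sequence obtained by inserting a single item into $s_a$ has that same support (support is anti-monotone), so testing one-item insertions is enough to decide closedness. The helper names are hypothetical, and `window1` holds sequences 1 to 4 of the example dataset (Window #1).

```python
from typing import List

def is_subsequence(sa: str, sb: str) -> bool:
    """True if sa is contained in sb: its items occur in sb in order (gaps allowed)."""
    it = iter(sb)
    # `item in it` advances the iterator past the first match, preserving order
    return all(item in it for item in sa)

def support(s: str, db: List[str]) -> int:
    """Absolute support |db_s|: number of sequences in db that contain s."""
    return sum(is_subsequence(s, seq) for seq in db)

def is_frequent(s: str, db: List[str], sup_min: int) -> bool:
    return support(s, db) >= sup_min

def is_closed(s: str, db: List[str]) -> bool:
    """Closed iff no single-item insertion into s keeps the same support."""
    if not s:
        return False  # closedness is defined for non-empty sequences only
    sup = support(s, db)
    alphabet = sorted({item for seq in db for item in seq})
    for pos in range(len(s) + 1):
        for item in alphabet:
            if support(s[:pos] + item + s[pos:], db) == sup:
                return False
    return True

window1 = ["ACABC", "CABC", "BCACB", "CCABB"]
print(support("CB", window1), is_closed("CB", window1))
print(support("CAB", window1), is_closed("CAB", window1))
```

Here CB is not closed because CAB has the same support (4), while CAB itself admits no such insertion.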


Why Mine Closed Sequences

Compared with the set of all frequent sequences, the set of frequent closed sequences has the following advantages:

- It contains far fewer patterns (hundreds of times fewer).

- Mining it is much more efficient, because unpromising parts of the search space can be pruned (hundreds to thousands of times faster on real datasets).

- It is a concise representation of all frequent sequences: the support of any frequent sequence equals the largest support among its closed supersequences.
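To illustrate the last point, the following sketch recovers the support of a frequent sequence from a map of closed patterns to supports alone. The closed set used is the one listed for Window #1 with $sup_{min} = 2$ in the enumeration-tree example of the ADD and REMOVE Operations slide; the helper names are hypothetical. A result of 0 only means the sequence is not frequent; its exact support cannot be recovered from the closed set.

```python
from typing import Dict

def is_subsequence(sa: str, sb: str) -> bool:
    it = iter(sb)
    return all(item in it for item in sa)

def support_from_closed(s: str, closed: Dict[str, int]) -> int:
    """Largest support among closed supersequences of s; 0 means s is not frequent."""
    return max((sup for c, sup in closed.items() if is_subsequence(s, c)), default=0)

# Frequent closed sequences of Window #1 (sup_min = 2), as listed in the slides
closed_w1 = {"ACB": 2, "BB": 2, "BC": 3, "CC": 4,
             "CAB": 4, "CAC": 3, "CCB": 2, "CABC": 2}
print(support_from_closed("CB", closed_w1))  # recovered via CAB
print(support_from_closed("AC", closed_w1))  # recovered via CAC
```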

Our Goal

We want to mine the frequent closed sequences in the current sliding window over a sequence stream. The current sliding window is updated whenever a batch of sequences arrives or leaves.

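Example Dataset: Window Size = 4, Update Batch Size = 1 (each sequence arrives at the time step equal to its ID)

ID | Sequence | Contained in
1 | ACABC | Window #1
2 | CABC | Windows #1, #2
3 | BCACB | Windows #1, #2, #3
4 | CCABB | Windows #1, #2, #3
5 | BACB | Windows #2, #3
6 | CBBBA | Window #3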
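The window updates in this example can be simulated with a short sketch. It assumes that each arriving batch is first applied as an ADD and that the oldest sequences are then dropped as a REMOVE once the window exceeds its size; the function name `sliding_windows` and this ordering are illustrative assumptions, not taken from the paper.

```python
from collections import deque
from typing import Deque, Iterator, List, Tuple

def sliding_windows(stream: List[str], window_size: int,
                    batch_size: int) -> Iterator[Tuple[List[str], List[str], List[str]]]:
    """Yield (window, added batch, removed sequences) for every full window."""
    window: Deque[str] = deque()
    for start in range(0, len(stream), batch_size):
        added = stream[start:start + batch_size]
        window.extend(added)                  # ADD: db' = db + db*
        removed: List[str] = []
        while len(window) > window_size:
            removed.append(window.popleft())  # REMOVE: oldest sequences leave
        if len(window) == window_size:
            yield list(window), added, removed

stream = ["ACABC", "CABC", "BCACB", "CCABB", "BACB", "CBBBA"]  # IDs 1..6
for no, (win, added, removed) in enumerate(sliding_windows(stream, 4, 1), start=1):
    print(f"Window #{no}: {win} (added {added}, removed {removed})")
```

With window size 4 and batch size 1, this produces exactly the three windows in the table: sequences 1-4, 2-5 and 3-6.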


ADD and REMOVE Operations

An enumeration tree is used to maintain all frequent closed sequences in the current sliding window. Each tree node represents a frequent closed sequence, and the nodes are organized lexicographically as follows:

Enumeration Tree of Window #1, $sup_{min} = 2$ (root Ø; nodes shown as pattern : support): ACB : 2, BB : 2, BC : 3, CC : 4, CAB : 4, CAC : 3, CCB : 2, CABC : 2

- When a batch of sequences arrives (operation ADD) or leaves (operation REMOVE):

  - Update the enumeration tree by mining the updated database incrementally.

  - Obtain the frequent closed sequences of the current window by traversing the enumeration tree.

Notation used later:

- $db$: the window before the update
- $db'$: the window after the update
- $db^*$: the batch of sequences being added or removed
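One way to picture this structure is a prefix tree (trie) in which only the nodes for closed patterns carry a support, so that a depth-first traversal visiting children in item order lists the closed patterns lexicographically. This is a hypothetical sketch assuming single-character items; `Node`, `EnumerationTree`, `insert` and `traverse` are names chosen here, and the paper's actual node layout may differ.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

@dataclass
class Node:
    # Support of the pattern spelled by the root-to-node path, or None if that
    # pattern is only an intermediate prefix and not a frequent closed sequence.
    support: Optional[int] = None
    children: Dict[str, "Node"] = field(default_factory=dict)

class EnumerationTree:
    def __init__(self) -> None:
        self.root = Node()  # the empty pattern Ø

    def insert(self, pattern: str, support: int) -> None:
        """Create the node for a closed pattern, or update its support."""
        node = self.root
        for item in pattern:
            node = node.children.setdefault(item, Node())
        node.support = support

    def traverse(self) -> List[Tuple[str, int]]:
        """Return all closed patterns with their supports in lexicographic order."""
        result: List[Tuple[str, int]] = []

        def dfs(node: Node, prefix: str) -> None:
            if node.support is not None:
                result.append((prefix, node.support))
            for item in sorted(node.children):
                dfs(node.children[item], prefix + item)

        dfs(self.root, "")
        return result

tree = EnumerationTree()
for p, sup in {"ACB": 2, "BB": 2, "BC": 3, "CC": 4,
               "CAB": 4, "CAC": 3, "CCB": 2, "CABC": 2}.items():
    tree.insert(p, sup)
print(tree.traverse())
```

Calling `insert` again on an existing pattern simply overwrites its support, which mirrors the "update the support in node n" step of ADD.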


ADD Operation

General Framework:

- Given a sequence $p$ in $db'$ with $sup^{db'}_p \ge sup_{min}$ and $sup^{db^*}_p \ge 1$ (i.e., $p$ is frequent in the updated window and occurs in the arriving batch):

  - If $p$ is already stored in a node $n$ of the enumeration tree (i.e., $p$ was closed in $db$): update $p$'s support in $n$ (it can be proved that $p$ is also closed in $db'$).

  - Else, if $p$ was frequent but not closed in $db$ (i.e., only $sup^{db}_p \ge sup_{min}$ holds):
    - Check whether $p$ can be pruned (incremental pruning checking).
    - If $p$ cannot be pruned, check whether $p$ is closed (incremental closure checking).

  - Else ($p$ was not frequent in $db$):
    - Check whether $p$ can be pruned (basic pruning checking).
    - If $p$ cannot be pruned, check whether $p$ is closed (basic closure checking).

  - If $p$ cannot be pruned:
    - If $p$ is closed, create a new node $n$ for $p$ in the enumeration tree.
    - For each extended sequence $p'$ of $p$ (obtained by appending each locally frequent item to $p$), go back to the first step.
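The branching of this framework can be sketched as a dispatcher that reports which case handles a given sequence $p$. This is a simplified, hypothetical illustration: `add_case` and its return labels are names chosen here, supports are recomputed by brute force rather than incrementally, and the pruning and closure checks themselves are not implemented. It assumes $db' = db \cup db^*$ for an ADD and uses Window #1 plus the arrival of sequence 5 (BACB) as input.

```python
from typing import Dict, List

def is_subsequence(sa: str, sb: str) -> bool:
    it = iter(sb)
    return all(item in it for item in sa)

def support(s: str, db: List[str]) -> int:
    return sum(is_subsequence(s, seq) for seq in db)

def add_case(p: str, tree: Dict[str, int], db: List[str],
             batch: List[str], sup_min: int) -> str:
    """Report which ADD branch handles p; refresh its support if it is in the tree."""
    db_new = db + batch                      # db' = db + db*
    sup_new = support(p, db_new)
    if sup_new < sup_min or support(p, batch) < 1:
        return "skip"                        # p is not examined by this ADD
    if p in tree:                            # closed in db, so also closed in db'
        tree[p] = sup_new
        return "update"
    if support(p, db) >= sup_min:            # frequent but non-closed in db
        return "incremental"                 # incremental pruning/closure checks
    return "basic"                           # infrequent in db: basic checks

db = ["ACABC", "CABC", "BCACB", "CCABB"]     # Window #1
batch = ["BACB"]                             # sequence 5 arrives
tree = {"ACB": 2, "BB": 2, "BC": 3, "CC": 4,
        "CAB": 4, "CAC": 3, "CCB": 2, "CABC": 2}
for p in ["ACB", "AB", "BA", "CAB"]:
    print(p, add_case(p, tree, db, batch, 2))
```

For instance, CAB is skipped because BACB does not contain it, while BA becomes frequent only after the batch arrives and therefore falls to the basic checks.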


Function insertable

Function insertable(pattern, db, endPos [, pred]) returns the set of (pos, item) pairs with $1 \le pos \le endPos \le |pattern| + 1$ that satisfy:

- $cPattern = pattern[1, pos - 1] + \langle item \rangle + pattern[pos, |pattern|]$, i.e., cPattern is pattern with item inserted before position pos.

- $sup^{db}_{pattern} = sup^{db}_{cPattern}$.

- The optional predicate function $pred(pattern, pos, item)$, if given, must be true for each returned (pos, item) pair.

- If $endPos \le |pattern|$, the matching positions of the item $pattern[endPos]$ (the last item of $pattern[1, endPos]$) in $db$ must be the same for both pattern and cPattern.

In other words, insertable returns the items that can be inserted into the given sequence to obtain a new sequence with the same support; the predicate pred imposes additional constraints on the insertion.

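Example: insertable(CB, db, |CB|) on db = {ACABC, CABC, BCACB, CCABB} returns {(2, A)}: inserting A before position 2 yields CAB, which, like CB, is contained in all four sequences, and B is matched at the same position in each of them.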
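A brute-force sketch of insertable for single-character items follows. It assumes that the "matching position" of an item is its position in the leftmost (greedy) embedding of the prefix that ends with it, and it compares those positions over the sequences that contain pattern (equal support means pattern and cPattern are contained in exactly the same sequences). The helper `first_match_end` and this interpretation are assumptions; a practical implementation would avoid rescanning db for every candidate.

```python
from typing import Callable, List, Optional, Set, Tuple

def first_match_end(pattern: str, seq: str) -> Optional[int]:
    """1-based position in seq of the last item of pattern under leftmost
    (greedy) matching; 0 for the empty pattern; None if pattern is not contained."""
    pos = 0
    for item in pattern:
        idx = seq.find(item, pos)
        if idx < 0:
            return None
        pos = idx + 1
    return pos

def support(pattern: str, db: List[str]) -> int:
    return sum(first_match_end(pattern, seq) is not None for seq in db)

def insertable(pattern: str, db: List[str], end_pos: int,
               pred: Optional[Callable[[str, int, str], bool]] = None
               ) -> Set[Tuple[int, str]]:
    items = sorted({item for seq in db for item in seq})
    base_sup = support(pattern, db)
    containing = [seq for seq in db if first_match_end(pattern, seq) is not None]
    result: Set[Tuple[int, str]] = set()
    for pos in range(1, end_pos + 1):          # 1 <= pos <= endPos
        for item in items:
            # cPattern = pattern[1, pos-1] + <item> + pattern[pos, |pattern|]
            c_pattern = pattern[:pos - 1] + item + pattern[pos - 1:]
            if support(c_pattern, db) != base_sup:
                continue
            if pred is not None and not pred(pattern, pos, item):
                continue
            if end_pos <= len(pattern):
                # pattern[endPos] sits at 1-based position endPos + 1 in cPattern
                if any(first_match_end(pattern[:end_pos], seq)
                       != first_match_end(c_pattern[:end_pos + 1], seq)
                       for seq in containing):
                    continue
            result.add((pos, item))
    return result

db = ["ACABC", "CABC", "BCACB", "CCABB"]
print(insertable("CB", db, len("CB")))  # {(2, 'A')}
```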


Basic Pattern Closure & Pruning Checking

When a sequence $p$ was not frequent in $db$ but is frequent in $db'$, we apply the following checks:

Basic Pattern Closure Checking

Pattern $p$ is non-closed if insertable(p, db′, |p| + 1) returns at least one result. This checks whether inserting a single item into $p$, at any position including after its last item, yields another sequence with the same support.

Basic Pattern Pruning Checking

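Pattern $p$ and all of its extensions can be safely pruned if insertable(p, db′, |p|) returns at least one result. This checks whether inserting an item somewhere before $p$'s last item yields another sequence with the same support while the last item keeps the same matching positions. Because those matching positions are unchanged, the same insertion also works for every extension of $p$, so none of them can be closed.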
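With a brute-force insertable in hand, the two basic checks reduce to calling it with different endPos values. The sketch below reuses the leftmost-matching assumption for "matching positions", omits the optional pred argument, and uses Window #1 as a stand-in for $db'$; `basic_is_closed` and `basic_can_prune` are hypothetical names.

```python
from typing import List, Optional, Set, Tuple

def first_match_end(pattern: str, seq: str) -> Optional[int]:
    """1-based end of the leftmost embedding of pattern in seq, or None."""
    pos = 0
    for item in pattern:
        idx = seq.find(item, pos)
        if idx < 0:
            return None
        pos = idx + 1
    return pos

def support(pattern: str, db: List[str]) -> int:
    return sum(first_match_end(pattern, seq) is not None for seq in db)

def insertable(pattern: str, db: List[str], end_pos: int) -> Set[Tuple[int, str]]:
    """Brute-force insertable without the optional pred argument."""
    items = sorted({item for seq in db for item in seq})
    base_sup = support(pattern, db)
    containing = [seq for seq in db if first_match_end(pattern, seq) is not None]
    found: Set[Tuple[int, str]] = set()
    for pos in range(1, end_pos + 1):
        for item in items:
            c_pattern = pattern[:pos - 1] + item + pattern[pos - 1:]
            if support(c_pattern, db) != base_sup:
                continue
            if end_pos <= len(pattern) and any(
                    first_match_end(pattern[:end_pos], seq)
                    != first_match_end(c_pattern[:end_pos + 1], seq)
                    for seq in containing):
                continue
            found.add((pos, item))
    return found

def basic_is_closed(p: str, db_new: List[str]) -> bool:
    # Non-closed iff some item can be inserted anywhere (endPos = |p| + 1)
    return not insertable(p, db_new, len(p) + 1)

def basic_can_prune(p: str, db_new: List[str]) -> bool:
    # Prunable iff some item fits before p's last item (endPos = |p|)
    return bool(insertable(p, db_new, len(p)))

db_new = ["ACABC", "CABC", "BCACB", "CCABB"]
for p in ["CB", "CAB"]:
    print(p, basic_is_closed(p, db_new), basic_can_prune(p, db_new))
```

CB is both non-closed and prunable (A fits before its last item), whereas CAB is closed and its extensions, such as CABC, must still be explored.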


Incremental Pattern Closure Checking

When a sequence p was frequent in db, we check whether it is closed in db′ incrementally by calling insertable(p, db∗, |p| + 1, prevClosed), where:

- Predicate prevClosed(p, pos, item) checks, for each (pos, item) pair, whether the same pair already made p non-closed in db, by checking whether $p \sqsubset p' \sqsubseteq s$, where:
  - $p' = p[1, pos-1] + \langle item \rangle + p[pos, |p|]$;
  - $s \in \{\, s \mid p \sqsubset s,\ s \text{ is closed, and } \mathrm{sup}^{db}_{p} = \mathrm{sup}^{db}_{s} \,\}$ (found by using an index).
- p is non-closed if at least one result is found.

This checks whether inserting the same item at the same position makes p non-closed not only in db∗ (via function insertable) but also in db (via predicate prevClosed), and therefore also in db′.
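A brute-force sketch of the incremental closure check for an ADD step, assuming db∗ is the batch of newly added sequences so that db′ = db ∪ db∗ (an assumption; db∗ is not defined on this slide). Since $\mathrm{sup}^{db'}_{q} = \mathrm{sup}^{db}_{q} + \mathrm{sup}^{db^*}_{q}$ and a supersequence never has larger support in either part, an insertion keeps p's support in db′ exactly when it keeps it in both db and db∗. The old window is not rescanned: the hypothetical `closed_index` maps each frequent closed pattern of db to its support, and the support of a frequent p in db is the largest support among indexed closed patterns containing it.

```python
from typing import Dict, List, Sequence, Tuple

Pattern = Tuple[str, ...]


def contains(s: Sequence[str], p: Sequence[str]) -> bool:
    """True iff p is a (not necessarily contiguous) subsequence of s."""
    it = iter(s)
    return all(item in it for item in p)


def support(p: Sequence[str], db: List[Sequence[str]]) -> int:
    return sum(1 for s in db if contains(s, p))


def insert_at(p: Pattern, pos: int, item: str) -> Pattern:
    return p[: pos - 1] + (item,) + p[pos - 1 :]


def prev_closed(p: Pattern, pos: int, item: str, sup_db_p: int,
                closed_index: Dict[Pattern, int]) -> bool:
    """Hypothetical version of prevClosed: True if p' kept p's support in db,
    i.e. p ⊏ p' ⊑ s for some indexed closed s with sup_db(s) == sup_db(p)."""
    q = insert_at(p, pos, item)
    return any(sup == sup_db_p and contains(s, q) for s, sup in closed_index.items())


def is_closed_after_add(p: Pattern, delta: List[Sequence[str]],
                        closed_index: Dict[Pattern, int]) -> bool:
    """Closure of p on db' = db + delta; p must have been frequent in db."""
    sup_db_p = max(sup for s, sup in closed_index.items() if contains(s, p))
    sup_delta_p = support(p, delta)
    # Any item that can keep p's support in db occurs in some indexed closed pattern.
    items = sorted({x for s in closed_index for x in s})
    for pos in range(1, len(p) + 2):
        for item in items:
            q = insert_at(p, pos, item)
            if support(q, delta) == sup_delta_p and prev_closed(p, pos, item, sup_db_p, closed_index):
                return False  # the same (pos, item) pair works in db* and in db
    return True


if __name__ == "__main__":
    index = {("a", "c"): 2, ("a", "b", "c"): 1}  # closed patterns of db = [<a b c>, <a c>]
    delta = [("b", "c")]
    print(is_closed_after_add(("c",), delta, index))  # True: <c> becomes closed in db'
    print(is_closed_after_add(("b",), delta, index))  # False: <b c> has the same support
```

Note that a pattern absent from db∗ (support 0 there) accepts every pair on the db∗ side, which is why the item alphabet comes from the index rather than from the sequences of db∗.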

Incremental Pattern Pruning Checking

When a sequence p was frequent in db, we check whether it can be pruned in db′ incrementally by calling insertable(p, db∗, |p|, prevPruned):

- We use a global hash structure, called nonPruned, to store all the frequent sequences that could not be pruned in db.

- Predicate prevPruned(p, pos, item) checks, for each (pos, item) pair, whether p was pruned in db with the same (pos, item) pair.

- This is done by running insertable(p, tdb, |p|) with $tdb = \{\, s \mid p \sqsubset s \text{ and } s \in nonPruned \,\}$, and checking whether the (pos, item) pair being verified is one of the returned results.

- p can be pruned if at least one result is found.

This checks whether p can be pruned not only in db∗ (via function insertable), by inserting an item into p before p's last item with the matching positions unchanged, but also in db (via predicate prevPruned) with the same item and position, and hence whether it can be pruned in db′.
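Under the same assumption that db∗ holds the newly added sequences, and the leftmost-embedding reading of "matching positions" used for the basic check, the pruning condition is verified sequence by sequence, so it holds on db′ exactly when one (pos, item) pair satisfies it on both db∗ and db. This sketch shows only that combination step: `prev_pruned` is passed in as a plain callable, whereas the slides answer it from the nonPruned hash structure, which is not modeled here. All names are hypothetical.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Pattern = Tuple[str, ...]


def insert_at(p: Pattern, pos: int, item: str) -> Pattern:
    return p[: pos - 1] + (item,) + p[pos - 1 :]


def leftmost_end(s: Sequence[str], p: Sequence[str]) -> Optional[int]:
    """0-based index where the greedy (leftmost) embedding of p in s ends, or None."""
    i = -1
    for item in p:
        try:
            i = s.index(item, i + 1)
        except ValueError:
            return None
    return i


def pairs_keeping_last_match(p: Pattern, db: List[Sequence[str]],
                             items: List[str]) -> List[Tuple[int, str]]:
    """(pos, item) pairs with pos <= |p| that keep the leftmost-embedding end of p in
    every sequence of db containing p (vacuously all pairs if none contains p)."""
    supporting = [s for s in db if leftmost_end(s, p) is not None]
    return [
        (pos, item)
        for pos in range(1, len(p) + 1)
        for item in items
        if all(leftmost_end(s, insert_at(p, pos, item)) == leftmost_end(s, p) for s in supporting)
    ]


def prunable_after_add(p: Pattern, delta: List[Sequence[str]], items: List[str],
                       prev_pruned: Callable[[int, str], bool]) -> bool:
    """p can be pruned on db' = db + delta iff one pair works on delta and on db."""
    return any(prev_pruned(pos, item) for pos, item in pairs_keeping_last_match(p, delta, items))


if __name__ == "__main__":
    old = [("a", "b", "c"), ("a", "b", "d")]
    items = ["a", "b", "c", "d", "e"]
    old_pairs = set(pairs_keeping_last_match(("b",), old, items))  # stand-in only

    def prev_pruned(pos: int, item: str) -> bool:
        return (pos, item) in old_pairs

    print(prunable_after_add(("b",), [("a", "b", "e")], items, prev_pruned))  # True
    print(prunable_after_add(("b",), [("b", "e")], items, prev_pruned))  # False
```

In the second call the new sequence 〈b e〉 contains 〈b〉 but not 〈a b〉, so the pair (1, a) that justified pruning in db no longer holds in db′.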

REMOVE Operation

We can prove that during the REMOVE operation, a non-closed sequence never becomes closed.

Hence we only need to consider the closed sequences on the enumeration tree, since no other sequence can become closed.

The REMOVE operation removes the subtrees whose supports fall below supmin after the update. Nodes whose patterns are no longer closed are removed by checking whether $\exists p'$ in the enumeration tree with $p \sqsubset p'$ and $\mathrm{sup}^{db'}_{p} = \mathrm{sup}^{db'}_{p'}$ (using the index from the incremental closure checking).
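A minimal sketch of REMOVE on a flattened enumeration tree, assuming the tree is represented as a dict from each stored closed pattern to its support in the old window db, and that supports are updated by subtracting the occurrences in the removed sequences. Because no non-closed sequence becomes closed, only the stored closed patterns are revisited: those below supmin are dropped (their descendants have no larger support, so they go as well), and so are those with a stored proper supersequence of equal support in db′. `remove_batch` is a hypothetical name, and the linear scan stands in for the index mentioned on the slide.

```python
from typing import Dict, List, Sequence, Tuple

Pattern = Tuple[str, ...]


def contains(s: Sequence[str], p: Sequence[str]) -> bool:
    """True iff p is a (not necessarily contiguous) subsequence of s."""
    it = iter(s)
    return all(item in it for item in p)


def support(p: Sequence[str], db: List[Sequence[str]]) -> int:
    return sum(1 for s in db if contains(s, p))


def remove_batch(closed: Dict[Pattern, int], removed: List[Sequence[str]],
                 sup_min: int) -> Dict[Pattern, int]:
    # Supports in db' = db minus the removed sequences.
    updated = {p: sup - support(p, removed) for p, sup in closed.items()}
    # Drop patterns that are no longer frequent.
    frequent = {p: sup for p, sup in updated.items() if sup >= sup_min}
    # Drop patterns that have a proper supersequence with equal support in db'.
    return {
        p: sup
        for p, sup in frequent.items()
        if not any(q != p and contains(q, p) and frequent[q] == sup for q in frequent)
    }


if __name__ == "__main__":
    closed = {("a",): 3, ("a", "b"): 2, ("a", "c"): 1}  # closed in db = [<a b>, <a b>, <a c>]
    print(remove_batch(closed, [("a", "c")], sup_min=1))  # {('a', 'b'): 2}
```

Comparing only against surviving stored patterns is enough here: the closure in db′ of any equal-support supersequence is itself closed in db′, hence (by the property above) was already closed in db and is still frequent.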

Experiments

The performance of our algorithm StreamCloSeq is compared with that of another algorithm, CISpan (D. Yuan et al., SDM 2008).

Table: Dataset characteristics

| Dataset | # Items | # Seq. | Max. Len. | Avg. Len. |
|---|---|---|---|---|
| Gazelle | 1,423 | 29,369 | 651 | 2.98 |
| TCAS | 105 | 1,578 | 96 | 60.3 |
| Kosarak | 41,270 | 990,002 | 2,498 | 8.1 |
| MSNBC | 17 | 989,818 | 14,795 | 4.74 |

- For the smaller datasets, Gazelle and TCAS, incremental mining is evaluated: only the ADD operation is executed, which is denoted by a window size of ∞.

- For the larger datasets, Kosarak and MSNBC, sliding-window mining is evaluated, and the running time over the first 100 full-sized windows (windows that have reached the maximal window size) is collected.
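To make the evaluation terms concrete, here is a hypothetical driver, assuming each update ADDs one batch of `batch_size` sequences and then REMOVEs the oldest sequences until at most `window_size` remain; a window size of `float("inf")` gives the ADD-only setting used for Gazelle and TCAS. The slides do not specify the exact update order, so this only illustrates "window size", "batch size", and "full-sized window" and is not the authors' experimental harness.

```python
from collections import deque
from typing import Deque, Iterator, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def window_updates(stream: Sequence[T], window_size: float,
                   batch_size: int) -> Iterator[Tuple[List[T], List[T], bool]]:
    """Yield (added, removed, is_full) for each update of a hypothetical sliding window."""
    window: Deque[T] = deque()
    for start in range(0, len(stream), batch_size):
        added = list(stream[start : start + batch_size])
        window.extend(added)  # ADD operation
        removed: List[T] = []
        while len(window) > window_size:  # REMOVE operation; never runs for an infinite window
            removed.append(window.popleft())
        yield added, removed, len(window) == window_size  # full-sized window?


if __name__ == "__main__":
    for added, removed, full in window_updates(list(range(7)), window_size=4, batch_size=2):
        print(added, removed, full)
```

Only the updates that report a full-sized window would count toward a measurement such as "the first 100 full-sized windows".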

Experiments – Runtime

[Figure: running time (in seconds) of StreamCloSeq and CISpan.
Results with varying minimum support threshold (x-axis: minimum support threshold): Gazelle (window size = ∞, batch size = 100); TCAS (window size = ∞, batch size = 10); Kosarak (window size = 50K, batch size = 1K); MSNBC (window size = 10K, batch size = 100).
Results with varying update batch size (x-axis: update batch size): Gazelle (window size = ∞, minimum support = 5); Kosarak (window size = 50K, minimum support = 100).
Results with varying sliding window size (x-axis: sliding window size, in 1,000s): Kosarak (batch size = 1K, minimum support = 250); MSNBC (batch size = 100, minimum support = 10).]

The End

Thank you!

Questions or Comments?
