An Efficient Algorithm for Mining Time Interval-based Patterns in Large Databases
Yi-Cheng Chen, Ji-Chiang Jiang, Wen-Chih Peng and Suh-Yin Lee
Department of Computer Science, National Chiao Tung University
Hsinchu, Taiwan 300
{ejen.cs95g, perrys0620.cs96g}@nctu.edu.tw [email protected] [email protected]
CIKM, 2010
OUTLINE
1. INTRODUCTION
2.PROBLEM DEFINITION
3.INCISION STRATEGY
4.COINCIDENCE REPRESENTATION
5.CTMiner ALGORITHM
6.EXPERIMENTAL RESULTS
7.CONCLUSION AND FUTURE WORK
1. INTRODUCTION
All related research in this domain is based on Allen’s temporal logic, which defines 13 temporal relations between any two event intervals.
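As a minimal sketch of what these 13 relations look like in practice, the Python function below classifies an ordered pair of intervals. The function name, the (start, finish) tuple layout, and the relation names follow common usage for Allen's interval algebra; they are assumptions for illustration and do not come from the slides.

```python
def allen_relation(a, b):
    """Return the Allen relation of interval a with respect to interval b.

    Each interval is a (start, finish) pair with start < finish.
    Relation names are the conventional ones (an assumption, not from the slides).
    """
    (s1, f1), (s2, f2) = a, b
    if f1 < s2:
        return "before"
    if f2 < s1:
        return "after"
    if f1 == s2:
        return "meets"
    if f2 == s1:
        return "met-by"
    if s1 == s2 and f1 == f2:
        return "equal"
    if s1 == s2:
        return "starts" if f1 < f2 else "started-by"
    if f1 == f2:
        return "finishes" if s1 > s2 else "finished-by"
    if s2 < s1 and f1 < f2:
        return "during"
    if s1 < s2 and f2 < f1:
        return "contains"
    # Only proper partial overlaps remain at this point.
    return "overlaps" if s1 < s2 else "overlapped-by"


if __name__ == "__main__":
    print(allen_relation((1, 5), (8, 14)))    # disjoint intervals
    print(allen_relation((8, 14), (10, 13)))  # second interval lies inside the first
```

Handling every pair of intervals through such a case analysis is what makes pairwise relation processing complex when a sequence contains many intervals.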
1. INTRODUCTION
Comparison with previous work:
Kam et al. - hierarchical representation.
Hoppner - scans the database with a sliding window.
Papapetrou - Hybrid-DFS algorithm.
Wu et al. - TPrefixSpan.
Patel et al. - augmented representation (with additional counting information) and IEMiner.
1. INTRODUCTION
Propose:
Incision strategy
Coincidence representation
CTMiner (Coincidence Temporal Miner)
2.PROBLEM DEFINITION
Event interval and event sequence
Let E = {e1, e2, …, ek} be the set of event symbols.
An event interval is a triple (ei, si, fi), where ei ∈ E and si, fi are time points with si < fi.
Event start: ei.ts
Event finish: ei.tf
An event sequence is a list {(e1, s1, f1), (e2, s2, f2), …, (en, sn, fn)}, where si ≤ si+1 and si < fi.
2.PROBLEM DEFINITION
Temporal database
Database D = {r1, r2, …, rm} is a set of records ri, where 1 ≤ i ≤ m.
A record ri consists of a sequence-id and an event interval (an event symbol with its start time and finish time).
Records in D with the same sequence-id are grouped together, so D can be viewed as a collection of event sequences.
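The grouping step can be sketched in Python as follows. This is only an illustration under stated assumptions: the record layout (sequence_id, symbol, start, finish), the function name to_event_sequences, the tie-breaking order for equal start times, and the sample records are all hypothetical.

```python
from collections import defaultdict


def to_event_sequences(records):
    """Group flat records into event sequences keyed by sequence id.

    records: iterable of (sequence_id, symbol, start, finish) tuples
             (a hypothetical layout chosen for this sketch).
    Returns a dict mapping sequence_id to a list of (symbol, start, finish)
    ordered by start time, as the event-sequence definition requires.
    """
    grouped = defaultdict(list)
    for sequence_id, symbol, start, finish in records:
        if not start < finish:
            raise ValueError(f"interval for {symbol!r} must satisfy start < finish")
        grouped[sequence_id].append((symbol, start, finish))
    # Ties on start time are broken by finish time, then symbol (an assumption).
    return {
        sid: sorted(intervals, key=lambda iv: (iv[1], iv[2], iv[0]))
        for sid, intervals in grouped.items()
    }


if __name__ == "__main__":
    sample = [  # made-up records for illustration only
        (2, "D", 8, 14),
        (1, "A", 2, 7),
        (2, "B", 1, 5),
        (1, "C", 4, 9),
    ]
    for sid, seq in sorted(to_event_sequences(sample).items()):
        print(sid, seq)
```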
2.PROBLEM DEFINITION
Time set and time sequence
Given an event sequence q = {(e1, s1, f1), (e2, s2, f2), …, (en, sn, fn)}, the set T = {s1, f1, s2, f2, …, si, fi, …, sn, fn} is called the time set corresponding to q.
Ordering all elements of T and eliminating duplicates gives the time sequence Ts = {t1, t2, t3, …, tk}, where ti ∈ T and ti < ti+1.
2.PROBLEM DEFINITION Event slice
Sequence 2 contains 4 event intervals (en, sn, fn): (B,1,5), (D,8,14), (E,10,13), (F,10,13)
Corresponding time set T = {s1, f1, s2, f2, s3, f3, s4, f4} = {1,5,8,14,10,13,10,13}
Time sequence Ts = {t1, t2, t3, …, tk} = {1,5,8,10,13,14}
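The time sequence of this example can be reproduced with a few lines of Python. The sketch assumes intervals are (symbol, start, finish) tuples and takes D as (D, 8, 14), which is what the listed time set implies; the function name time_sequence is hypothetical.

```python
def time_sequence(event_sequence):
    """Return the sorted, duplicate-free list of all start and finish times."""
    time_set = {t for _, start, finish in event_sequence for t in (start, finish)}
    return sorted(time_set)


if __name__ == "__main__":
    sequence_2 = [("B", 1, 5), ("D", 8, 14), ("E", 10, 13), ("F", 10, 13)]
    print(time_sequence(sequence_2))  # [1, 5, 8, 10, 13, 14]
```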
2.PROBLEM DEFINITION
Event slice
Let L = {+, -, *, Φ} be a set of labels, and let Q = {q1, q2, …, qi, …} be a set of event sequences, where qi = {(e1, s1, f1), …, (ej, sj, fj), …, (en, sn, fn)}.
2.PROBLEM DEFINITION Event slice
start slice D+ = (D, 8, 10)
intermediate slice D* = (D, 10, 13)
finish slice D- = (D, 13, 14)
The event interval B has only one intact slice B = (B, 1, 5)
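One way to obtain these slices is to cut every interval at the points of the time sequence that fall inside it. The Python sketch below does that under stated assumptions: the function name event_slices is hypothetical, D is taken as (D, 8, 14), and an intact slice is written with no suffix rather than with an explicit Φ label.

```python
def event_slices(event_sequence):
    """Cut each (symbol, start, finish) interval at every time point of the sequence.

    Returns (label, start, finish) tuples where the label is the symbol plus
    '+' (start slice), '*' (intermediate slice), '-' (finish slice),
    or no suffix for an intact slice.
    """
    ts = sorted({t for _, s, f in event_sequence for t in (s, f)})
    slices = []
    for symbol, s, f in event_sequence:
        cuts = [t for t in ts if s <= t <= f]
        pieces = list(zip(cuts, cuts[1:]))
        for i, (a, b) in enumerate(pieces):
            if len(pieces) == 1:
                suffix = ""   # interval is never cut: intact slice
            elif i == 0:
                suffix = "+"
            elif i == len(pieces) - 1:
                suffix = "-"
            else:
                suffix = "*"
            slices.append((symbol + suffix, a, b))
    return slices


if __name__ == "__main__":
    sequence_2 = [("B", 1, 5), ("D", 8, 14), ("E", 10, 13), ("F", 10, 13)]
    for piece in event_slices(sequence_2):
        print(piece)
```

In this example only D is cut, because the endpoints of E and F fall strictly inside it.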
3.INCISION STRATEGY
3.INCISION STRATEGY Incision example
The incision strategy completely avoids generating intermediate slices. Even with the intermediate slices trimmed, the relationship between any two intervals can still be expressed correctly.
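The slides do not spell out the incision procedure, so the following Python sketch is only one plausible reading of it: each interval that is cut keeps its start slice and its finish slice, and nothing in between is ever produced. The function name incise and the suffix convention ('+', '-', no suffix for an intact slice) are assumptions.

```python
import bisect


def incise(event_sequence):
    """Produce only start, finish, and intact slices for each interval.

    event_sequence: list of (symbol, start, finish) tuples.
    Intermediate slices are never generated (a sketch of the idea,
    not the authors' exact procedure).
    """
    ts = sorted({t for _, s, f in event_sequence for t in (s, f)})
    slices = []
    for symbol, s, f in event_sequence:
        i = bisect.bisect_left(ts, s)  # position of the start time in ts
        j = bisect.bisect_left(ts, f)  # position of the finish time in ts
        if j == i + 1:
            slices.append((symbol, s, f))  # no time point inside: intact
        else:
            slices.append((symbol + "+", s, ts[i + 1]))
            slices.append((symbol + "-", ts[j - 1], f))
    return slices


if __name__ == "__main__":
    sequence_2 = [("B", 1, 5), ("D", 8, 14), ("E", 10, 13), ("F", 10, 13)]
    for piece in incise(sequence_2):
        print(piece)
```

For the interval (D, 8, 14) this yields (D+, 8, 10) and (D-, 13, 14); the position of E and F between those two slices is enough to show that D contains both.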
4.COINCIDENCE REPRESENTATION
Slices that occur simultaneously are grouped together to form coincidences.
Concatenating all coincidences describes an event sequence effectively.
This simplifies the processing of complex pairwise relationships between all intervals.
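The grouping idea can be sketched in Python as below. The input is a list of (label, start, finish) slices; the sample values assume the interval D = (D, 8, 14) has already been reduced to a start slice and a finish slice. The function name coincidences and the tuple-of-labels output format are hypothetical, and the sketch does not record empty gaps between consecutive time segments.

```python
from collections import defaultdict


def coincidences(slices):
    """Group slices that share the same time segment, ordered by time.

    slices: iterable of (label, start, finish) tuples.
    Returns a list of tuples, one per time segment, each holding the
    labels of the slices that occur simultaneously in that segment.
    """
    by_segment = defaultdict(list)
    for label, start, finish in slices:
        by_segment[(start, finish)].append(label)
    return [tuple(sorted(by_segment[segment])) for segment in sorted(by_segment)]


if __name__ == "__main__":
    sliced = [
        ("B", 1, 5),
        ("D+", 8, 10),
        ("D-", 13, 14),
        ("E", 10, 13),
        ("F", 10, 13),
    ]
    print(coincidences(sliced))  # [('B',), ('D+',), ('E', 'F'), ('D-',)]
```

Reading the result left to right gives the relations directly: E and F share a coincidence, so they are equal, and both sit between D+ and D-, so D contains them.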
4.COINCIDENCE REPRESENTATION
Good scalability
Nonambiguity
Simple is good
Compact space usage
5.CTMiner ALGORITHM
min_sup = 2
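To illustrate what a minimum support of 2 means, the Python sketch below counts how many sequences contain a candidate pattern and compares the count with min_sup. It is not the CTMiner algorithm: it assumes sequences and patterns are lists of coincidences (tuples of slice labels) and that containment is the usual sequential-pattern notion of an order-preserving match in which each pattern coincidence is a subset of a sequence coincidence. The function names and the sample database are made up.

```python
def contains(sequence, pattern):
    """Check whether pattern occurs in sequence as an order-preserving match."""
    pos = 0
    for wanted in pattern:
        # Greedily advance to the first coincidence that covers `wanted`.
        while pos < len(sequence) and not set(wanted) <= set(sequence[pos]):
            pos += 1
        if pos == len(sequence):
            return False
        pos += 1
    return True


def support(database, pattern):
    """Number of sequences in the database that contain the pattern."""
    return sum(1 for sequence in database if contains(sequence, pattern))


if __name__ == "__main__":
    min_sup = 2
    database = [  # hypothetical coincidence sequences
        [("A+",), ("B",), ("A-",)],
        [("B",), ("D+",), ("E", "F"), ("D-",)],
        [("D+",), ("E",), ("D-",)],
    ]
    pattern = [("D+",), ("E",), ("D-",)]
    count = support(database, pattern)
    print(count, "frequent" if count >= min_sup else "infrequent")
```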
6.EXPERIMENTAL RESULTS
Runtime performance on synthetic data sets
6.EXPERIMENTAL RESULTS
Real-world data set analysis
7.CONCLUSION AND FUTURE WORK
Coincidence representation is nonambiguous and has several advantages over existing representations.
7.CONCLUSION AND FUTURE WORK
Future work: mining closed and maximal temporal patterns, incremental temporal pattern mining, and methods for data streams.