1
Behavior Grouping based on Trajectories Mining
Shoji Hirano Shusaku Tsumoto
Department of Medical Informatics, Shimane University, School of Medicine, Japan
2
Outline
• Introduction: background, objective, approach
• Method: multiscale comparison and grouping of trajectories
• Experimental results: Australian Sign Language data; hospital management
• Conclusions
Temporal Data Mining
• One-dimensional time series: chronological behavior of one variable
• Two-dimensional time series (trajectory): behavior of two variables
• Grouping of temporal sequences: capture the dynamic behavior of temporal variables
• 2D: detection of co-variant variables, e.g., disease grouping, ...
Discoveries from Hepatitis Data
[Figure: ALB-PLT trajectories of patients #602 (C5;F4), #170 (C5;F4), #558 (C15;F1), and #636 (C15;F3)]
Left: ALB and PLT covariant. Right: ALB and PLT non-covariant.
Two Groups of Disease Progression of Liver Fibrosis
Group 1: ALB and PLT both decreasing
Group 2: PLT decreasing, ALB stable
5
Trajectory Mining Process
1. Segmentation and generation of multiscale trajectories
2. Segment hierarchy trace and matching
3. Calculation of dissimilarities
4. Clustering of trajectories
6
Multiscale Structural Comparison
• Represent trajectories using a multiscale description
• Search for the best correspondence between partial trajectories across all scales
[Figure: Trajectories A and B (Attr. 1 vs. Attr. 2, starting at t=0) described at scales 0, 1, and 2, with corresponding segments matched across scales]
(cf. Ueda et al. (1990))
7
Multiscale Description
• Represent the convex/concave structure of trajectories at various observation scales
• Trajectory representation: $c(t) = (ex_1(t), ex_2(t), \ldots, ex_I(t))$, where $ex_i(t),\ i \in I$, is the time series of test $i$
• Trajectory at scale σ: $C(t,\sigma) = (EX_1(t,\sigma), EX_2(t,\sigma), \ldots, EX_I(t,\sigma))$, where
$$EX_i(t,\sigma) = ex_i(t) \otimes g(t,\sigma) = \sum_{n=-\infty}^{\infty} e^{-\sigma} I_n(\sigma)\, ex_i(t-n)$$
and $I_n$ is the modified Bessel function of order $n$
• σ large: global features of the trajectory; σ small: local features of the trajectory
[Figure: the original trajectory C(t,0) and its multiscale descriptions C(t,σ) for small and large σ]
(cf. Mokhtarian et al. (1986))
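The smoothing step above can be sketched in Python. This is a minimal illustration, assuming NumPy and SciPy are available: `scipy.special.ive(n, s)` returns the exponentially scaled Bessel function $e^{-s} I_n(s)$, which is exactly the kernel weight in the sum. The truncation radius, the renormalization after truncation, and the edge padding at the ends of the series are our assumptions (the slides do not specify them), and the function names are hypothetical.

```python
import numpy as np
from scipy.special import ive


def discrete_gaussian_kernel(sigma: float) -> tuple[np.ndarray, np.ndarray]:
    """Offsets n and weights e^{-sigma} I_n(sigma), truncated and renormalized."""
    # Assumed truncation at about 4 standard deviations (the kernel variance is sigma).
    radius = int(np.ceil(4.0 * np.sqrt(sigma))) + 1
    n = np.arange(-radius, radius + 1)
    weights = ive(n, sigma)  # ive(n, s) = exp(-s) * I_n(s) for real s >= 0
    return n, weights / weights.sum()


def trajectory_at_scale(ex: np.ndarray, sigma: float) -> np.ndarray:
    """Smooth each test's series ex[:, i] to obtain C(t, sigma); ex has shape (T, I)."""
    n, w = discrete_gaussian_kernel(sigma)
    r = len(n) // 2
    out = np.empty_like(ex, dtype=float)
    for i in range(ex.shape[1]):
        # Edge padding is an assumption: the boundary handling is not given on the slide.
        padded = np.pad(ex[:, i].astype(float), r, mode="edge")
        # The kernel is symmetric, so convolution equals the weighted sum over n.
        out[:, i] = np.convolve(padded, w, mode="valid")
    return out
```

At σ = 0 the kernel reduces to a single weight of 1 at n = 0, so the sketch returns the original trajectory C(t,0) unchanged; larger σ spreads the weights and keeps only the global shape.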
8
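Segment Matching based on Concave/Convex Structures
• Segment: partial trajectory between inflection points
• Curvature at scale σ (2D case):
$$K(t,\sigma) = \frac{EX_1' EX_2'' - EX_1'' EX_2'}{\left(EX_1'^2 + EX_2'^2\right)^{3/2}}, \qquad EX_i^{(m)}(t,\sigma) = \frac{\partial^m EX_i(t,\sigma)}{\partial t^m} = ex_i(t) \otimes g^{(m)}(t,\sigma)$$
• Inflection point: $C(t_j,\sigma)$ such that $K(t_j - 1,\sigma) \times K(t_j,\sigma) < 0$
• Segment representation: $A(\sigma) = \{a_i(\sigma) \mid i = 1, 2, \ldots, N\}$
[Figure: trajectory A(0) divided into segments a_1(0), a_2(0), ... and its description A(σ), from σ small to σ large]
(cf. Ueda et al. (1990))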
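A minimal sketch of curvature-based segmentation, assuming the smoothed components EX_1 and EX_2 at one scale are given as NumPy arrays. It uses the signed curvature K = (EX_1'EX_2'' - EX_1''EX_2') / (EX_1'^2 + EX_2'^2)^{3/2}, but approximates the derivatives with `np.gradient` (finite differences) instead of convolving with the derivative-of-Gaussian kernels g^{(m)}. The function names are hypothetical.

```python
import numpy as np


def curvature(x: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Signed curvature K(t) of the 2D trajectory (x(t), y(t)).

    Points with zero velocity (dx = dy = 0) yield NaN/inf.
    """
    dx, dy = np.gradient(x), np.gradient(y)      # first derivatives
    ddx, ddy = np.gradient(dx), np.gradient(dy)  # second derivatives
    return (dx * ddy - ddx * dy) / (dx ** 2 + dy ** 2) ** 1.5


def inflection_points(k: np.ndarray) -> np.ndarray:
    """Indices t with K(t-1) * K(t) < 0, i.e. where the curvature changes sign."""
    return np.nonzero(k[:-1] * k[1:] < 0)[0] + 1


def segments(n_points: int, inflections: np.ndarray) -> list[tuple[int, int]]:
    """(start, end) index pairs of the segments between consecutive inflection points."""
    bounds = [0, *inflections.tolist(), n_points - 1]
    return list(zip(bounds[:-1], bounds[1:]))
```

For example, the curve (t, sin t) on [0, 3π] has curvature of sign opposite to sin t, so the sketch finds two inflection points near π and 2π and splits the curve into three segments; neighbouring segments share their boundary point.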
9
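Multiscale Structural Comparison: Global Matching Criteria
• Minimization of the total segment dissimilarity
• Complete match: the original trajectory must be formed without gaps or overlaps by concatenating the matched segments
• Dissimilarity between segment $a_i^{(k)}$ of trajectory A at scale k and segment $b_j^{(h)}$ of trajectory B at scale h:
$$d(a_i^{(k)}, b_j^{(h)}) = \left(\theta_{a_i}^{(k)} - \theta_{b_j}^{(h)}\right)^2 + \left(g_{a_i}^{(k)} - g_{b_j}^{(h)}\right)^2 + \left|v_{a_i}^{(k)} - v_{b_j}^{(h)}\right| + \gamma\left(c(a_i^{(k)}) + c(b_j^{(h)})\right)$$
where g is the gradient, θ the rotation angle, $v_{a_i}^{(k)} = l_{a_i}^{(k)} / n_{a_i}^{(k)}$ the velocity (length divided by the number of points), and $\gamma\left(c(a_i^{(k)}) + c(b_j^{(h)})\right)$ the replacement cost
[Figure: segments a_i^(k) and b_j^(h) with their gradients g, rotation angles θ, and velocities v]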
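The slide names the three segment features but does not define how g and θ are computed. The sketch below assumes that g is the direction angle of the chord from a segment's first to its last point, and that θ is the total signed turning of the tangent along the segment; v = l/n follows the slide. The dissimilarity follows our reading of the slide's formula, (Δθ)^2 + (Δg)^2 + |Δv| + γ(c(a) + c(b)), with the replacement costs and γ supplied by the caller as the hypothetical parameters `cost_a`, `cost_b`, and `gamma`. Angle differences are not wrapped to (-π, π], a simplification.

```python
import math
from dataclasses import dataclass

import numpy as np


@dataclass
class SegmentFeatures:
    gradient: float        # g: direction of the chord from first to last point (assumed)
    rotation_angle: float  # theta: total signed turning of the tangent (assumed)
    velocity: float        # v = l / n: segment length divided by number of points


def segment_features(points: np.ndarray) -> SegmentFeatures:
    """points: array of shape (n, 2) holding one segment of a 2D trajectory."""
    steps = np.diff(points, axis=0)
    chord = points[-1] - points[0]
    g = math.atan2(chord[1], chord[0])
    # Heading of each step, unwrapped so the turning angle can exceed pi.
    headings = np.unwrap(np.arctan2(steps[:, 1], steps[:, 0]))
    theta = float(headings[-1] - headings[0])
    length = float(np.linalg.norm(steps, axis=1).sum())
    return SegmentFeatures(g, theta, length / len(points))


def segment_dissimilarity(a: SegmentFeatures, b: SegmentFeatures,
                          cost_a: float = 0.0, cost_b: float = 0.0,
                          gamma: float = 1.0) -> float:
    """(d theta)^2 + (d g)^2 + |d v| + gamma * (replacement costs)."""
    return ((a.rotation_angle - b.rotation_angle) ** 2
            + (a.gradient - b.gradient) ** 2
            + abs(a.velocity - b.velocity)
            + gamma * (cost_a + cost_b))
```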
10
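Value-based Dissimilarity of Trajectories
• After structural matching, calculate the value-based dissimilarity for each pair of matched segments
• Attribute 1 dissimilarity: $d_{v1}(a_p, b_p)$ = peak difference + (left diff. + right diff.)/2
• Attribute 2 dissimilarity: $d_{v2}(a_p, b_p)$ = peak difference + (left diff. + right diff.)/2
• Dissimilarity of a matched pair: $d_{val}(a_p^{(0)}, b_p^{(0)}) = d_{v1}^2 + d_{v2}^2$ + cost
• Dissimilarity of trajectories A and B over the P matched pairs:
$$D_{val}(A,B) = \frac{1}{P} \sum_{p=1}^{P} d_{val}(a_p^{(0)}, b_p^{(0)})$$
[Figure: matched segments of Trajectories A and B (Attr. 1 vs. Attr. 2) with their peaks, end points, and CoG]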
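A sketch of the value-based step, assuming each matched segment is summarized per attribute by its values at the left end point, the peak, and the right end point (how the peak is located is left to the caller), that "difference" means absolute difference, and that the attributes are combined as d_v1^2 + d_v2^2 plus an optional cost term, following our reading of the slide's formula. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class AttrValues:
    left: float   # attribute value at the segment's left end point
    peak: float   # attribute value at the segment's peak (located by the caller)
    right: float  # attribute value at the segment's right end point


def d_v(a: AttrValues, b: AttrValues) -> float:
    """Per-attribute dissimilarity: peak difference + (left diff. + right diff.) / 2."""
    return abs(a.peak - b.peak) + (abs(a.left - b.left) + abs(a.right - b.right)) / 2


def d_val(a_attrs: Sequence[AttrValues], b_attrs: Sequence[AttrValues],
          cost: float = 0.0) -> float:
    """Sum of squared per-attribute dissimilarities (plus cost) for one matched pair."""
    return sum(d_v(a, b) ** 2 for a, b in zip(a_attrs, b_attrs)) + cost


def D_val(pairs: Sequence[tuple[Sequence[AttrValues], Sequence[AttrValues]]]) -> float:
    """Average d_val over the P matched segment pairs of trajectories A and B."""
    return sum(d_val(a, b) for a, b in pairs) / len(pairs)
```

For instance, attribute values (left, peak, right) of (0, 2, 1) vs. (1, 3, 1) give d_v = 1 + (1 + 0)/2 = 1.5, and (0, 0, 0) vs. (0, 1, 2) give d_v = 1 + (0 + 2)/2 = 2, so the pair's d_val is 1.5^2 + 2^2 = 6.25.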
11
Experiment 1: ASL Data
Dataset: Australian Sign Language dataset in the UCI KDD Archive
Time-series data of hand positions (3D) collected from 5 signers while performing sign language.
Used for experimental validation by Vlachos et al. in ICDE02 (as 2D trajectories) and Keogh et al. in KDD00 (as 1D time series).
For each signer, two to five sessions were conducted. In each session, five sign samples were recorded for each of the 95 words.
The length of each sample varied; samples typically contained about 50-150 time points.
[Figure: dataset hierarchy (signers A to E → sessions 1 to n → words 1 to 95 → samples 1 to 5), with examples of “Norway”]
12
Experiment 1: ASL Data
Experimental Procedure
Out of the 95 signs (words), select the following 10 signs: Norway, cold, crazy, eat, forget, happy, innocent, later, lose, spend.
Select a pair of words, such as {Norway, cold}. Each word has 5 sign samples, so 10 samples are selected in total.
Calculate the dissimilarities for each pair of the 10 samples by the proposed method.
Construct two groups by applying average-linkage hierarchical clustering.
Evaluate whether the samples are grouped correctly.
[Figure: samples 1 to 5 of word 1 (“Norway”) and word 2 (“cold”) → pairwise comparison and grouping into two clusters → evaluation of whether the groups are correct]
Apply this procedure to every pair of the 10 words (45 pairs per session in total).
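The evaluation loop can be sketched with SciPy's hierarchical clustering. This assumes that "grouped correctly" means the two clusters coincide exactly with the two words' sample sets, and takes the dissimilarity as a parameter (any callable that is symmetric and zero for identical samples, such as the proposed D_val). The function names and the `samples` layout (word → list of samples) are hypothetical.

```python
from itertools import combinations
from typing import Callable, Sequence

import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def grouped_correctly(dist: np.ndarray, labels: Sequence[str]) -> bool:
    """Average-linkage clustering into two groups; True if they match the word labels."""
    Z = linkage(squareform(dist), method="average")
    clusters = fcluster(Z, t=2, criterion="maxclust")
    found = {frozenset(np.flatnonzero(clusters == c).tolist())
             for c in set(clusters.tolist())}
    truth = {frozenset(i for i, w in enumerate(labels) if w == word)
             for word in set(labels)}
    return found == truth


def evaluate_session(samples: dict[str, list],
                     dissimilarity: Callable) -> tuple[int, int]:
    """Apply the pairwise procedure to every pair of words; return (#correct, #pairs)."""
    correct = total = 0
    for w1, w2 in combinations(sorted(samples), 2):
        items = samples[w1] + samples[w2]
        labels = [w1] * len(samples[w1]) + [w2] * len(samples[w2])
        n = len(items)
        dist = np.array([[dissimilarity(items[i], items[j]) for j in range(n)]
                         for i in range(n)], dtype=float)
        correct += grouped_correctly(dist, labels)
        total += 1
    return correct, total
```

With 10 words, `combinations` yields the 45 word pairs per session, and the ratio reported per session is `correct / total`.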
13
Experiment 1: ASL Data
Results
According to Vlachos et al., the Euclidean distance, DTW, and LCSS achieved ratios of 0.333 (15/45), 0.444 (20/45), and 0.467 (21/45), respectively. Signer/session information was not given in the paper.
Session     # of correct pairs   Ratio
andrew2     26/45                0.578
john2       34/45                0.756
john3       29/45                0.644
john4       30/45                0.667
stephen2    38/45                0.844 (best)
stephen4    29/45                0.644
waleed1     33/45                0.733
waleed2     36/45                0.800
waleed3     25/45                0.556 (worst)
waleed4     26/45                0.578
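As a quick arithmetic check of the table (our own computation, not a figure reported on the slides), the ratios, the best and worst sessions, and the pooled ratio over all ten sessions can be recomputed from the correct-pair counts:

```python
# Correct pairs out of 45 per session, copied from the results table.
correct = {
    "andrew2": 26, "john2": 34, "john3": 29, "john4": 30, "stephen2": 38,
    "stephen4": 29, "waleed1": 33, "waleed2": 36, "waleed3": 25, "waleed4": 26,
}
PAIRS = 45
# Baselines reported by Vlachos et al. (signer/session not given).
baselines = {"Euclidean": 15, "DTW": 20, "LCSS": 21}

ratios = {s: c / PAIRS for s, c in correct.items()}
best = max(ratios, key=ratios.get)
worst = min(ratios, key=ratios.get)
# Pooled ratio: all correct pairs divided by all evaluated pairs.
overall = sum(correct.values()) / (PAIRS * len(correct))

for session, r in ratios.items():
    print(f"{session:9s} {correct[session]:2d}/{PAIRS}  {r:.3f}")
print(f"best={best} worst={worst} overall={overall:.3f}")
# True if every session's count exceeds the best baseline (LCSS, 21/45).
print(all(c > max(baselines.values()) for c in correct.values()))
```

The last lines print `best=stephen2 worst=waleed3 overall=0.680` and `True`: 306 of 450 pairs are grouped correctly, and even the worst session (25/45) exceeds the LCSS count (21/45), although the baseline's signer/session is unknown, so the comparison is indirect.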
Background for the 2nd Experiment
• Hospital Information System (1980s-)
  • Computerization of all hospital information
  • Large-scale databases
• Data: orders and their records (1 order ≈ 3 to 5 Trans.)
  • All clinical actions are described as “orders”
  • Prescription: Doctor → (Order) → Pharmacist
  • Laboratory examination: Doctor → (Order) → Laboratory
Background: HIS (2)
• Hospital Information System
  • Computerization of “orders”
  • Results of orders
• Data for clinical actions
• Reuse of stored data
  • Laboratory examinations, prescriptions, ...
  • They are “results from orders”
  • History of orders: history of clinical actions
• Data-centric hospital management
Background: HIS (3)
• How many orders are made every day?
• A case: Shimane University Hospital
  • 616 beds, 1000 for the outpatient clinic
• #Orders: about 8000
  • Prescription: 700, Injection: 700
  • Actions (doctors & nurses): 4300
• Storage of data: 100 MB/day
  • 30 GB/year (cf. images: 2.5 TB/year)
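Chronology of #Orders (2008.6.1~6.7)
[Figure: number of orders over one week, Sunday to Saturday]
Chronology of #Orders (2008.6.2)
[Figure: number of orders on 2008.6.2, including descriptions and nursery documents]
#Login (2008/6/2~2008/6/7)
[Figure: number of logins from the outpatient clinic and the wards]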
Reuse of Data
• Understanding the dynamic behavior of the hospital, doctors, and patients: temporal data mining
• Reuse of “orders”
  • Analysis of clinical actions
  • Data mining for temporal behaviors of the hospital or medical staff
• A new type of hospital management
Co-occurrence of #Orders (2008.6.2)
[Figure: co-occurrence of records, reservations, prescriptions, and examinations, morning vs. afternoon]
Experiment 2: Data of #Orders
• Data: # of orders for each day (2008.6.2~6.7)
• Objective: find groups of similar trajectories, and analyze the relationships between the grouped trajectories
• Method: generate a dissimilarity matrix using the proposed method, and perform cluster analysis using dendrograms generated by a hierarchical clustering method
• Results: 2 major groups: Outpatient/Ward + Ward
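The Experiment 2 workflow (dissimilarity matrix → hierarchical clustering → dendrogram) can be sketched with SciPy. This assumes that each trajectory pairs the counts of two order types over the hours of a day as a (T, 2) array, uses average linkage (the slide does not name the linkage), and substitutes a simple stand-in dissimilarity (mean distance between time-aligned points) for the proposed multiscale method. All names are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, fcluster, linkage
from scipy.spatial.distance import squareform


def stand_in_dissimilarity(a: np.ndarray, b: np.ndarray) -> float:
    """Placeholder, NOT the proposed method: mean distance of time-aligned points."""
    return float(np.linalg.norm(a - b, axis=1).mean())


def cluster_trajectories(trajs: dict[str, np.ndarray], n_groups: int = 2):
    """Return the dendrogram leaf order and a name -> group id mapping."""
    names = sorted(trajs)
    n = len(names)
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            d = stand_in_dissimilarity(trajs[names[i]], trajs[names[j]])
            dist[i, j] = dist[j, i] = d
    Z = linkage(squareform(dist), method="average")
    # no_plot=True computes the dendrogram layout without drawing it.
    leaf_order = dendrogram(Z, labels=names, no_plot=True)["ivl"]
    groups = fcluster(Z, t=n_groups, criterion="maxclust")
    return leaf_order, dict(zip(names, groups.tolist()))
```

Swapping `stand_in_dissimilarity` for an implementation of the multiscale comparison would leave the clustering and dendrogram steps unchanged.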
Clustering Results
Visualization for Clusters
[Figure: clustered trajectories for the outpatient clinic and the wards: Records + Reservations; Prescriptions, Examinations, Radiology, Reservations; Records and Nursery (Wards); Nursery and Injections; morning and afternoon indicated]
27
Conclusions
• Presented a new method for trajectory mining
  • Trajectory representation → multiscale structural comparison → value-based dissimilarity → clustering
• Application to the Australian Sign Language dataset
  • Correct grouping ratio: 0.556 (worst) to 0.844 (best)
  • High robustness to noise
• Application to hospital data
  • Two groups of behavior of #orders: outpatient, ward
  • Captured the macroscopic behavior of the university hospital
• Future work: extension to multidimensional trajectories
28
Preliminary Results (3D)
[Figure: matching results for 3-D trajectories]
29