Part 2: Temporal Data-related IssuesPart 2: Temporal Data-related Issues
Dr. Vo Thi Ngoc ChauDr. Vo Thi Ngoc Chau([email protected])([email protected])
February 13, 2014
Ho Chi Minh City University of TechnologyHo Chi Minh City University of Technology
Faculty of Computer Science and EngineeringFaculty of Computer Science and Engineering
Temporal Databases: Temporal Databases: Principles and PracticesPrinciples and Practices
Suphamit Chittayasothorn and Vo Thi Ngoc ChauSuphamit Chittayasothorn and Vo Thi Ngoc Chau
Content
A Quick Review
Implementations for Temporal Support
Temporal Data Aggregation
Temporal Data Mining
Conclusion
2
A Quick Review- Temporal Database Concepts
Temporal aspect Valid Time Transaction Time User-defined Time
Definition Time when the fact is true in reality
Time when the fact is current in the database
Time uninterpreted and parallel to data domains like NUMBER
Provided by User System User
Supported by System System User
Special language for semantics
Yes Yes No
Temporal Database - a database that supports some aspect of time,
not counting user-defined time (e.g, date_of_birth).
Temporal Language – Extensions to QUEL, SQL, OQL
3
A Quick Review- The Handling of Temporal Data
Object/tuple timestamping vs. Attribute timestamping
1000‘ON1’
800‘ON1’
800‘ON2’
1200‘ON1’
Valid Time
r1
[2000, 2003) [2003, 2004) [2004, 2006) [2006, 9999)
4
Object/tuple timestamping
[2000, 2003)
[2003, 2004)
[2004, 2006)
[2006, 9999)
1000 [2000, 2003)
800 [2003, 2004)
800 [2004, 2006)
1200 [2006, 9999)
‘ON1’ [2000, 2003)
‘ON1’ [2003, 2004)
‘ON2’ [2004, 2006)
‘ON1’ [2006, 9999)
Area
Owner
Shaper1
r1
r1
r1
r1
r1
r1
r1
r1
r1
r1
r1
DATA REDUNDANCY
+ DATA SPLITTING
+ TEMPORAL DATA ANOMALIES
A Quick Review- The Handling of Temporal Data
5
Attribute timestamping
[2000, 2003)
[2003, 2004)
[2004, 2006)
[2006, 9999)
1000 [2000, 2003)
800[2003, 2004)
[2004, 2006)
1200 [2006, 9999)
‘ON1’
[2000, 2003)
[2003, 2004)
[2006, 9999)
‘ON2’ [2004, 2006)
Area Owner Shape
r1
[2000, 2003)
[2003, 2004)
[2004, 2006)
[2006, 9999)
900 [2000, 2003)
700[2003, 2004)
[2004, 2006)
1100 [2006, 9999)
‘ON3’
[2000, 2003)
[2003, 2004)
[2006, 9999)
‘ON4’ [2004, 2006)
r2
A Quick Review- The Handling of Temporal Data
6
Implementations for Temporal Support
Temporal database system implementation
Implementation from scratch
Implementation on some existing DBMS: layered or integrated
Existing DBMS
Temporal SupportScanner
Parser
Code Generator
Output ProcessorMetadata Management
Temporal Statement S
Non-Temporal Statement S’
ResultError
A layered approach
Existing DBMS TemporalSupport
Temporal Statement S ResultError
An integrated approach
7
Implementations for Temporal Support
An example of temporal database system implementation Our proposed temporal compatible object relational database
system on Oracle 10g using the integrated approach
Valid time
Attribute timestamping
A temporal transparency environment
More at:
Vo Thi Ngoc Chau, Suphamit Chittayasothorn, “A Temporal Compatible Object Relational Database System”, in Proceedings of SOUTHEASTCON, 2007.
Vo Thi Ngoc Chau, Suphamit Chittayasothorn, “A Temporal Object Relational SQL Language with Attribute Timestamping in a Temporal Transparency Environment”, Data & Knowledge Engineering (2008), doi: 10.1016/j.datak.2008.06.008.
8
Our Proposed Temporal Compatible Object Relational Database System
Temporal Database
Non-Temporal ORDBMS
Temporal Extension
Existing Application Programming Interface Existing End-User Interface
TemporalApplication Programs
Non-temporalApplication Programs
TemporalEnd-users
Non-temporalEnd-users
SQL wrapped inTranslation
Stored-Procedures
SQL wrapped inTranslation
Stored-Procedures
SQL withTemporal Types
and Methods
Temporal SQLwrapped inTranslation
Stored-Procedures
ApplicationPrograms
Temporal SQLwrapped inTranslation
Stored-Procedures
SQL withTemporal Types
and Methods
End-users
9
Our Proposed Temporal Compatible Object Relational Database System
Temporal Database
Non-Temporal ORDBMS
Temporal Extension
Existing Application Programming Interface Existing End-User Interface
TemporalApplication Programs
Non-temporalApplication Programs
TemporalEnd-users
Non-temporalEnd-users
SQL wrapped inTranslation
Stored-Procedures
SQL wrapped inTranslation
Stored-Procedures
SQL withTemporal Types
and Methods
Temporal SQLwrapped inTranslation
Stored-Procedures
ApplicationPrograms
Temporal SQLwrapped inTranslation
Stored-Procedures
SQL withTemporal Types
and Methods
End-users
10
Our Proposed Temporal Compatible Object Relational Database System
Temporal Database
Non-Temporal ORDBMS
Temporal Extension
Existing Application Programming Interface Existing End-User Interface
TemporalApplication Programs
Non-temporalApplication Programs
TemporalEnd-users
Non-temporalEnd-users
SQL wrapped inTranslation
Stored-Procedures
SQL wrapped inTranslation
Stored-Procedures
SQL withTemporal Types
and Methods
Temporal SQLwrapped inTranslation
Stored-Procedures
ApplicationPrograms
Temporal SQLwrapped inTranslation
Stored-Procedures
SQL withTemporal Types
and Methods
End-users
11
Temporal Database
Non-Temporal ORDBMS
Temporal Extension
Existing Application Programming Interface Existing End-User Interface
TemporalApplication Programs
Non-temporalApplication Programs
TemporalEnd-users
Non-temporalEnd-users
SQL wrapped inTranslation
Stored-Procedures
SQL wrapped inTranslation
Stored-Procedures
SQL withTemporal Types
and Methods
Temporal SQLwrapped inTranslation
Stored-Procedures
ApplicationPrograms
Temporal SQLwrapped inTranslation
Stored-Procedures
SQL withTemporal Types
and Methods
End-users
Upward Compatibility/Temporal Upward Compatibility
Our Proposed Temporal Compatible Object Relational Database System
12
Our Proposed Temporal Compatible Object Relational Database System
Temporal Database
Non-Temporal ORDBMS
Temporal Extension
Existing Application Programming Interface Existing End-User Interface
TemporalApplication Programs
Non-temporalApplication Programs
TemporalEnd-users
Non-temporalEnd-users
SQL wrapped inTranslation
Stored-Procedures
SQL wrapped inTranslation
Stored-Procedures
SQL withTemporal Types
and Methods
Temporal SQLwrapped inTranslation
Stored-Procedures
ApplicationPrograms
Temporal SQLwrapped inTranslation
Stored-Procedures
SQL withTemporal Types
and Methods
End-users
Sequenced Semantics
13
Our Proposed Temporal Compatible Object Relational Database System
Temporal Database
Non-Temporal ORDBMS
Temporal Extension
Existing Application Programming Interface Existing End-User Interface
TemporalApplication Programs
Non-temporalApplication Programs
TemporalEnd-users
Non-temporalEnd-users
SQL wrapped inTranslation
Stored-Procedures
SQL wrapped inTranslation
Stored-Procedures
SQL withTemporal Types
and Methods
Temporal SQLwrapped inTranslation
Stored-Procedures
ApplicationPrograms
Temporal SQLwrapped inTranslation
Stored-Procedures
SQL withTemporal Types
and Methods
End-users
Non-sequenced Semantics
14
Our Proposed Temporal Compatible Object Relational Database System
Temporal Database
Non-Temporal ORDBMS
Temporal Extension
Existing Application Programming Interface Existing End-User Interface
TemporalApplication Programs
Non-temporalApplication Programs
TemporalEnd-users
Non-temporalEnd-users
SQL wrapped inTranslation
Stored-Procedures
SQL wrapped inTranslation
Stored-Procedures
SQL withTemporal Types
and Methods
Temporal SQLwrapped inTranslation
Stored-Procedures
ApplicationPrograms
Temporal SQLwrapped inTranslation
Stored-Procedures
SQL withTemporal Types
and Methods
End-users
Interactive SQL Mode
15
Our Proposed Temporal Compatible Object Relational Database System
Temporal Database
Non-Temporal ORDBMS
Temporal Extension
Existing Application Programming Interface Existing End-User Interface
TemporalApplication Programs
Non-temporalApplication Programs
TemporalEnd-users
Non-temporalEnd-users
SQL wrapped inTranslation
Stored-Procedures
SQL wrapped inTranslation
Stored-Procedures
SQL withTemporal Types
and Methods
Temporal SQLwrapped inTranslation
Stored-Procedures
ApplicationPrograms
Temporal SQLwrapped inTranslation
Stored-Procedures
SQL withTemporal Types
and Methods
End-users
Embedded SQL Mode
16
Implementations for Temporal Support
Built-in temporal support in the SQL standards and DBMSs
Part 7 SQL/Temporal was withdrawn.
More at: ISO/IEC JTC 1/SC 32 N 0436, Rationale for the Withdrawal of Projects, 2000.
Built-in date-time related data types and methods in many existing DBMSs
MySQL, Oracle, MS SQL Server, IBM DB2, Informix, …
17
Implementations for Temporal Support
Built-in temporal support in the SQL standards and DBMSs Valid time support with tuple timestamping in Oracle
10g/11g Workspace Manager More at: Oracle Valid Time Support, Application developer’s
guide – workspace manager, 10g release 1(10.1), No. B10824-01.
Transaction time support in Oracle 10g/11g Flashback Query More at: Oracle Flashback Query, Application developer’s guide
– fundamentals, 10g release 1(10.1), No. B10795-01.
Transaction time support in MS SQL Server Immortal DB More at: D. Lomet, R. Barga, M. Mokbel, G. Shegalov, R. Wang,
Y. Zhu, Transaction time support inside a database engine, in: Proc. ICDE Conference, (IEEE, 2006) 35-46.
18
Implementations for Temporal Support
Built-in temporal support in Oracle
Data types and methods related to date and time SQL standards
Oracle 10g/11g
Oracle valid time support Concepts
The WM_PERIOD Data Type
Data Definitions with Valid Time Support
Data Querying with Valid Time Support
Data Modifications with Valid Time Support19
Data types and methods related to date and time in SQL standards
Datetime data types
DATE, TIME WITHOUT TIME ZONE, TIMESTAMP WITHOUT TIME ZONE, TIME WITH TIME ZONE, TIMESTAMP WITH TIME ZONE
Fields: YEAR, MONTH, DAY, HOUR, MINUTE, SECOND, TIMEZONE_HOUR, TIMEZONE_MINUTE
Interval data type
INTERVAL
Fields: YEAR, MONTH, HOUR, MINUTE, SECOND
20
Data types and methods related to date and time in SQL standards
Operations involving datetimes and intervals Arithmetic operators: -, +, *, /
<overlaps predicate>: the operator OVERLAPS to determine if two chronological periods overlap in time
<current date value function>: CURRENT_DATE
<current time value function>: CURRENT_TIME
<extract expression>: functions on a datetime or interval to return an exact numeric value representing the value of one component (field) of the datetime or interval
… 21
Data types and methods related to date and time in Oracle 10g/11g
Datetime data types
DATE, TIMESTAMP, TIMESTAMP WITH TIME ZONE, TIMESTAMP WITH LOCAL TIME ZONE
Fields: YEAR, MONTH, DAY, HOUR, MINUTE, SECOND, TIMEZONE_HOUR, TIMEZONE_MINUTE, TIMEZONE_REGION, TIMEZONE_ABBR
Interval data types
INTERVAL YEAR TO MONTH, INTERVAL DAY TO SECOND
22
Data types and methods related to date and time in Oracle 10g/11g
Datetime arithmetic operations: +, -, *, /
Datetime functions: ADD_MONTHS, CURRENT_DATE, CURRENT_TIMESTAMP, DBTIMEZONE, EXTRACT(datetime), FROM_TZ, LAST_DAY, LOCALTIMESTAMP, MONTHS_BETWEEN, NEW_TIME, NEXT_DAY, NUMTODSINTERVAL, NUMTOYMINTERVAL, ROUND(date), SESSIONTIMEZONE, SYS_EXTRACT_UTC, SYSDATE, SYSTIMESTAMP, TO_CHAR(datetime), TO_TIMESTAMP, TO_TIMESTAMP_TZ, TO_DSINTERVAL, TO_YMINTERVAL, TRUNC(date), TZ_OFFSET
23
Oracle valid time support
So-called Workspace
Concepts
The WM_PERIOD Data Type
Data Definitions with Valid Time Support
Data Querying with Valid Time Support
Data Modifications with Valid Time Support
24
So-called Workspace A workspace is a virtual environment that one or more users can share
to make changes to the data in the database.
A workspace logically groups collections of new row versions from one or more version-enabled tables, and isolates these versions until they are explicitly merged with production data or discarded, thus providing maximum concurrency.
Users in a workspace always see a transactionally consistent view of the entire database; that is, they see changes made in their current workspace plus the rest of the data in the database as it existed either when the workspace was created or when the workspace was most recently refreshed with changes from the parent workspace.
The history option enables you to timestamp changes made to all rows in a version-enabled table and to save a copy of either all changes or only the most recent changes to each row. If you keep all changes (specifying the “without overwrite” history option) when version-enabling a table, you keep a persistent history of all changes made to all row versions, and enable users to go to any point in time to view the database as it existed from the perspective of that workspace.
25
Concepts
The valid time support for version-enabled tables
Valid time: the validity of the data
Each record is valid only within the time range associated with the record.
Each row contains an added column to hold the valid time period associated with the row.
A valid time range is specified and queries, insert, update, and delete operations reflect and accommodate the valid time range.
26
Concepts
Version-enabled tables All rows in a version-enabled table can support
multiple versions of the data.
The history option enables timestamp changes to be made to all rows in a version-enabled table and to save a copy of either all changes or only the most recent changes to each row.
If all changes (specifying the “without overwrite” history option) are kept when a table is version-enabled, a persistent history of all changes made to all row versions is kept and users are enabled to go to any point in time to view the database as it existed from the perspective of that workspace.
27
Concepts
Version-enabled tables Referential integrity and uniqueness constraints
DDL operations related to version-enabled tables Enclosed by the BeginDDL procedure and the CommitDDL
or RollbackDDL procedure DDL statements on <table_name>_LTS instead of
<table_name>
Querying is carried out as usual.
Modification operations (insert, delete, and update) on version-enabled tables
Carried out as usual
28
Examples
29
Examples
30
Examples
31
The WM_PERIOD Data Type
Used to specify a valid time range for the session or workspace, and for a row in a version-enabled table
Each instance of the WM_PERIOD data type is a closed-open period of time [validFrom, validTill) where validTill is excluded.
32
The WM_PERIOD Data Type
Operators on instances of the WM_PERIOD data type
The relationship checking operators (input: period1, period2; output: 1=TRUE, 0 = FALSE)
WM_OVERLAPS, WM_CONTAINS, WM_MEETS, WM_EQUALS, WM_LESSTHAN, WM_GREATERTHAN
The set operators (input: period1, period2; output: period3)
WM_INTERSECTION, WM_LDIFF, WM_RDIFF
33
Data Definitions with Valid Time Support
Enable valid time support when version-enabling a table
Add valid time support to an existing version-enabled table
Constraints on version-enabled tables that have valid time support
34
Data Definitions with Valid Time Support
Enable valid time support when version-enabling a table
35
Data Definitions with Valid Time Support Add valid time support to an existing version-enabled table
36
Data Definitions with Valid Time Support Constraints on version-enabled tables that have
valid time support Referential integrity constraints
If both with valid time support, an insert or update operation on the referring table will fail if the valid time associated with the new value at the referring column is not within the valid time associated with the value at the referred column of the referred table.
If either or both with no valid time support, valid time periods are ignored in enforcing the constraint.
Unique constraints Given an insert or update operation on a version-enabled table
with valid time support and a unique constraint on one or many columns
If the existing and inserted rows have the same value combination at these constrained columns, their WM_VALID values do not overlap.
37
Data Definitions with Valid Time Support
Constraints on version-enabled tables that have valid time support Referential integrity constraints
For example, given two DEPARTMENTS and EMPLOYEES tables, the DEPARTMENTS.MANAGER_ID column is a foreign key referencing the EMPLOYEES.EMPLOYEE_ID column.
Consider an insert or update operation with a new DEPARTMENTS.MANAGER_ID value.
The operation will fail if the DEPARTMENTS.WM_VALID value is not within the range of the EMPLOYEES.WM_VALID value for the employee who is being made the department manager.
38
Data Definitions with Valid Time Support
Constraints on version-enabled tables that have valid time support Unique constraints
For example, given an EMPLOYEES table with a unique constraint on an EMPLOYEE_ID column.
Consider an insert or update operation with a new EMPLOYEE_ID value.
If the new EMPLOYEE_ID value is the same as an existing EMPLOYEE_ID value, the operation will fail if the WM_VALID values of the existing and inserted rows overlap.
39
Data Querying with Valid Time Support
Invoke the SetValidTime or SetValidTimeFilterOn procedure with some valid time range
Rows with the valid time that overlap the specified valid time range are taken into account in the evaluation of a query.
Do not invoke or invoke the SetValidTime procedure with no parameters
Rows that are valid at the current time are considered in the evaluation of a query.
Operators are explicitly specified on the WM_VALID column of version-enabled tables with the valid time support the non-sequenced manner
40
Data Querying with Valid Time Support
41
Data Querying with Valid Time Support
42
Data Querying with Valid Time Support
43
Data Querying with Valid Time Support
44
Data Modifications with Valid Time Support
INSERT statements
Specify a valid time period for a new row the value at the WM_VALID column
If NULL for the WM_VALID column, the session valid time period or [NOW, NULL) is used no concept of “temporal upward compatibility”.
Primary key and referential integrity constraints are checked.
45
Data Modifications with Valid Time Support
INSERT statements
46
Data Modifications with Valid Time Support
DELETE statements
A sequenced delete operation deletes the portion of a row that falls within the session valid time range.
No concept of a non-sequenced delete operation
If a valid time was not set, all rows satisfying the WHERE condition if any are considered for the delete operation incorrect description
If a valid time was not set, all rows that are valid at the current time are considered correct from a test
47
Data Modifications with Valid Time Support
DELETE statements – before deletion
48
Data Modifications with Valid Time Support
DELETE statements – a sequenced deletion
49
Data Modifications with Valid Time Support
DELETE statements – after a sequenced deletion
50
Data Modifications with Valid Time Support
DELETE statements – test a non-sequenced deletion
51
Data Modifications with Valid Time Support UPDATE statements
A sequenced update operation: no change is specified to the WM_VALID column in the UPDATE statement
The WM_VALID.ValidTill value for an updated row is changed to the ValidFrom timestamp of the current session valid time range, and a new row is created in which the WM_VALID period reflects the current session valid time range. only for the overlapping case; this is not generally applied to other cases (disjoint, within, equal, contains)
A non-sequenced update operation: a change is explicitly specified to the WM_VALID column in the UPDATE statement and the rows that are valid at the current time are deleted.
52
Data Modifications with Valid Time Support
UPDATE statements
53
Data Modifications with Valid Time Support
UPDATE statements
54
Data Modifications with Valid Time Support
UPDATE statements
55
Data Modifications with Valid Time Support
UPDATE statements
56
Data Modifications with Valid Time Support
UPDATE statements
57
New Oracle Database 12c temporal support
Examples from “Oracle® Database Development Guide, 12c Release 1 (12.1), E17620-11”
58
empno last_name start_time end_time
New Oracle Database 12c temporal support
59
empno last_name start_time end_time
100 Ames 01-Jan-10 30-Jun-11
101 Burton 01-Jan-11 30-Jun-11
102 Chen 01-Jan-12
New Oracle Database 12c temporal support
60
empno last_name start_time end_time
100 Ames 01-Jan-10 30-Jun-11
101 Burton 01-Jan-11 30-Jun-11
102 Chen 01-Jan-12
61
empno last_name start_time end_time
100 Ames 01-Jan-10 30-Jun-11
101 Burton 01-Jan-11 30-Jun-11
102 Chen 01-Jan-12
Summary on Oracle’s Support Tuple timestamping
Each row is associated with a period of time.
Constraints: Referential integrity constraints, Uniqueness constraints
No sequenced queries Operators on the WM_VALID column are explicitly invoked by
users.
Sequenced deletions Entire rows with the WM_VALID value that overlaps the
session valid time range are deleted. A sequenced deletion should affect only the one in common, the others should be kept as-is.
More new Oracle Database 12c temporal support at: www.oracle.com
62
Given a valid-time table - Employee
REFC
Name
Gender D_birth
Salary Dept
vtValueValidTime
vtValueValidTime
vtValueValidTime
vtStart vtEnd vtStart vtEnd vtStart vtEnd
E1
Ed 2/1/1982 1/1/1988
M 7/1/1955
20 2/1/1982 6/1/1982D1 2/1/1982 2/1/1987
30 6/1/1982 2/1/1985
Edward 1/1/1988 12/31/9999 402/1/1985 2/1/1987
D2 4/1/1987 12/31/99994/1/1987 12/31/9999
E2 Di 1/1/1982 12/31/9999 F 10/1/1960
30 1/1/1982 8/1/1984
D1 1/1/1982 12/31/999940 8/1/1984 9/1/1986
50 9/1/1986 12/31/9999
E3John 1/1/1962 1/1/1978
M 1/1/196245 1/1/1984 1/1/1989
D4 1/1/1984 12/31/9999Johnson 1/1/1978 12/31/9999 55 1/1/1989 12/31/9999
E4 Jack 5/15/1950 12/31/9999 M 5/15/195040 1/1/1980 1/1/1984 D4 1/1/1980 1/1/1984
50 1/1/1984 12/31/9999 D3 1/1/1984 12/31/9999
E5 White 12/1/1960 12/31/9999 F 12/1/1960
30 1/1/1980 1/1/1984 D3 1/1/1980 1/1/1984
40 1/1/1984 1/1/1989D4 1/1/1984 12/31/9999
45 1/1/1989 12/31/9999
Temporal Data Aggregation
63
REFC
Name
Gender D_birth
Salary Dept
vtValueValidTime
vtValueValidTime
vtValueValidTime
vtStart vtEnd vtStart vtEnd vtStart vtEnd
E1
Ed 2/1/1982 1/1/1988
M 7/1/1955
20 2/1/1982 6/1/1982D1 2/1/1982 2/1/1987
30 6/1/1982 2/1/1985
Edward 1/1/1988 12/31/9999 402/1/1985 2/1/1987
D2 4/1/1987 12/31/99994/1/1987 12/31/9999
E2 Di 1/1/1982 12/31/9999 F 10/1/1960
30 1/1/1982 8/1/1984
D1 1/1/1982 12/31/999940 8/1/1984 9/1/1986
50 9/1/1986 12/31/9999
E3John 1/1/1962 1/1/1978
M 1/1/196245 1/1/1984 1/1/1989
D4 1/1/1984 12/31/9999Johnson 1/1/1978 12/31/9999 55 1/1/1989 12/31/9999
E4 Jack 5/15/1950 12/31/9999 M 5/15/195040 1/1/1980 1/1/1984 D4 1/1/1980 1/1/1984
50 1/1/1984 12/31/9999 D3 1/1/1984 12/31/9999
E5 White 12/1/1960 12/31/9999 F 12/1/1960
30 1/1/1980 1/1/1984 D3 1/1/1980 1/1/1984
40 1/1/1984 1/1/1989D4 1/1/1984 12/31/9999
45 1/1/1989 12/31/9999
Temporal Data Aggregation
Given a valid-time table - Employee
64
an attribute history
a temporal composition
time-dependent component
time component
Temporal Data Aggregation
Challenge yourselves with temporal data aggregation How to provide built-in temporal support
for users?
How to make use of available built-in support from DBMS?
Let’s try with different existing DBMSs!
65
List the highest salary at each department at present.
66
REFC
Name
Gender D_birth
Salary Dept
vtValueValidTime
vtValueValidTime
vtValueValidTime
vtStart vtEnd vtStart vtEnd vtStart vtEnd
E1
Ed 2/1/1982 1/1/1988
M 7/1/1955
20 2/1/1982 6/1/1982D1 2/1/1982 2/1/1987
30 6/1/1982 2/1/1985
Edward 1/1/1988 12/31/9999 402/1/1985 2/1/1987
D2 4/1/1987 12/31/99994/1/1987 12/31/9999
E2 Di 1/1/1982 12/31/9999 F 10/1/1960
30 1/1/1982 8/1/1984
D1 1/1/1982 12/31/999940 8/1/1984 9/1/1986
50 9/1/1986 12/31/9999
E3John 1/1/1962 1/1/1978
M 1/1/196245 1/1/1984 1/1/1989
D4 1/1/1984 12/31/9999Johnson 1/1/1978 12/31/9999 55 1/1/1989 12/31/9999
E4 Jack 5/15/1950 12/31/9999 M 5/15/195040 1/1/1980 1/1/1984 D4 1/1/1980 1/1/1984
50 1/1/1984 12/31/9999 D3 1/1/1984 12/31/9999
E5 White 12/1/1960 12/31/9999 F 12/1/1960
30 1/1/1980 1/1/1984 D3 1/1/1980 1/1/1984
40 1/1/1984 1/1/1989D4 1/1/1984 12/31/9999
45 1/1/1989 12/31/9999
DEPT_ID SAL_MAX
D2 40
D1 50
D4 55
D3 50
REFC
Name
Gender D_birth
Salary Dept
vtValueValidTime
vtValueValidTime
vtValueValidTime
vtStart vtEnd vtStart vtEnd vtStart vtEnd
E1
Ed 2/1/1982 1/1/1988
M 7/1/1955
20 2/1/1982 6/1/1982D1 2/1/1982 2/1/1987
30 6/1/1982 2/1/1985
Edward 1/1/1988 12/31/9999 402/1/1985 2/1/1987
D2 4/1/1987 12/31/99994/1/1987 12/31/9999
E2 Di 1/1/1982 12/31/9999 F 10/1/1960
30 1/1/1982 8/1/1984
D1 1/1/1982 12/31/999940 8/1/1984 9/1/1986
50 9/1/1986 12/31/9999
E3John 1/1/1962 1/1/1978
M 1/1/196245 1/1/1984 1/1/1989
D4 1/1/1984 12/31/9999Johnson 1/1/1978 12/31/9999 55 1/1/1989 12/31/9999
E4 Jack 5/15/1950 12/31/9999 M 5/15/195040 1/1/1980 1/1/1984 D4 1/1/1980 1/1/1984
50 1/1/1984 12/31/9999 D3 1/1/1984 12/31/9999
E5 White 12/1/1960 12/31/9999 F 12/1/1960
30 1/1/1980 1/1/1984 D3 1/1/1980 1/1/1984
40 1/1/1984 1/1/1989D4 1/1/1984 12/31/9999
45 1/1/1989 12/31/9999
List the number of employees whose names are always different from ‘John’ and their highest salaries along time
67
EMP#
SAL_MAX
vtValueValidTime
vtStart vtEnd
440 1/1/1980 1/1/1984
50 1/1/1984 12/31/9999
For each gender, list the number of employees and the highest salary that employees have been paid along time.
68
REFC
Name
Gender D_birth
Salary Dept
vtValueValidTime
vtValueValidTime
vtValueValidTime
vtStart vtEnd vtStart vtEnd vtStart vtEnd
E1
Ed 2/1/1982 1/1/1988
M 7/1/1955
20 2/1/1982 6/1/1982D1 2/1/1982 2/1/1987
30 6/1/1982 2/1/1985
Edward 1/1/1988 12/31/9999 402/1/1985 2/1/1987
D2 4/1/1987 12/31/99994/1/1987 12/31/9999
E2 Di 1/1/1982 12/31/9999 F 10/1/1960
30 1/1/1982 8/1/1984
D1 1/1/1982 12/31/999940 8/1/1984 9/1/1986
50 9/1/1986 12/31/9999
E3John 1/1/1962 1/1/1978
M 1/1/196245 1/1/1984 1/1/1989
D4 1/1/1984 12/31/9999Johnson 1/1/1978 12/31/9999 55 1/1/1989 12/31/9999
E4 Jack 5/15/1950 12/31/9999 M 5/15/195040 1/1/1980 1/1/1984 D4 1/1/1980 1/1/1984
50 1/1/1984 12/31/9999 D3 1/1/1984 12/31/9999
E5 White 12/1/1960 12/31/9999 F 12/1/1960
30 1/1/1980 1/1/1984 D3 1/1/1980 1/1/1984
40 1/1/1984 1/1/1989D4 1/1/1984 12/31/9999
45 1/1/1989 12/31/9999
Gender EMP#
SAL_MAX
vtValueValidTime
vtStart vtEnd
M 3
40 1/1/1980 1/1/1984
50 1/1/1984 1/1/1989
55 1/1/1989 12/31/9999
F 2
30 1/1/1980 1/1/1984
40 1/1/1984 9/1/1986
50 9/1/1986 12/31/9999
List the highest salary at each department with less than 20 employees along time.
69
REFC …
Salary Dept
vtValue
ValidTimevtValue
ValidTime
vtStart vtEnd vtStart vtEnd
E1 …
20 2/1/1982 6/1/1982D1 2/1/1982 2/1/1987
30 6/1/1982 2/1/1985
402/1/1985 2/1/1987
D2 4/1/1987 12/31/99994/1/1987 12/31/9999
E2 …
30 1/1/1982 8/1/1984
D1 1/1/1982 12/31/999940 8/1/1984 9/1/1986
50 9/1/1986 12/31/9999
E3 …45 1/1/1984 1/1/1989
D4 1/1/1984 12/31/999955 1/1/1989 12/31/9999
E4 …40 1/1/1980 1/1/1984 D4 1/1/1980 1/1/1984
50 1/1/1984 12/31/9999 D3 1/1/1984 12/31/9999
E5 …
30 1/1/1980 1/1/1984 D3 1/1/1980 1/1/1984
40 1/1/1984 1/1/1989D4 1/1/1984 12/31/9999
45 1/1/1989 12/31/9999
DEPT_ID vtStart vtEnd
SAL_MAX
vtValue
ValidTime
vtStart vtEnd
D1 1/1/1982 12/31/9999
30 1/1/1982 8/1/1984
40 8/1/1984 9/1/1986
50 9/1/1986 12/31/9999
D2 4/1/1987 12/31/9999 40 4/1/1987 12/31/9999
D4 1/1/1980 1/1/1984 40 1/1/1980 1/1/1984
D4 1/1/1984 12/31/999945 1/1/1984 1/1/1989
55 1/1/1989 12/31/9999
D3 1/1/1980 1/1/1984 30 1/1/1980 1/1/1984
D3 1/1/1984 12/31/9999 50 1/1/1984 12/31/9999
Grouped by a temporal column
Temporal Data Mining
What is Data Mining?
Frequent Pattern Mining
70
What is Data Mining?
Obama campaign’s secret strategy – 2012Obama campaign’s secret strategy – 2012
Knowing your customersKnowing your customers
Predict final status of undergrad studentsPredict final status of undergrad students
Predict heart diseasePredict heart disease
Car classificationCar classification
Market analysisMarket analysis
……
71
72
What is Data Mining?D
ata
Data
Min
ing
Min
ing
Info
rmati
on/
Info
rmati
on/
Know
led
ge
Know
led
ge
What is Data Mining?
73
More at: Jiawei Han, Micheline Kamber, Jian Pei, “Data Mining: Concepts and Techniques”,
Third Edition, Morgan Kaufmann Publishers, 2012.
Data mining – searching for
knowledgeknowledge (interesting interesting patternspatterns) from large
amounts of data.
What is Data Mining?
74Data mining as a step in the process of knowledge discoveryknowledge discovery
Frequent pattern
mining is a mining task
Frequent Pattern Mining
75
Application domains
Market basket analysisAnalyzing temporal data encountered in Electronic Health Record systems
Predicting stock prices in the future Predicting seismic trends
Frequent Pattern Mining
76
Frequent patterns The patterns that occur frequently in data.
Milk and bread are frequently bought together in grocery stores [milk,
bread] is a frequent pattern.
A sequence of buying a PC, followed by a camera and then a
memory card frequently occurred
It is a frequent sequential pattern.
77
Example of interval-based data
Example of point-based data
Example of time series data
A time series - sequence of
numbers collected at regular
intervals over a period of
time.
Are they temporal data?Are they temporal data?
Frequent Pattern Mining
78
Database Frequent pattern
On point-based sequential databases
On interval-based sequential databases
On time series databases
On databases of multiple time series
Frequent Pattern Mining More at:
R. Agrawal, R. Srikant. Mining sequential patterns. In Proc. ICDE, 1995.
I. Batal, D. Fradkin, J. Harrison, F. Mörchen, M. Hauskrecht. Mining recent temporal patterns for event detection in multivariate time series data. In Proc. KDD, pp.280-288, 2012.
I. Batyrshin, L. Sheremetov, R. Herrera-Avelar. Perception based patterns in time series data mining. Studies in Computational Intelligence (SCI) 36 (2007) 85–118.
M. Lin, S. Lee. Fast discovery of sequential patterns through memory indexing and database partitioning. Journal of Information Science and Engineering 21 (2005) 109–128.
F. Mörchen, A. Ultsch. Efficient mining of understandable patterns from multivariate interval time series. Data Min Knowl Disc 15 (2007) 181-215.
J. Pei, J. Han, B. Mortazavi-Asl, J. Wang, H. Pinto, Q. Chen, U. Dayal, M-C. Hsu. Prefixspan: Mining sequential patterns efficiently by prefix-projected pattern growth. In Proc. the 17th International Conference on Data Engineering (ICDE), 2001.
79
Frequent Pattern Mining More at:
P. Papapetrou, G. Kollios, S. Sclaroff. Discovering frequent arrangements of temporal intervals. In Proc. ICDM, 2005.
E. Winarko, J. F. Roddick. ARMADA – An algorithm for discovering richer relative temporal association rules from interval – based data. Data & Knowledge Engineering 63 (2007) 76-90.
S-Y. Wu, Y-L. Chen. Mining nonambiguous temporal patterns for interval-based events. IEEE Transactions on Knowledge and Data Engineering 19 (2007) 742–758.
Q. Yang, X. Wu. 10 challenging problems in data mining research. International Journal of Information Technology & Decision Making 5 (4) (2006) 597–604.
R. Sadasivam, K. Duraiswamy, Efficient approach to discover interval-based sequential patterns, Journal of Computer Science, 9 (2): 225-234, 2013.
T. B. T. Phan, T. N. C Vo, T. A. Duong – An efficient interval-based approach to mining frequent patterns in a time series database. In Proc. MIWAI’13, 2013. 80
Conclusion Time is ubiquitous.
Check if you can find some application domain where time does not exist
Consider temporal data from semantics to management to implementation
Let’s get started with some advanced DBMS where your temporal data can be supported the most
And now, a very very very … large amount of temporal data gathered over the time
What should we do next to make the most of such existing data?
Let’s try something with temporal data analysis and mining81
82