patterns in sequences and data streams carlo zaniolo computer science department ucla

13
Patterns in Sequences and Data Streams Carlo Zaniolo Computer Science Department UCLA

Upload: anne-poole

Post on 17-Jan-2016

214 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Patterns in Sequences and Data Streams Carlo Zaniolo Computer Science Department UCLA

Patterns in Sequences and Data Streams

Carlo ZanioloComputer Science Department

UCLA

Page 2: Patterns in Sequences and Data Streams Carlo Zaniolo Computer Science Department UCLA

E-Commerce Applications

ZAIAS Corp: decision support and e-Services for web-based auctions—technology to: monitor multiple ongoing auctions, determine the right price for auctioned goods, & secure well-priced items by timely bids.

Sophisticated analysis of sequence patterns and instant response is needed for first two points.

Similar requirements in many other applications

Page 3: Patterns in Sequences and Data Streams Carlo Zaniolo Computer Science Department UCLA

Sequence Analysis: many applications

Mining web access logs for: Customer characterization/segmentation Target advertising—banner optimization

Frequent sequences in market baskets Analysis of stock market trends

Double bottoms and similar patterns

Fraud detection examples examples Stolen credit cards, theft of cellular phones and

user ids,

Intrusion Detection: Example: A peripatetic intruder that attempts

successive logins from proximity workstations---spatio-temporal criteria used to detect such an attack

Page 4: Patterns in Sequences and Data Streams Carlo Zaniolo Computer Science Department UCLA

State of The Art

ADT (e.g.. Informix Datablades): Not flexible enough, no Optimization In particular not suitable for infinite data

streams

SEQ: Enhanced ADTs (e g. sets and sequences) with their own query language

SRQL: Adding sequence algebra operators to relational model

Page 5: Patterns in Sequences and Data Streams Carlo Zaniolo Computer Science Department UCLA

SQL-TS

A query language for finding complex patterns in sequences

Completely based on SQL---Minimal extensions, only the from clause affected

A powerful query optimization technique based on extensions of the Knuth, Morris & Pratt (KMP) string-search algorithm

Page 6: Patterns in Sequences and Data Streams Carlo Zaniolo Computer Science Department UCLA

Example in Mining Weblogs

Consider a table or a stream of tuples:

Sessions(SessNo, ClickTime, PageNo, PageType)

That keeps track of pages visited in a session (sequence of requests from the same user)

Page types include: content (‘c’)description of product (‘d ’)purchase (‘p ’)

Ads a web merchant dream of: c, d, p

Page 7: Patterns in Sequences and Data Streams Carlo Zaniolo Computer Science Department UCLA

3 clicks for a purchase

SQL-TS queries to find the ideal 3-click scenario

SELECT B.PageNo, C.ClickTime

FROM Sessions

PARTITION BY SessNo

ORDER BY ClickTime

AS (A, B, C)

WHERE A.PageType=‘c’

AND B.PageType=‘d’

AND C.PageType=‘p’

PARTITION BY and SEQUENCE BY were used in the original paper

Page 8: Patterns in Sequences and Data Streams Carlo Zaniolo Computer Science Department UCLA

Credit Card Spending

Consider a Table log that keeps track of credit card transactions:

Spending(Date, AccountNo, Amount)

A surge in average spending might be sign of credit card theft.

Page 9: Patterns in Sequences and Data Streams Carlo Zaniolo Computer Science Department UCLA

Credit Card Fraud Detection in SQL-TS

Track 30-day average spending and when it increases considerably for two consecutive days:

Select Z.AccountNo, Z.Date

FROM Spending PARTITION BY AccountNo SEQUENCE BY DateAS (X+, Y, Z)

WHERE COUNT(X+)=30AND Y.Amount> 5*AVG(X+.Amount)AND Z.Amount > 5*AVG(X+.Amount)

+X denotes 1 or more occurrences of X Aggregates can be computed on the stars

Page 10: Patterns in Sequences and Data Streams Carlo Zaniolo Computer Science Department UCLA

Example in Online Auction

stream containing ongoing bids:

Bids(auctn_id, Amount, Time)

Table describing auctions:

auctions(auctn_id, item_id, min_bid, deadline,…)

Find bids that are converging to a fixed price

during the last 15 minutes of the auction.

Page 11: Patterns in Sequences and Data Streams Carlo Zaniolo Computer Science Department UCLA

Example in Online Auction in SQL-TS

Find three successive bids that raise the last bid by less than 2% during the last 15 minutes of auction:

SELECT T.auction_id, T.time, T.amount FROM Auctions AS A,

Bids PARTITION BY auctn_id ORDER BY time AS (X, Y, Z, T)

WHERE A.auctn_id=X.auctn_idAND X.time + 15 Minute > A.deadlineAND Y.amount - X.amount < 0.02*X.amountAND Z.amount - Y.amount < 0.02*Y.amountAND T.amount - Z.amount < 0.02*Z.amount

Page 12: Patterns in Sequences and Data Streams Carlo Zaniolo Computer Science Department UCLA

Conclusion

SQL-TS is a simple but powerful SQL extension to searche for pattern in sequences and time series.

QL-TS is supported by powerful query optimization techniques based on a generalization of the Knuth, Morris and Pratt text search algorithm.

Page 13: Patterns in Sequences and Data Streams Carlo Zaniolo Computer Science Department UCLA

References

1. Reza Sadri, Carlo Zaniolo, Amir Zarkesh, Jafar Adibi: Expressing and optimizing sequence queries in database systems. ACM Transactions on Database Systems (TODS)Volume 29 , Issue 2 (June 2004)

2. Reza Sadri, Carlo Zaniolo, Amir M. Zarkesh, Jafar Adibi: Optimization of Sequence Queries in Database Systems. PODS 2001.

3. Reza Sadri, Carlo Zaniolo, Amir M. Zarkesh, Jafar Adibi: A Sequential Pattern Query Language for Supporting Instant Data Minining for e-Services, VLDB 2001.

4. R. Sadri, Optimization of Sequence Queries in Database Systems Ph.D. Thesis, UCLA, 2001.

5. P. Seshadri, M. Livny, and R. Ramakrishnan. SEQ: A model for sequence databases. In ICDE,, 1995.

6. P. Seshadri, M. Livny, and R. Ramakrishnan. Sequence query processing. ACM SIGMOD Conference on Management of Data,, May 1994