Post on 19-Jan-2016

1

Data Mining at work

Krithi Ramamritham

2

Dynamics of Web Data

Dynamically created Web pages -- using scripting languages

[Figure: a dynamically created page assembled from an Ad Component, four Headline Components, a Personalized Component, and a Navigation Component]

3

1. What to deliver?

Page content may be based on

• queries on dynamically changing data

– e.g., sports scores, stock prices, environment

• type of access device

• time and location of access/user

Existing sites may contain new information

New sites (URLs) may come into being

4

2. How to deliver?

[Figure: Data sources (servers, sensors) -> Network -> Proxies/caches -> Network -> End-hosts (wired host, mobile host)]

5

Keep Data Up-to-date

• Update the Mumbai temperature whenever it changes by 2 degrees

• The proxy obtains data from the source(s)

• Maintains | U(t) - S(t) | <= 2

Source S(t)

Proxy / DB P(t)

User U(t)
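The 2-degree bound between the user's copy U(t) and the source value S(t) can be illustrated with a small filter that refreshes the user's copy only when a new source value differs from it by the tolerance or more. This is a minimal sketch, not the system behind the slides; the class name `CoherencyFilter` and its fields are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CoherencyFilter:
    """Keeps the user's copy within `tolerance` of the source value."""
    tolerance: float = 2.0               # the 2-degree bound from the slide
    user_value: Optional[float] = None   # U(t): last value sent to the user
    refreshes: int = 0                   # how many updates were forwarded

    def observe(self, source_value: float) -> bool:
        """Process a new S(t); return True if the user's copy was refreshed."""
        stale = (self.user_value is None
                 or abs(source_value - self.user_value) >= self.tolerance)
        if stale:
            self.user_value = source_value
            self.refreshes += 1
        return stale
```

After every call, the difference between the stored user value and the latest source value is below the tolerance, so small fluctuations at the source never reach the user.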

6

When to poll the source?

After a specific interval

[Figure: Server, Proxy, and User in a chain; data is obtained by Pull]

Based on temporal data mining – time series analysis – and prediction of when change will exceed 2 degrees
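One simple way to picture the prediction-based option is to fit a straight-line trend to recent readings and poll again when that trend would have moved the value by 2 degrees. The sketch below uses a plain least-squares slope as a stand-in for the time series analysis the slide mentions; the function name, the sample format, and the `min_delay`/`max_delay` clamps are assumptions made for illustration.

```python
def next_poll_delay(samples, tolerance=2.0, min_delay=1.0, max_delay=60.0):
    """Estimate how long to wait before polling the source again.

    samples: list of (time_in_minutes, value) pairs, oldest first.
    Returns a delay in minutes, clamped to [min_delay, max_delay].
    """
    n = len(samples)
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    spread = sum((t - mean_t) ** 2 for t, _ in samples)
    if spread == 0:
        # Fewer than two distinct timestamps: no trend, poll again soon.
        return min_delay
    # Least-squares slope: estimated change in value per minute.
    slope = sum((t - mean_t) * (v - mean_v) for t, v in samples) / spread
    if slope == 0:
        return max_delay
    # Time until the trend alone would exceed the tolerance.
    return max(min_delay, min(max_delay, tolerance / abs(slope)))
```

A slowly changing source is polled rarely, while a fast-changing one is polled up to the `min_delay` rate, in contrast to polling after a fixed interval.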

7

Where to do the work?

• Diverse client devices

– Differ in hardware, software, network connectivity, form factor

• Web content needs to be tailored for each client type

Each response depends not only on the requested URL but also on the capabilities of the client
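As a rough illustration of a response that depends on both the URL and the client, the sketch below picks one of several stored variants of a page from two client capabilities. The variant table, the file names, the 600-pixel threshold, and the capability parameters are all hypothetical.

```python
# Hypothetical table: for each URL, the content variant per client class.
VARIANTS = {
    "/news": {
        "desktop": "news_full.html",
        "mobile": "news_lite.html",
        "text": "news.txt",
    },
}


def select_variant(url, screen_width_px, supports_images):
    """Choose the variant of `url` suited to the client's capabilities."""
    if not supports_images:
        client_class = "text"
    elif screen_width_px < 600:   # assumed cut-off for small screens
        client_class = "mobile"
    else:
        client_class = "desktop"
    return VARIANTS[url][client_class]
```

The same URL therefore yields different content for, say, a phone and a desktop browser.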

8

Transcoding

Conversion of one data version to another

– Decreasing image quality (JPEG quality level) and size

- “convert” utility in Linux

– Summarizing text

transcode =>

Info extraction/retrieval/classification
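The image case can be sketched as a thin wrapper around ImageMagick's `convert`, using its `-quality` and `-resize` options to lower JPEG quality and shrink the image. The function names and default values below are assumptions; the command is only executed when `run=True`, and on ImageMagick 7 installations the executable is typically `magick` instead.

```python
import subprocess


def build_convert_command(src, dst, quality, scale_percent):
    """Build the argument list for an ImageMagick `convert` call."""
    if not 1 <= quality <= 100:
        raise ValueError("JPEG quality must be between 1 and 100")
    return ["convert", src,
            "-quality", str(quality),          # JPEG quality level
            "-resize", f"{scale_percent}%",    # scale both dimensions
            dst]


def transcode_image(src, dst, quality=40, scale_percent=50, run=False):
    """Return the command; execute it only if `run` is True."""
    cmd = build_convert_command(src, dst, quality, scale_percent)
    if run:
        subprocess.run(cmd, check=True)  # requires ImageMagick to be installed
    return cmd
```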

9

Who should transcode?

1. Download desired version from server

2. Transcode higher version locally

• Factors influencing decision

– Transcoding complexity

– Proxy-server network connection

– Load on proxy

(Multiple linear) regression: predict based on a (linear) model of overheads
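The regression step might look like the following: fit a linear model of local transcoding overhead from past observations, then compare its prediction with the expected download time. This is a pure-Python least-squares sketch via the normal equations; the function names, the choice of features, and any numbers fed to it are illustrative assumptions, not measurements from the talk.

```python
def fit_linear_model(rows, targets):
    """Least-squares fit of targets ~ b0 + b1*x1 + ... + bk*xk.

    rows: feature tuples, e.g. (transcoding_complexity, proxy_load).
    Returns the coefficients [b0, b1, ..., bk].
    """
    X = [[1.0] + list(r) for r in rows]   # prepend the intercept column
    k = len(X[0])
    # Augmented normal equations [X^T X | X^T y].
    A = [[sum(x[i] * x[j] for x in X) for j in range(k)]
         + [sum(x[i] * y for x, y in zip(X, targets))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("features are linearly dependent")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


def predict(coeffs, features):
    """Predicted overhead for one feature tuple."""
    return coeffs[0] + sum(c * x for c, x in zip(coeffs[1:], features))


def choose_strategy(coeffs, features, download_time):
    """Transcode locally only if predicted to be cheaper than downloading."""
    return "transcode" if predict(coeffs, features) < download_time else "download"
```

In practice the download time would itself depend on the proxy-server network connection and could be estimated by a second model of the same form.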

10

What is new on the Web?

How is the monsoon progressing?

Time series analysis: change prediction, pattern mining

‘Bhav Puchiye’

www.broadmoor.com

Interface for Bhav Puchiye

Inverted Pyramid Interfaces

Inverted pyramid approach

[Figure: pyramid diagrams with layers labeled Conclusion, Findings, Discussions, and Background & related Information]

Bhav Poochiye

Pricing module developed

• for selected commodities

• for selected markets

• for selected areas

DEMO

14

Building Usage Profiles

Estimate access probabilities based on:

• Current user/community navigational patterns over site contents

(in the form of click streams)

• Historical user/community access patterns over site contents

(in the form of association rules)

Cluster needs based on location, income/age of user, time-of-day
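For the click-stream part, one simple estimate of access probabilities is the observed frequency with which users move from one page to the next. The sketch below counts page-to-page transitions across sessions; the function name and the session format (a list of page names per visit) are assumptions, and association-rule mining over historical patterns is not covered here.

```python
from collections import Counter, defaultdict


def transition_probabilities(sessions):
    """Estimate P(next page | current page) from click streams.

    sessions: list of page sequences, one per user visit.
    Returns {current_page: {next_page: probability}}.
    """
    counts = defaultdict(Counter)
    for pages in sessions:
        # Count each consecutive (current, next) pair in the visit.
        for current, following in zip(pages, pages[1:]):
            counts[current][following] += 1
    return {
        current: {page: n / sum(nexts.values()) for page, n in nexts.items()}
        for current, nexts in counts.items()
    }
```

Pages with a high estimated probability of being requested next are natural candidates for prefetching or caching.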

15

Data Mining

From data to information

to knowledge

to money!
