techniques for visualizing massive data sets leilani battle, mike stonebraker

Post on 29-Mar-2015


TECHNIQUES FOR VISUALIZING MASSIVE DATA SETS
Leilani Battle, Mike Stonebraker

Context

[Diagram: the Visualization System sends a query to the Database, and the Database returns a result.]

Problem

• Performance
  • Vis systems don’t scale well for big data
  • Or are turning into databases

• Over-plotting
  • Makes visualizations unreadable
  • Waste of time/resources

Solution: Resolution Reduction

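[Diagram: a Resolution Reduction Layer sits between the Visualization System and the Database. The Visualization System sends a query to the layer. The layer sends a query-plan query to the Database and receives the query-plan result, then issues a modified query. The Database returns a reduced result, which is passed back to the Visualization System.]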
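As a rough illustration of the idea, the sketch below shrinks a 2-D result until it fits a cell budget by averaging square blocks. It is not ScalaR's implementation: the function names, the `max_cells` budget, and the choice of block averaging as the reduction are all assumptions made for this example. The grid's own dimensions stand in for the size estimate that the layer would obtain from the query plan.

```python
import math

def choose_block_size(rows, cols, max_cells):
    """Smallest block edge b such that the reduced grid has at most max_cells cells.

    In the architecture above the size estimate would come from the query
    plan; here (an assumption) we simply use the grid dimensions.
    """
    if max_cells < 1:
        raise ValueError("max_cells must be at least 1")
    b = 1
    while math.ceil(rows / b) * math.ceil(cols / b) > max_cells:
        b += 1
    return b

def reduce_resolution(grid, max_cells):
    """Average b x b blocks of a 2-D grid (a list of equal-length rows)."""
    rows, cols = len(grid), len(grid[0])
    b = choose_block_size(rows, cols, max_cells)
    if b == 1:
        # Result already fits the budget: return an unmodified copy.
        return [row[:] for row in grid]
    reduced = []
    for r0 in range(0, rows, b):
        out_row = []
        for c0 in range(0, cols, b):
            block = [grid[r][c]
                     for r in range(r0, min(r0 + b, rows))
                     for c in range(c0, min(c0 + b, cols))]
            out_row.append(sum(block) / len(block))
        reduced.append(out_row)
    return reduced

# A 4 x 4 result reduced to at most 4 cells (2 x 2 blocks are averaged).
grid = [[r * 4 + c for c in range(4)] for r in range(4)]
small = reduce_resolution(grid, max_cells=4)
```

Averaging is only one possible reduction; sampling or filtering would slot into the same place, with the budget check unchanged.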

ScalaR

• Scalable vis system for data exploration
• Web front-end
• Uses SciDB (www.scidb.org)

• Visualizes query results
• Performs Resolution Reduction

Demo of ScalaR

Array Browser

• Collaboration with:
  • Brown: Justin DeBrabant, Stan Zdonik, Ugur Cetintemel
  • Stanford: Zhicheng Liu, Jeff Heer

• Google Maps-style exploration experience
• Fetches subsets of the data (aka data tiles)
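To make "data tiles" concrete, here is a minimal sketch of how a viewport could be mapped to the tiles that cover it. The `(zoom, col, row)` key scheme, the 256-cell tile edge, and the function name are assumptions for illustration, not the Array Browser's actual layout.

```python
def tiles_for_viewport(x0, y0, x1, y1, zoom, tile_size=256):
    """Return keys of all tiles overlapping the half-open viewport
    [x0, x1) x [y0, y1), given in cell coordinates at this zoom level.

    Keys are (zoom, col, row) tuples, listed row by row.
    """
    if x1 <= x0 or y1 <= y0:
        return []  # empty viewport
    first_col, last_col = x0 // tile_size, (x1 - 1) // tile_size
    first_row, last_row = y0 // tile_size, (y1 - 1) // tile_size
    return [(zoom, col, row)
            for row in range(first_row, last_row + 1)
            for col in range(first_col, last_col + 1)]

# A viewport 500 cells wide and one tile tall touches three tiles.
keys = tiles_for_viewport(100, 0, 600, 256, zoom=2)
```

Panning or zooming then reduces to computing a new key list and fetching only the keys that are not already on hand.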

Array Browser Example

Array Browser Architecture

Demo of Array Browser

Future Work: Prefetching

• Goal: Reduce user-wait time by prefetching tiles
• Cache tiles in the tile buffer
• Need algorithms to decide what to pre-fetch
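A tile buffer with prefetching might look like the sketch below. The class name, the `fetch_tile` callback, and the least-recently-used eviction policy are assumptions for this example; the slides leave the eviction policy open.

```python
from collections import OrderedDict

class TileBuffer:
    """Fixed-capacity tile cache with LRU eviction (an assumed policy)."""

    def __init__(self, capacity, fetch_tile):
        self.capacity = capacity
        self.fetch_tile = fetch_tile   # hypothetical callback: key -> tile data
        self.tiles = OrderedDict()     # oldest entry first
        self.hits = 0
        self.misses = 0

    def _store(self, key, tile):
        self.tiles[key] = tile
        self.tiles.move_to_end(key)
        while len(self.tiles) > self.capacity:
            self.tiles.popitem(last=False)  # evict least recently used

    def get(self, key):
        """Serve a user request, fetching from the database on a miss."""
        if key in self.tiles:
            self.hits += 1
            self.tiles.move_to_end(key)
            return self.tiles[key]
        self.misses += 1
        tile = self.fetch_tile(key)
        self._store(key, tile)
        return tile

    def prefetch(self, keys):
        """Load predicted tiles ahead of time without counting hits or misses."""
        for key in keys:
            if key not in self.tiles:
                self._store(key, self.fetch_tile(key))

# Usage: a prefetched tile is served from the buffer with no database wait.
buffer = TileBuffer(capacity=2, fetch_tile=lambda key: f"data for {key}")
buffer.prefetch([(0, 0, 0)])
tile = buffer.get((0, 0, 0))
```

The hit and miss counters give a simple way to compare prefetching strategies: a better predictor raises the hit rate for the same capacity.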

User Behavior Predictor (Seer)

• Learn common query sequences from user traces
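One simple way to learn common sequences from traces is a first-order transition count over user moves, sketched below. This is an assumed stand-in for illustration and not necessarily how Seer works; the class name and the move labels ("zoom_in", "pan_right", ...) are hypothetical.

```python
from collections import Counter, defaultdict

class MoveSequencePredictor:
    """Predict the next move from counts of (previous move, next move) pairs."""

    def __init__(self):
        self.transitions = defaultdict(Counter)

    def train(self, traces):
        """traces: iterable of move lists, one list per recorded user session."""
        for trace in traces:
            for prev_move, next_move in zip(trace, trace[1:]):
                self.transitions[prev_move][next_move] += 1

    def predict(self, last_move, k=1):
        """Return up to k most frequent followers of last_move (empty if unseen)."""
        followers = self.transitions.get(last_move, Counter())
        return [move for move, _ in followers.most_common(k)]

# Two short traces in which zooming in is always followed by panning right.
predictor = MoveSequencePredictor()
predictor.train([
    ["zoom_in", "pan_right", "pan_right", "zoom_in"],
    ["zoom_in", "pan_right", "zoom_out"],
])
likely_next = predictor.predict("zoom_in")
```

The predicted move can be combined with the current viewport to name the tiles worth prefetching.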

Statistical Analysis Predictor

• Look for statistical similarities in tiles
• Try to guess what’s important based on patterns
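As a hedged sketch of "statistical similarity", the example below summarizes each tile by its mean and standard deviation and ranks candidate tiles by how close they are to the tiles the user has already visited. The choice of signature, the distance measure, and all names are assumptions for illustration; the slides do not specify them.

```python
import math

def tile_signature(values):
    """Summarize a tile (flat list of numbers) as (mean, standard deviation)."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    return (mean, math.sqrt(variance))

def rank_by_similarity(visited_tiles, candidate_tiles):
    """Order candidate keys from most to least similar to the visited tiles.

    visited_tiles: list of flat value lists.
    candidate_tiles: dict mapping a tile key to its flat value list.
    """
    signatures = [tile_signature(t) for t in visited_tiles]
    # Target is the average signature of everything visited so far.
    target = tuple(sum(s[i] for s in signatures) / len(signatures)
                   for i in range(2))

    def distance(key):
        return math.dist(tile_signature(candidate_tiles[key]), target)

    return sorted(candidate_tiles, key=distance)

# The "east" tile has the same statistics as the visited tile, so it ranks first.
ranking = rank_by_similarity(
    visited_tiles=[[10, 12, 11, 13]],
    candidate_tiles={
        "north": [50, 60, 70, 80],
        "east": [10, 11, 12, 13],
        "west": [0, 0, 0, 0],
    },
)
```

Richer signatures, such as histograms, fit the same structure; only `tile_signature` and the distance would change.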

Using Multiple Predictors

• Run multiple predictors (or experts) in parallel
• Compare predictions to user’s actual behavior
• Use predictions from best performing expert
  • May change over time based on user’s goals
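The expert-selection loop can be sketched as follows. Each expert is modeled as a function from the request history to a predicted next tile, and scores decay so that the best expert can change as the user's goals shift. The class name, the decay factor, and the scoring rule are assumptions for this example.

```python
class ExpertPool:
    """Run several predictors and follow whichever has been most accurate lately."""

    def __init__(self, experts, decay=0.8):
        self.experts = experts                      # dict: name -> callable(history)
        self.decay = decay                          # assumed forgetting factor
        self.scores = {name: 0.0 for name in experts}
        self.pending = {}                           # latest guess from each expert

    def predict(self, history):
        """Collect every expert's guess and return the current best expert's."""
        self.pending = {name: expert(history)
                        for name, expert in self.experts.items()}
        best = max(self.scores, key=self.scores.get)
        return self.pending[best]

    def observe(self, actual):
        """Score each pending guess against what the user actually requested."""
        for name, guess in self.pending.items():
            reward = 1.0 if guess == actual else 0.0
            self.scores[name] = self.decay * self.scores[name] + reward
        self.pending = {}

# Two toy experts over integer tile ids: one repeats the last tile, one advances.
pool = ExpertPool({
    "repeat": lambda history: history[-1],
    "advance": lambda history: history[-1] + 1,
})
pool.predict([1])               # scores are tied, so the first expert is used
pool.observe(2)                 # the user moved on: "advance" was right
next_guess = pool.predict([1, 2])
```

With decayed scores, an expert that was right early in a session loses its lead if it starts missing, which matches the note that the best expert may change over time.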

Other Challenges

• Lots of interesting problems left to address
  • Best eviction policy for the tile buffer?
  • How to share data between multiple users?
  • More predictors?

Questions?

Prefetching Experts

• User behavior predictor (Seer)
  • Learn common query sequences from user traces

• Stats analysis predictor
  • Look for statistical similarities in tiles
  • Try to guess what’s important based on patterns
