Database and Data-Intensive Systems

Upload: stephen-kennedy

Post on 27-Dec-2015


Database and Data-Intensive Systems

Data-Intensive Systems

• From monolithic architectures to diverse systems
  - Dedicated/specialized systems, column stores
  - Data centers, web architectures, distributed architectures

• From business data to all data
  - Streaming and sensor data, semi-structured and unstructured data
  - Multidimensional data, temporal data, spatio-temporal data

• Examples
  - Clustering of high-dimensional data
  - Tracking and continuous queries for moving objects
  - Mobile service infrastructure
  - Location privacy
  - Spatio-textual search/hyper-local web search
  - Multimedia similarity search

• This is where much of our research “lives.”

Staff

• Ira Assent, associate professor
• Christian S. Jensen, professor
• Vaida Ceikute, Ph.D. student
• Xiaohui Li, visiting Ph.D. student
• NN, Ph.D. student: GEOCROWD – indoor positioning and services infrastructure
• NN, Ph.D. student: GEOCROWD – spatial web objects
• NN, Ph.D. student: eData – anomaly detection in e-Science
• NN, Ph.D. student: Streamspin
• NN, Ph.D. student: WallViz
• NN, Ph.D. student: REDUCTION
• NN, Ph.D. student: REDUCTION

Graduate Course Portfolio: dDO

• Data management for moving objects (Q3)

• The course covers selected research advances in the general area of indexing and update and query processing for moving objects.

  - Moving object tracking
  - Specific indexing techniques
    - R-tree based indexing
    - B-tree based indexing
  - Techniques for the efficient handling of frequent updates
  - Techniques for range and k nearest neighbor query processing, including one-time as well as continuous queries

Graduate Course Portfolio: MDDB

• Multidimensional databases (Q4)
  - Selected techniques for the management of multidimensionally represented data
  - Multidimensional data and applications
    - Data warehouses and data mining
    - Similarity search and query processing
  - Efficient handling: indexing and associated query processing
    - Multistep similarity search
    - Indexing multidimensional data
    - Skyline query processing
  - Data mining techniques
    - Subspace clustering
    - Classification
    - Outlier detection

Graduate Course Portfolio: Index

• Indexing of disk-based data (Q1)
  - Indexing techniques for disk-based data of different types, as well as their support for queries and updates
  - General overview of indexes and query processing
  - Spatial indexing structures
  - Space partitioning indexing structures
  - Indexes for high-dimensional data
  - Metric approaches
  - Special techniques for complex data types

Coming up for the first time this fall

Graduate Course Portfolio: dDB2

• Database management systems (Q2)

• The course aims to give the participants a solid conceptual foundation for making competent use of a database management system.

  - Logical and physical query optimization and query processing
  - Concurrency control techniques
  - Database tuning
  - Central concepts and techniques in relation to supporting temporal and multi-dimensional data

Coming up for the first time this fall


Projects

• Streamspin: Enable sites that are for mobile services what YouTube is for video
  - Easy mobile service creation and sharing
  - Advanced spatial and social context functionality
  - Be an open, extensible, and scalable service delivery infrastructure

• MOVE: Knowledge extraction from massive data about moving objects
  - Cross-cutting activities, showcases, and evaluation
  - Representation of movement data and spatio-temporal databases
  - Analysis of movement and spatio-temporal data mining

• WallViz: Collaborative analysis, joint decision making on wall-sized displays
  - Scale to massive data collections
  - Support ad-hoc queries
  - Automatically provide entry points for analysis

Projects (2)

• GEOCROWD: Creating a Geospatial Knowledge World
  - Advance the state of the art in collecting, storing, analyzing, processing, reconciling, and publishing user-generated geospatial information on the Web

• REDUCTION: Reducing the environmental footprint of fleets of vehicles
  - Optimizing the behavior of drivers
  - Supporting eco-routing of vehicles
  - Enabling transparency in multi-modal transportation

• eData: Robust analysis in the context of imperfect data in e-Science
  - Detect and correct anomalies effectively
  - On-line, interactive, lineage-preserving, and semi-automatic
  - Scalable algorithms


How We Typically Work

• We target some real problem that we find interesting.
• We define the problem precisely.
• We develop a solution that is typically a data structure or an algorithm, i.e., a concrete technique.
• To evaluate, we build prototypes.
  - These are built for the purpose of studying the properties of our solutions.
  - We are often interested in performance, e.g., runtime, space usage, communication cost.
• For some solutions we state formal properties that we then prove, e.g., the correctness of a particular technique.
• In brief: isolate and define the problem, construct a solution, then evaluate it.

Example 1: Spatial Web Querying

• Setting
  - Google: ~90 billion queries/month, ~20 billion with local intent.
  - We want to integrate exact locations of websites (for shops, bars, etc.) and users into web querying.

• Queries
  - Results must match the query text and must be near the user.
  - Results of continuous queries must be updated as the user moves.

• Challenges?
  - Support such queries with low computation cost on the server and with little communication between server and client.

• Solution
  - Invent an index that supports both text and location.
  - Use a safe zone to reduce the communication between user and server for continuous queries.
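
To make the safe-zone idea concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the technique from the slides: the class and method names (`SpatialTextIndex`, `Client`, `nearest_with_safe_zone`) are hypothetical, the "index" is just one inverted list per keyword, and the safe zone is assumed to be a circle of radius (d2 - d1)/2 around the position where the query was last evaluated, with d1 and d2 the distances to the nearest and second-nearest matching objects. Within that circle the nearest match cannot change, so the client does not need to contact the server.

```python
import math
from collections import defaultdict


class SpatialTextIndex:
    """Toy server-side index: one inverted list of (name, x, y) per keyword."""

    def __init__(self):
        self.postings = defaultdict(list)

    def add(self, name, x, y, keywords):
        for kw in keywords:
            self.postings[kw.lower()].append((name, x, y))

    def nearest_with_safe_zone(self, keyword, x, y):
        """Return (name, safe_radius) for the nearest object matching keyword."""
        candidates = sorted(
            (math.hypot(ox - x, oy - y), name)
            for name, ox, oy in self.postings.get(keyword.lower(), [])
        )
        if not candidates:
            return None, 0.0
        d1, best = candidates[0]
        if len(candidates) == 1:
            return best, math.inf  # a single match can never be overtaken
        d2 = candidates[1][0]
        # Moving by less than (d2 - d1) / 2 keeps `best` strictly nearest.
        return best, (d2 - d1) / 2.0


class Client:
    """Mobile client that re-queries only after leaving its safe zone."""

    def __init__(self, index, keyword):
        self.index, self.keyword = index, keyword
        self.anchor = None            # position of the last server query
        self.result, self.radius = None, 0.0
        self.server_calls = 0

    def move_to(self, x, y):
        outside = (
            self.anchor is None
            or math.hypot(x - self.anchor[0], y - self.anchor[1]) >= self.radius
        )
        if outside:
            self.result, self.radius = self.index.nearest_with_safe_zone(
                self.keyword, x, y
            )
            self.anchor = (x, y)
            self.server_calls += 1
        return self.result
```

A client created with `Client(index, "cafe")` and moved with `move_to(x, y)` keeps returning the cached result while it stays inside the circle; `server_calls` counts how often the server was actually contacted. A real system would replace the inverted-list scan with a combined text and location index and could use a tighter, non-circular safe zone.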

Example 2: Fraud detection

• There are billions of financial transactions per minute.
• How do we uncover fraud?
  - Scalability
  - In time for reaction
  - Manageable results

• Possible solution sketch
  - Identify attributes of suspicious transactions.
  - Sort incoming transactions into a tree structure of historic data.
  - When processing time is up, output a degree of suspicion based on similarity to valid or fraudulent historic data.
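
The solution sketch above can be illustrated with a small Python example. Everything in it is an assumption made for illustration: the slides do not say which tree structure is used, so this sketch picks a kd-tree-style split over numeric attributes, stores fraud and valid counts in every node, and replaces the wall-clock deadline with a step budget so that the behavior is reproducible. The names `Node`, `build`, and `suspicion` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, ...]


@dataclass
class Node:
    fraud: int                      # fraudulent historic transactions below this node
    valid: int                      # valid historic transactions below this node
    dim: int = 0                    # attribute used for the split
    split: float = 0.0              # split value on that attribute
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build(history: List[Tuple[Point, bool]], depth: int = 0, leaf_size: int = 1) -> Node:
    """Build a kd-style tree over labeled historic transactions (point, is_fraud)."""
    fraud = sum(1 for _, is_fraud in history if is_fraud)
    node = Node(fraud=fraud, valid=len(history) - fraud)
    if len(history) <= leaf_size:
        return node
    node.dim = depth % len(history[0][0])
    ordered = sorted(history, key=lambda rec: rec[0][node.dim])
    mid = len(ordered) // 2
    node.split = ordered[mid][0][node.dim]
    node.left = build(ordered[:mid], depth + 1, leaf_size)
    node.right = build(ordered[mid:], depth + 1, leaf_size)
    return node


def suspicion(root: Node, tx: Point, budget: int) -> float:
    """Descend at most `budget` levels, then report the fraud fraction seen so far."""
    node = root
    for _ in range(budget):
        child = node.left if tx[node.dim] < node.split else node.right
        if child is None:           # reached a leaf before the budget ran out
            break
        node = child
    total = node.fraud + node.valid
    return node.fraud / total if total else 0.0
```

With a budget of zero the answer is just the overall fraud rate in the historic data; each further step narrows the estimate to historic transactions that are more similar to the incoming one. That is the "output when processing time is up" behavior: an answer is available at any point, and it becomes more specific the longer the transaction can be processed.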

Interested?

• Come talk to us!

• We currently have M.Sc. and Ph.D. thesis openings.