Post on 22-Dec-2015

Massive Streaming Data Analytics: A Case Study with Clustering Coefficients

David Ediger, Karl Jiang, Jason Riedy, David A. Bader

Georgia Institute of Technology, Atlanta, GA, USA

STINGER Data Structure

• Spatio-temporal Interaction Networks and Graphs (STING) Extensible Representation

• General-purpose data structure for dynamic graphs

• Efficient edge insertion/deletion (updates) with concurrent readers (analysis)

STINGER Data Structure

• Array of linked lists, which may have empty slots (from deleting edges)

• Additional stored info not in paper

• Efficient updates

• Concurrent reads (no locking)
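The layout described above can be sketched as follows. This is a minimal single-threaded illustration, not the STINGER implementation: the names `EdgeBlock`, `BlockedAdjacency`, `BLOCK_SIZE`, and `EMPTY` are hypothetical, and the lock-free concurrent reads mentioned on the slide are not modeled. It only shows how deleting an edge leaves an empty slot that a later insertion can reuse.

```python
from typing import List, Optional, Tuple

BLOCK_SIZE = 4  # hypothetical number of neighbor slots per block
EMPTY = -1      # marker for a slot that holds no edge


class EdgeBlock:
    """One linked-list node: a fixed-size array of neighbor slots."""

    def __init__(self) -> None:
        self.slots: List[int] = [EMPTY] * BLOCK_SIZE
        self.next: Optional["EdgeBlock"] = None


class BlockedAdjacency:
    """Array indexed by source vertex; each entry heads a list of edge blocks."""

    def __init__(self, num_vertices: int) -> None:
        self.heads: List[Optional[EdgeBlock]] = [None] * num_vertices

    def insert(self, u: int, v: int) -> bool:
        """Add edge u->v in the first empty slot. Returns False if already present."""
        free: Optional[Tuple[EdgeBlock, int]] = None
        last: Optional[EdgeBlock] = None
        block = self.heads[u]
        while block is not None:
            for i, w in enumerate(block.slots):
                if w == v:
                    return False
                if w == EMPTY and free is None:
                    free = (block, i)
            last = block
            block = block.next
        if free is None:
            # No hole to reuse: append a fresh block to the end of the list.
            new_block = EdgeBlock()
            if last is None:
                self.heads[u] = new_block
            else:
                last.next = new_block
            free = (new_block, 0)
        free[0].slots[free[1]] = v
        return True

    def delete(self, u: int, v: int) -> bool:
        """Remove edge u->v by marking its slot empty. Returns False if absent."""
        block = self.heads[u]
        while block is not None:
            for i, w in enumerate(block.slots):
                if w == v:
                    block.slots[i] = EMPTY
                    return True
            block = block.next
        return False

    def neighbors(self, u: int) -> List[int]:
        """Collect the occupied slots of u's blocks, skipping holes."""
        out: List[int] = []
        block = self.heads[u]
        while block is not None:
            out.extend(w for w in block.slots if w != EMPTY)
            block = block.next
        return out
```

Deleting never compacts the list, so readers walking the blocks only ever see a slot flip between a neighbor ID and the empty marker. That property is what makes a scheme like this plausible for concurrent readers, although this sketch does not attempt any atomic operations.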

Assumptions for parallelism

• Single streaming source for inserts/deletes

• Changes are scattered widely
– Batches are sufficiently independent

• Analysis kernels have small range
– Graph change only requires access to local portions and affects small portion of output

Assumptions (continued)

Case Study: Updating Clustering Coefficients

• Clustering coefficients measure density of closed triangles:

• One way of determining if a graph is a small-world graph
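The formula from the slide did not survive in this transcript. As an illustration, the sketch below assumes the standard local clustering coefficient, C_v = 2 T_v / (d_v (d_v - 1)), where T_v is the number of triangles through vertex v and d_v is its degree. The class name `ClusteringTracker` and its methods are hypothetical. It updates triangle counts incrementally: inserting or deleting edge (u, v) only touches u, v, and their common neighbors, which is the kind of local update the earlier assumptions describe.

```python
from collections import defaultdict
from typing import Dict, Set


class ClusteringTracker:
    """Maintains per-vertex triangle counts under edge insertions and deletions."""

    def __init__(self) -> None:
        self.adj: Dict[int, Set[int]] = defaultdict(set)
        self.triangles: Dict[int, int] = defaultdict(int)

    def insert_edge(self, u: int, v: int) -> None:
        if u == v or v in self.adj[u]:
            return  # ignore self-loops and duplicate edges
        # Every common neighbor w closes one new triangle (u, v, w).
        common = self.adj[u] & self.adj[v]
        self.adj[u].add(v)
        self.adj[v].add(u)
        self.triangles[u] += len(common)
        self.triangles[v] += len(common)
        for w in common:
            self.triangles[w] += 1

    def delete_edge(self, u: int, v: int) -> None:
        if v not in self.adj[u]:
            return  # edge not present
        self.adj[u].discard(v)
        self.adj[v].discard(u)
        # Every common neighbor w loses the triangle (u, v, w).
        common = self.adj[u] & self.adj[v]
        self.triangles[u] -= len(common)
        self.triangles[v] -= len(common)
        for w in common:
            self.triangles[w] -= 1

    def coefficient(self, v: int) -> float:
        """Local clustering coefficient; defined as 0.0 when degree is below 2."""
        d = len(self.adj[v])
        if d < 2:
            return 0.0
        return 2.0 * self.triangles[v] / (d * (d - 1))
```

For example, after inserting the edges (0, 1), (1, 2), (0, 2), and (0, 3), vertex 0 has degree 3 and sits on one triangle, so its coefficient is 2 · 1 / (3 · 2) = 1/3, while vertex 1 has coefficient 1.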

Bloom filter

• Consider an edge list represented as a bit array (1 bit per edge) => O(n) storage space

• Bloom filter is a bit array with an arbitrary, smaller number of bits

• A hash function maps a vertex to a specific bit

• Small number of bits == high collision rate

• To reduce false positives, use k independent hash functions to set multiple bits
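The bullets above can be sketched as a small Bloom filter over vertex IDs. This is an illustrative version, not the one used in the study: the class name `BloomFilter` is hypothetical, and the k hash functions are simulated by prefixing a seed to the input of SHA-256, which is a convenience choice rather than anything the slides specify.

```python
import hashlib
from typing import Iterator


class BloomFilter:
    """Bit array with k seeded hashes; may report false positives, never false negatives."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, vertex: int) -> Iterator[int]:
        # Assumption: k "independent" hashes are derived by seeding SHA-256.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{vertex}".encode()).digest()
            yield int.from_bytes(digest[:8], "little") % self.num_bits

    def add(self, vertex: int) -> None:
        """Set the k bits selected by the hashes of this vertex."""
        for pos in self._positions(vertex):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, vertex: int) -> bool:
        """True if all k bits are set; a True answer may be a false positive."""
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(vertex)
        )
```

One plausible use in this setting is to summarize a vertex's neighbor list in a filter and test the other endpoint's neighbors against it when counting triangles. A hit may be a false positive, so the resulting counts would be approximate unless each hit is checked against the actual edge list.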

Bloom filter

Testbed

• Massively multi-threaded Cray XMT
– 64 Threadstorm processors
• Each running at 500 MHz
• Each has 128 hardware streams maintaining a thread context
• Context switches occur every cycle
• 512 GiB globally addressable shared memory
– (holds 2 billion vertices and 17 billion edges)

• Synthetic data
– 16 million vertices, ~500 million edges
