Parallel Programming for Graph Analysis David Ediger, Jason Riedy, Rob McColl, David A. Bader


Tutorial Schedule

• 08:30am – 08:45am: Welcome & Introduction
• 08:45am – 10:00am: Part 1: Network Analysis
• 10:00am – 10:30am: Coffee Break
• 10:30am – 12:00pm: Part 2: Static Parallel Algorithms
• 12:00pm – 01:30pm: Lunch
• 01:30pm – 03:00pm: Part 3: Dynamic Parallel Algorithms
• 03:00pm – 03:30pm: Coffee Break
• 03:30pm – 05:00pm: Part 4: STINGER


Tutorial Outline

• Introduction
– Speakers
• 1: Network Analysis
– Introduction to Graph Theory
– Opportunities & Challenges
– Parallel, Multicore & Multithreaded Architectures
– Q&A
• 2: Static Parallel Algorithms
– Graph Traversal / Graph500
– Social Networking Algorithms
– Graph Software Libraries
– Q&A
• 3: Dynamic Parallel Algorithms
– Data Structures for Streaming Data
– Clustering Coefficients
– Connected Components
– Anomaly Detection
– Q&A
• 4: GraphCT & STINGER
– Using GraphCT, app & lib.
– Using STINGER
  • Creation, Insertion, & Deletion
  • Graph Traversal
  • Connected Components
  • Included Utilities
– Q&A


E. Jason Riedy
• Research Scientist II, Georgia Institute of Technology
– School of Computational Science & Engineering
• Ph.D. 2010, University of California, Berkeley
– Still likes Fortran...
• Contact Info:
– College of Computing
– Georgia Institute of Technology
– Atlanta, GA 30332
– [email protected]
– http://www.cc.gatech.edu/~jriedy


David Ediger

• Ph.D. Student, Georgia Tech
– advisor: Prof. David A. Bader
• B.S. 2008, The George Washington University
• M.S. 2010, Georgia Tech


Currently working on development efforts of parallel graph algorithms and software: GraphCT and STINGER.


Outline

• Introduction
• Network Analysis
– Introduction to Graph Applications
– Opportunities & Challenges
– Parallel, Multicore & Multithreaded Architectures
• Static Parallel Algorithms
• Dynamic Parallel Algorithms
• GraphCT & STINGER


MOTIVATION


Exascale Analytics: Real-world challenges

• Health care: disease spread, detection and prevention of epidemics/pandemics (e.g. SARS, Avian flu, H1N1 “swine” flu, …)
• Massive social networks: energy conservation requires social change, modeling pandemic spread, transportation and evacuation
• Intelligence: business analytics, anomaly detection, security, knowledge discovery from massive data sets
• Systems Biology: understanding complex life systems, drug design, microbial research, unraveling the mysteries of the HIV virus; understanding life, disease, and evolution
• Electric Power Grid: communication, transportation, energy, water, food supply
• Modeling and Simulation: perform full-scale economic-social-political simulations

Requires dynamic Spatio-Temporal Interaction Networks and Graphs (STING)


Driving Forces in Social Network Analysis
• Facebook has more than 850 million active users
• Note the graph is changing as well as growing.
• What are this graph's properties? How do they change?
• Traditional graph partitioning often fails:
– Topology: Interaction graph is low-diameter, and has no good separators
– Irregularity: Communities are not uniform in size
– Overlap: individuals are members of one or more communities
• Sample queries:
– Allegiance switching: identify entities that switch communities.
– Community structure: identify the genesis and dissipation of communities
– Phase change: identify significant change in the network structure


Friendship Networks

• This image is a visualization of LiveJournal's friendship network. The network consists of 4.8 million people (vertices) connected by 69 million relationships (edges). Credit: Yifan Hu (AT&T)


Center for Adaptive Supercomputing Software (CASS-MT)
• CASS-MT, launched July 2008
• Pacific-Northwest Lab
– Georgia Tech, Sandia, WA State, Delaware
• The newest breed of supercomputers have hardware set up not just for speed, but also to better tackle large networks of seemingly random data. And now, a multi-institutional group of researchers has been awarded over $14 million to develop software for these supercomputers. Applications include anywhere complex webs of information can be found: from internet security and power grid stability to complex biological networks.


CASS-MT TASK 7: Analysis of Massive Social Networks


Objective: To design software for the analysis of massive-scale spatio-temporal interaction networks using multithreaded architectures such as the Cray XMT. The Center launched in July 2008 and is led by Pacific-Northwest National Laboratory.

Description: We are designing and implementing advanced, scalable algorithms for static and dynamic graph analysis, including generalized k-betweenness centrality and dynamic clustering coefficients.

Highlights: On a 64-processor Cray XMT, k-betweenness centrality scales nearly linearly (58.4x) on a graph with 16M vertices and 134M edges. Initial streaming clustering coefficients handle around 200k updates/sec on a similarly sized graph.

Our research is focusing on temporal analysis, answering questions about changes in global properties (e.g. diameter) as well as local structures (communities, paths).

Image Courtesy of Cray, Inc.

David A. Bader (CASS-MT Task 7 LEAD), David Ediger, Jason Riedy, Henning Meyerhenke


Ubiquitous High Performance Computing (UHPC)

Goal: develop highly parallel, security enabled, power efficient processing systems, supporting ease of programming, with resilient execution through all failure modes and intrusion attacks

Program Objectives:

One PFLOPS, single cabinet including self-contained cooling

50 GFLOPS/W (equivalent to 20 pJ/FLOP)

Total cabinet power budget of 57 kW, including processing resources, storage, and cooling

Security embedded at all system levels

Parallel, efficient execution models

Highly programmable parallel systems

Scalable systems – from terascale to petascale
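The efficiency and power targets above are linked by simple unit conversion. The following is a minimal sketch of that arithmetic, using only the figures quoted on the slide; it assumes, for illustration, that the 50 GFLOPS/W target is sustained at the full one-PFLOPS rate, which the slide does not state explicitly.

```python
# Arithmetic check of the UHPC targets quoted above (figures taken from the slide).
GFLOPS_PER_WATT = 50.0        # efficiency target
PEAK_FLOPS = 1e15             # one PFLOPS
CABINET_BUDGET_W = 57e3       # 57 kW total cabinet budget

# 1 W = 1 J/s, so GFLOPS/W is the same quantity as 1e9 FLOP per joule.
flops_per_joule = GFLOPS_PER_WATT * 1e9
picojoules_per_flop = 1e12 / flops_per_joule

# Assumption: the efficiency target holds at the full PFLOPS rate.
compute_power_w = PEAK_FLOPS / flops_per_joule
remaining_w = CABINET_BUDGET_W - compute_power_w

print(f"{picojoules_per_flop:.0f} pJ/FLOP")
print(f"{compute_power_w / 1e3:.0f} kW at one PFLOPS, "
      f"{remaining_w / 1e3:.0f} kW of the budget left over")
```

Under that assumption, the computation at the target efficiency accounts for well under half of the cabinet budget, leaving the rest for the other items the budget covers.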

Architectural Drivers:
• Energy Efficient
• Security and Dependability
• Programmability

Echelon: Extreme-scale Compute Hierarchies with Efficient Locality-Optimized Nodes

“NVIDIA-Led Team Receives $25 Million Contract From DARPA to Develop High-Performance GPU Computing Systems” -MarketWatch

David A. Bader (CSE), Echelon Leadership Team


And more...

• Spatio-Temporal Interaction Networks and Graphs Software for Intel platforms
– Intel Project on Parallel Algorithms in Non-Numeric Computing, multi-year project
– Collaborators: Mattson (Intel), Gilbert (UCSD), Blelloch (CMU), Lumsdaine (Indiana)
• Graph500: http://graph500.org
– Shooting for reproducible benchmarks for massive graph-structured data


Example: Mining Twitter for Social Good


ICPP 2010

Image credit: bioethicsinstitute.org


Massive Data Analytics: Protecting our Nation

US High Voltage Transmission Grid (>150,000 miles of line)

Public Health
• CDC / Nation-scale surveillance of public health
• Cancer genomics and drug design
– computed Betweenness Centrality of Human Proteome

ENSG00000145332. Kelch-like protein implicated in breast cancer


Routing in transportation networks

Road networks, point-to-point shortest paths: 15 seconds (naïve) → 10 microseconds

H. Bast et al., “Fast Routing in Road Networks with Transit Nodes”, Science 27, 2007.


Internet and the WWW

• The world-wide web can be represented as a directed graph
– Web search and crawl: traversal
– Link analysis, ranking: PageRank and HITS
– Document classification and clustering

• Internet topologies (router networks) are naturally modeled as graphs


Scientific Computing
• Reorderings for sparse solvers
– Fill reducing orderings
  • partitioning, traversals, eigenvectors
– Heavy diagonal to reduce pivoting (matching)
• Data structures for efficient exploitation of sparsity
• Derivative computations for optimization
– Matroids, graph colorings, spanning trees
• Preconditioning
– Incomplete Factorizations
– Partitioning for domain decomposition
– Graph techniques in algebraic multigrid
  • Independent sets, matchings, etc.
– Support Theory
  • Spanning trees & graph embedding techniques

B. Hendrickson, “Graphs and HPC: Lessons for Future Architectures”, http://www.er.doe.gov/ascr/ascac/Meetings/Oct08/Hendrickson%20ASCAC.pdf


Graphs are pervasive in large-scale data analysis

• Sources of massive data: petascale simulations, experimental devices, the Internet, scientific applications.

• New challenges for analysis: data sizes, heterogeneity, uncertainty, data quality.

Astrophysics
Problem: Outlier detection.
Challenges: massive datasets, temporal variations.
Graph problems: clustering, matching.

Bioinformatics
Problem: Identifying drug target proteins.
Challenges: Data heterogeneity, quality.
Graph problems: centrality, clustering.

Social Informatics
Problem: Discover emergent communities, model spread of information.
Challenges: new analytics routines, uncertainty in data.
Graph problems: clustering, shortest paths, flows.

Image sources: (1) http://physics.nmt.edu/images/astro/hst_starfield.jpg (2,3) www.visualComplexity.com


Data Analysis and Graph Algorithms in Systems Biology

• Study of the interactions between various components in a biological system

• Graph-theoretic formulations are pervasive:
– Predicting new interactions: modeling
– Functional annotation of novel proteins: matching, clustering
– Identifying metabolic pathways: paths, clustering
– Identifying new protein complexes: clustering, centrality

Image Source: Giot et al., “A Protein Interaction Map of Drosophila melanogaster”, Science 302, 1722-1736, 2003.


Image Source: Nexus (Facebook application)

Graph-theoretic problems in social networks

– Community identification: clustering
– Targeted advertising: centrality
– Information spreading: modeling


Network Analysis for Intelligence and Surveillance

• [Krebs ’04] Post 9/11 Terrorist Network Analysis from public domain information

• Plot masterminds correctly identified from interaction patterns: centrality

• A global view of entities is often more insightful

• Detect anomalous activities by exact/approximate graph matching / motifs

Image Source: http://www.orgnet.com/hijackers.html

Image Source: T. Coffman, S. Greenblatt, S. Marcus, Graph-based technologies for intelligence analysis, CACM, 47 (3, March 2004): pp 45-47


Characterizing Graph-theoretic computations

Input: Graph abstraction

Factors that influence choice of algorithm
• graph sparsity (m/n ratio)
• static/dynamic nature
• weighted/unweighted, weight distribution
• vertex degree distribution
• directed/undirected
• simple/multi/hyper graph
• problem size
• granularity of computation at nodes/edges
• domain-specific characteristics

Problem: Find ***
• paths
• clusters
• partitions
• matchings
• patterns
• orderings

Graph algorithms
• traversal
• shortest path algorithms
• flow algorithms
• spanning tree algorithms
• topological sort
• …

Graph problems are often recast as sparse linear algebra (e.g., partitioning) or linear programming (e.g., matching) computations
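Two of the factors listed on this slide, the m/n sparsity ratio and the vertex degree distribution, can be measured directly from an edge list. The sketch below is only an illustration: the function name `characterize` and the toy graph are hypothetical and do not come from the tutorial or from GraphCT/STINGER.

```python
from collections import Counter

def characterize(num_vertices, edges):
    """Return the m/n ratio and the degree histogram of an undirected edge list."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    sparsity = len(edges) / num_vertices
    # histogram[d] = number of vertices with degree d; isolated vertices count as degree 0.
    histogram = Counter(degree[v] for v in range(num_vertices))
    return sparsity, dict(histogram)

# Hypothetical toy graph: a star centered on vertex 0, plus one isolated vertex 4.
edges = [(0, 1), (0, 2), (0, 3)]
sparsity, histogram = characterize(5, edges)
print("m/n =", sparsity)
print("degree histogram =", histogram)
```

A skewed histogram like this one (one high-degree hub, many low-degree vertices) is the small-scale analogue of the heavy-tailed degree distributions discussed later in the tutorial.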


Massive data analytics in informatics networks

• Graphs arising in Informatics are very different from topologies in scientific computing.

• We need new data representations and parallel algorithms that exploit topological characteristics of informatics networks.

Emerging applications: dynamic, high-dimensional data

Static networks, Euclidean topologies


What we’d like to infer from information networks

• What are the degree distributions, clustering coefficients, diameters, etc.?
– Heavy-tailed, small-world, expander, geometry+rewiring, local-global decompositions, ...
• How do networks grow, evolve, respond to perturbations, etc.?
– Preferential attachment, copying, HOT, shrinking diameters, ...
• Are there natural clusters, communities, partitions, etc.?
– Concept-based clusters, link-based clusters, density-based clusters, ...
• How do dynamic processes – search, diffusion, etc. – behave on networks?
– Decentralized search, undirected diffusion, cascading epidemics, ...
• How best to do learning, e.g., classification, regression, ranking, etc.?
– Information retrieval, machine learning, ...

Slide credit: Michael Mahoney, Stanford


CHALLENGES


Hierarchy of Interesting Graph Analytics

Extend single-shot graph queries to include time.
– Are there s-t paths between time T1 and T2?
– What are the important vertices at time T?

Use persistent queries to monitor properties.
– Does the path between s and t shorten drastically?
– Is some vertex suddenly very central?

Extend persistent queries to fully dynamic properties.
– Does a small community stay independent rather than merge with larger groups?
– When does a vertex jump between communities?

New types of queries, new challenges...


Graph Analytics for Social Networks

• Are there new graph techniques? Do they parallelize? Can the computational systems (algorithms, machines) handle massive networks with millions to billions of individuals? Can the techniques tolerate noisy data, massive data, streaming data, etc. …
• Communities may overlap, exhibit different properties and sizes, and be driven by different models
– Detect communities (static or emerging)
– Identify important individuals
– Detect anomalous behavior
– Given a community, find a representative member of the community
– Given a set of individuals, find the best community that includes them


Open Questions for Massive Analytic Applications

• How do we diagnose the health of streaming systems?
• Are there new analytics for massive spatio-temporal interaction networks and graphs (STING)?
• Do current methods scale up from thousands to millions and billions?
• How do I model massive, streaming data?
• Are algorithms resilient to noisy data?
• How do I visualize a STING with O(1M) entities? O(1B)? O(100B)? With a scale-free power-law distribution of vertex degrees and diameter = 6 …
• Can accelerators aid in processing streaming graph data?
• How do we leverage the benefits of multiple architectures (e.g. map-reduce clouds, and massively multithreaded architectures) in a single platform?


Computational research challenges

• How do we solve massive graph problems (statistical analysis, dynamics, community detection, learning) with current algorithms/systems/languages/programming models/libraries?
• How do we get analytics routines to scale for massive datasets?
– We need new algorithms that exploit graph topology & modern computer architectures
• What are the right abstractions for designing portable, high-performance graph algorithms?
– PRAM-like, BSP-like, MapReduce, matrices/tensors?
• Application → architecture mapping for current systems?
– Multicore servers, commodity clusters, supercomputers, clouds
• What can we do with accelerators?
– GPUs, Cell processor, Netezza and Azul data warehousing appliances
• What are the languages & programming models we should be using for high productivity/performance?
– C/C++ & threading, PGAS languages, MPI, Pig/Hive/Sawzall, …


ARCHITECTURE


Flywheel has driven HPC into a corner


For decades, HPC has been on a vicious cycle of enabling applications that run well on HPC systems.

Real-World Applications(with natural parallelism)

Graphs have natural concurrency; yet HPC architectures are designed to achieve high floating-point rates by exploiting spatial and temporal localities.

• For the first time in decades, manycore and multithreaded can let us rethink architecture.

These are today’s MPI applications!


Looking Ahead: Grand Challenge in Graph Analytics

• Driving real-world applications are not just traditional HPC:
– health care, transportation, energy, proteomics, security, data sciences, informatics, …
• FLOPS are “free”, data is the challenge!
• Fundamental abstractions are more irregular and complex than sparse/dense matrices
– Very sparse, very high-dimensional data
– For example, there are few (if any) efficient distributed-memory parallel implementations of even the simplest algorithm for sparse, arbitrary graphs!
• We must integrate across algorithms, programming models, and architectures, to address the growing requirements of grand challenge applications


Hiding memory latency

• Ideally, we want to do 1 (or more) memory op/clock cycle/processor.
• However, we have to deal with
– Caches
  • Reduce latency by storing data close to the processor
– Vectors
  • Amortize latency by fetching N words at a time
– Parallelism
  • Hide latency by switching tasks
  • Little’s law: concurrency = bandwidth * latency
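Little's law, quoted on this slide, gives a quick estimate of how many memory operations must be in flight to sustain a given rate. The sketch below applies it to the "one memory op per clock cycle" goal. The clock rate and latency are hypothetical values chosen only for illustration; they are not measurements of any machine discussed in the tutorial.

```python
def required_concurrency(bandwidth_ops_per_sec, latency_sec):
    """Little's law: operations in flight = bandwidth * latency."""
    return bandwidth_ops_per_sec * latency_sec

# Hypothetical numbers for illustration only:
# one memory operation per cycle at 500 MHz, with 1 microsecond memory latency.
clock_hz = 500e6
latency_sec = 1e-6
in_flight = required_concurrency(clock_hz, latency_sec)
print(f"about {in_flight:.0f} memory operations must be in flight per processor")
```

The point of the calculation is qualitative: when latency is long relative to the clock period, a single thread cannot supply enough outstanding requests, so the concurrency has to come from caches, vectors, or many threads.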


Cray XMT Operation

• Tolerates latency by extreme multithreading
– Each processor supports 128 hardware threads
– Context switch in a single clock tick
– “No” cache or local memory
– Context switch on memory request
– Multiple outstanding loads (configurable, often 180)
• Remote memory requests do not stall processors
– Other streams work while the request gets fulfilled
• Light-weight, word-level synchronization
– Minimizes access conflicts
• Hashed global shared memory
– 64-byte granularity, minimizes hotspots
• High-productivity graph analysis!


Image: Swiss NationalSupercomputer Centre


XMT ThreadStorm Processor (logical view)

[Figure: logical view with layers labeled “Programs running in parallel” (serial code; sub-problems A and B with iterations i = 0 … n), “Concurrent threads of computation”, “Hardware streams (128)” (including unused streams), “Instruction Ready Pool”, and “Pipeline of executing instructions”.]

Slide Credit: Cray, Inc.


XMT ThreadStorm System (logical view)

[Figure: logical view with layers labeled “Programs running in parallel” (serial code; sub-problems A and B with iterations i = 0 … n), “Concurrent threads of computation”, and “Multithreaded across multiple processors”.]

Slide Credit: Cray, Inc.


Scale to larger problems by adding more nodes. Thousands possible... 0.5 TiB per 16 processors (XMT2)


What is less important on XMT
• Placing data near computation
• Modifying shared data
• Accessing data in order
• Using indirection or linked data-structures
• Partitioning program into independent, balanced computations
• Using adaptive or dynamic computations
• Minimizing synchronization operations

Many costs amortized with memory access!


Intel E7-8870-based platforms

• Architecture supports up to 2 TiB, commercially available.
• Tolerates some latency by hyperthreading.
– Hardware: 2 threads / core, 10 cores / socket, four sockets.
– Fast cores (2.4 GHz), fast memory (1066 MHz).
– Not so many outstanding memory requests (60/socket), but large caches (30 MiB L3 per socket).
• Good system support
– Transparent hugepages reduce TLB costs.
– Fast, user-level locking. (HLE may be better...)
– Widely available tuned primitives
• Also high-productivity graph analysis!


Image: Intel(tm) press kit


The problems: #1. The locality challenge
“Large memory footprint, low spatial and temporal locality impede performance”

Serial performance of “approximate betweenness centrality” on a 2.67 GHz Intel Xeon 5560 (12 GB RAM, 8 MB L3 cache)

Input: Synthetic R-MAT graphs (# of edges m = 8n)

[Figure: performance rate (million traversed edges/s, axis 0 to 60) versus problem size (log2 # of vertices, axis 10 to 26). Annotations: “No LLC misses”, “O(m) LLC misses”, “~5X drop in performance”.]

LLC: Last-Level Cache
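A rough footprint estimate shows why the last-level cache stops helping as the problem size grows. The sketch below uses the slide's m = 8n and 8 MB L3 figures, but the storage layout (a CSR structure with 64-bit offsets and 32-bit neighbor ids, each edge stored once) is an assumption made here for illustration, not the layout used in the measured code.

```python
L3_BYTES = 8 * 2**20          # 8 MB L3 cache, from the slide
EDGE_FACTOR = 8               # m = 8n, from the slide
OFFSET_BYTES = 8              # assumed: 64-bit row offsets
NEIGHBOR_BYTES = 4            # assumed: 32-bit neighbor ids, each edge stored once

def csr_bytes(scale):
    """Approximate CSR footprint for a graph with 2**scale vertices."""
    n = 2**scale
    m = EDGE_FACTOR * n
    return OFFSET_BYTES * (n + 1) + NEIGHBOR_BYTES * m

# Smallest problem size on the plot's axis whose graph structure no longer fits in L3.
first_spill = next(s for s in range(10, 27) if csr_bytes(s) > L3_BYTES)
print(f"graph structure alone exceeds the L3 cache from scale {first_spill}")
```

Betweenness centrality also keeps several per-vertex work arrays, so the working set of the real computation would likely leave the cache at a smaller scale than this structure-only estimate suggests.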


The problems: #2. The parallel scaling challenge
“Classical parallel graph algorithms perform poorly on current parallel systems”

• Graph topology assumptions in classical algorithms do not match real-world datasets
• Parallelization strategies at loggerheads with techniques for enhancing memory locality
• Classical “work-efficient” graph algorithms may not fully exploit new architectural features
– Increasing complexity of memory hierarchy (x86), DMA support (Cell), wide SIMD, floating point-centric cores (GPUs).
• Tuning implementation to minimize parallel overhead is non-trivial
– Shared memory: minimizing overhead of locks, barriers.
– Distributed memory: bounding message buffer sizes, bundling messages, overlapping communication w/ computation.


The Big Picture

[Figure labels: “Analyst makes queries.”; “Fast Graph Query”; “Graph resides in memory of supercomputer.”; “Extract ‘Window’”; “High Latency Query”; “Massive Databases”.]


Graph Extraction

• Filtering massive (petabytes to exabytes) raw data
- HTML documents → a link graph
- E-mail archives → a collaboration network
- Research papers → a citation network
- A large network → a subnetwork

<li><a href="http://www.ieee.org/fellow/">IEEE Fellow</a>

<li> <a href="http://www.nsf.gov/">National Science Foundation</a> CAREER Award. <li>Georgia Tech College of Computing 2007 Dean's Award.<li>2008 IBM Shared University Research Award <li>2006 IBM Faculty Fellowship Award<li>2008 NVIDIA Professor Partnership Award <li>2006-7 Microsoft Research award <li>UNM Regents' Lecturer. <li><a href="http://www.computer.org/">IEEE Computer Society</a>

<a href="http://www.computer.org/chapter/DVP/">Distinguished Speaker</a>
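As an illustration of the first extraction case (HTML documents to a link graph), the sketch below pulls directed edges out of anchor tags like those in the sample markup above, using Python's standard `html.parser` module. The class name `LinkExtractor` and the page identifier `"page-0"` are hypothetical; the tutorial does not describe a particular extraction tool.

```python
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collect (source page, target URL) edges from anchor tags."""

    def __init__(self, source):
        super().__init__()
        self.source = source
        self.edges = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag being opened.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.edges.append((self.source, value))

# "page-0" is a hypothetical identifier for the document being parsed.
html = ('<li><a href="http://www.ieee.org/fellow/">IEEE Fellow</a>'
        '<li><a href="http://www.nsf.gov/">National Science Foundation</a> CAREER Award.')
extractor = LinkExtractor("page-0")
extractor.feed(html)
for edge in extractor.edges:
    print(edge)
```

Each extracted pair is one directed edge; running the same pass over many documents and merging the pairs would yield the link graph the slide refers to.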


Graph Analysis

• Irregularly traversing large (multi-billion vertices with up to a trillion edges) graph data.


Informatics Graphs are Even Tougher

• Very different from graphs in scientific computing!
– Graphs can be enormous
– Power-law distribution of the number of neighbors
– Small world property – no long paths
– Very limited locality, not partitionable
– Highly unstructured
– Edges and vertices have types
• Experience in scientific computing applications provides only limited insight.

Six degrees of Kevin Bacon
Source: Seokhee Hong


Architectural Requirements for Efficient Graph Analysis (Challenges)

• Runtime is dominated by latency
– Random accesses to global address space
– Perhaps many at once
• Essentially no computation to hide memory costs
• Access pattern is data dependent
– Prefetching unlikely to help
– Usually only want small part of cache line
• Potentially abysmal locality at all levels of memory hierarchy


Architectural Requirements for Efficient Graph Analysis (Desired Features)

• A large memory capacity for random access
– For example, 1 trillion edges can fit into the 8 TB (instead of 8 PB) main memory
• Low latency / high bandwidth
– For small messages!
• Latency tolerant
• Light-weight synchronization mechanisms
• Global address space
– No graph partitioning required
– Avoid memory-consuming profusion of ghost-nodes
– No local/global numbering conversions
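The memory-capacity example above implies a per-edge storage budget, which can be checked with one division. This is a minimal sketch assuming decimal units (1 TB = 10^12 bytes); the helper name `bytes_per_edge` is hypothetical.

```python
def bytes_per_edge(memory_bytes, num_edges):
    """Storage available per edge if the whole memory holds only edges."""
    return memory_bytes / num_edges

EDGES = 10**12                # 1 trillion edges, from the slide
MEMORY_BYTES = 8 * 10**12     # 8 TB main memory, decimal terabytes assumed

budget = bytes_per_edge(MEMORY_BYTES, EDGES)
print(f"{budget:.0f} bytes per edge")

# A 32-bit id can name at most 2**32 vertices, so multi-billion-vertex graphs
# may need wider ids, which consumes most of this per-edge budget.
print(f"32-bit ids cover {2**32:,} vertices")
```

With so few bytes available per edge, per-edge overheads such as duplicated ghost vertices or index translation tables are expensive, which is consistent with the slide's preference for a global address space.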


Breadth-first search comparison: Graph500
• Graph500 benchmark: BFS on an artificial graph
• 2^Size vertices, 16 * 2^Size edges
• TEPS: Traversed edges per second

Rank | Machine | Owner | Size | TEPS
1 | NNSA/SC Blue Gene/Q Prototype II (4096 nodes / 65,536 cores) | NNSA and IBM Research, T.J. Watson | 32 | 254.3 B
2 | Hopper XE6 (1800 nodes / 43,200 cores) | LBNL | 37 | 113.3 B
3 | Lomonosov (4096 nodes / 32,768 cores) | Moscow State University | 37 | 103.2 B
17 | mirasol (Quad-Socket Intel E7-8870 - 2.4 GHz - 10 cores / 20 threads per socket) | Intel/Georgia Tech (UCB implementation) | 28 | 5.1 B
20 | Dingus (Convey HC1ex, 4FPGA) | SNL | 28 | 1.7 B
35 | Matterhorn (64-proc XMT2) | CSCS | 31 | 884.6 M
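The Size and TEPS columns can be combined into a rough time per search. The sketch below assumes every one of the 16 * 2^Size generated edges is traversed exactly once, which is a simplification: the reported TEPS figure is a summary over many searches, and a search only counts edges in the component it reaches. The function name `bfs_seconds` is hypothetical.

```python
def bfs_seconds(scale, teps, edge_factor=16):
    """Estimated time for one BFS if every generated edge is traversed once."""
    edges = edge_factor * 2**scale
    return edges / teps

# (machine, Size, TEPS) rows taken from the table above.
rows = [("mirasol", 28, 5.1e9), ("Matterhorn", 31, 884.6e6)]
for name, scale, teps in rows:
    print(f"{name}: roughly {bfs_seconds(scale, teps):.2f} s per search")
```

Because the two machines ran different problem sizes, the estimated times are not directly comparable; the calculation is mainly a reminder that TEPS should be read together with the Size column.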


Q&A
