m-flash · lin, zhiyuan, et al. "mmap: fast billion-scale graph computation on a pc via memory...

M-Flash: Fast Billion-Scale Graph Computation Using a Bimodal Block Processing Model

Hugo Gualdron University of Sao Paulo

Robson Cordeiro University of Sao Paulo

Jose Rodrigues-Jr University of Sao Paulo

Duen Horng (Polo) Chau Georgia Tech

Minsuk Kahng Georgia Tech

U Kang Seoul National University

Dezhi “Andy” FangPresenter

Georgia Tech

Internet4+ Billion Web Pages

www.worldwidewebsize.com www.opte.org

Citation Network

www.scirus.com/press/html/feb_2006.html#2 Modified from well-formed.eigenfactor.org

250+ Million Articles

Many More

§ TwitterWho-follows-whom (310 million monthly active users)

Who-buys-what (300+ million users)

§ cellphone networkWho-calls-whom (130+ million users)

Protein-protein interactions200 million possible interactions in human genome

Sources: www.selectscience.net www.phonedog.com www.mediabistro.com www.practicalecommerce.com

Large Graphs Are CommonGraph Nodes Edges

YahooWeb 1.4 Billion 6 Billion

Symantec Machine-File Graph 1 Billion 37 Billion

Twitter 104 Million 3.7 Billion

Phone call network 30 Million 260 Million

Takes Most Space

Scalable Graph Computation on Single Machines

0 500 1000 1500

TurboGraph

GraphChi

GraphX

Giraph

PageRank Runtime (s) on Twitter Graph(1.5 billion edges; 10 iterations, lower is better)

128Cores

SingleMachine(4 cores)

Today’s single machines are very powerful.

Can we do even better?McSherry, Frank, Michael Isard, and Derek G. Murray. "Scalability! But at what COST?." 15th Workshop on Hot Topics in Operating Systems (HotOS XV). 2015.Lin, Zhiyuan, et al. "Mmap: Fast billion-scale graph computation on a pc via memory mapping." Big Data (Big Data), 2014 IEEE International Conference on. IEEE, 2014.

Our Observation #1: I/O is BottleneckGraph edges need to be stored on disk.

Symantec graph: 37 billion edges, 200+ GB

Disk access is much slower than RAM.

Goal: Reduce I/O, especially random accesses

Our Observation #2:Real-world graphs are sparse.Adjacency matrix contains dense and sparse blocks

Dense BlocksSparse Blockshttps://web.stanford.edu/class/bios221/labs/networks/lab_7_networks.html

M-Flash’s Solutions

1. Determine edge block types (dense and sparse)

2. Design efficient processing approaches for each block type

Determine Block Types In Pre-processing

BlockType =Sparse, if I/O cost if treated as Sparse

I/O cost if treated as Dense < 1

Dense, otherwise

DenseSparse

SparseSparse

Dense Block Processing(Assuming all blocks are dense)

New vertex values

Old vertex values

I/O Cost for Dense Block Processing

Typeequationhere.

O(𝛽 + 1 𝑉 + 𝐸

𝐵 + 𝛽9)

Each vertex is read 𝛽 times and then written once # Edge

Size of per I/O Operation#Interval (= #Row = #Column)

# Vertex

SourcePartition 1

Destination

Source Partition:Sequential Read

Sparse Block Processing(Assuming all blocks are sparse)

SourcePartition 2

Source

DestinationPartition 1

Destination Partition:Sequential Write

DestinationPartition 2

Sparse Block Processing(Assuming all blocks are sparse)

I/O Cost for Sparse Block Processing

Typeequationhere.

O(2 𝑉 + 𝐸 + 2|𝐸=>?=@A=A|

𝐵 + 𝛽9)

# Edge

Size of per I/O Operation#Interval (= #Row = #Column)

# Vertex Edge with extended information

Bimodal Block Processing

BlockType =Sparse, if I/O cost if treated as Sparse

I/O cost if treated as Dense < 1

Dense, otherwise

DenseSparse

SparseSparse

Large Graphs Used in Evaluation

Graph Nodes EdgesLiveJournal 5 Million 69 MillionTwitter 41 Million 1.5 BillionYahooWeb 1.4 Billion 6.6 BillionR-Mat (Synthetic) 4 Billion 12 Billion

Runtime of M-Flash

0 1000 2000 3000

MemorySize

PageRank Runtime (s) on 6 billion edge YahooWeb Graph

(1 iteration, shorter is better)

M-FlashMMapTurboGraphX-StreamGraphChi

• Fastest single-node graph computing framework• Innovative bimodal design that addresses varying edge

density in real-world graphs

• M-Flash Code: https://github.com/M-Flash/m-flash-cpp• MMap Project: http://poloclub.gatech.edu/mmap/

CNPq (grant 444985/2014-0), Fapesp (grants 2016/02557-0, 2014/21483-2), Capes, NSF (grants IIS-1563816, TWC-1526254, IIS-1217559) GRFP (grant DGE-1148903), Korean (MSIP) agency IITP (grant R0190-15-2012)

Dezhi “Andy” FangGeorgia Tech CS Undergradhttp://andyfang.me

m-flash · lin, zhiyuan, et al. "mmap: fast billion-scale graph computation on a pc via memory...

Documents

arc flash calculations for consulting engineers flash...arc...

ieee/nfpa arc flash phenomena collaborative research...

the cai & mmap are coming to fhda! - foothill college ›...

multiple measures assessment project (mmap)

hpe media management and analytics platform (mmap) hpe

adapting ieee 1584-2002arc flash study results to a post...

ieee std 1584 2002-arc-flash hazard calculations

novel approach to arc flash mitigation for low voltage...

arc flash mitigation – an overview - western mining...

multiple measures assessment project (mmap) cañada pilot...

mmap 4 teaching and learning @ cadee upm

update on ieee/nfpa collaboration on arc flash...

multiple measures assessment project (mmap): pilot...

arc-flash hazard analysis€¢ arc current equations...

arc flash worksheets - brainfiller...the course was based on...

arc flash analysis: ieee method versus the nfpa 70e tables

arc flash worksheetsthe course was based on the 2018 edition...

qcon london - persistent memory · 2016-04-05 · space...

mi health link / mmap

myvr mmap sdk 2.1