
Page 1: CS 744: MARIUS

CS 744: MARIUS

Shivaram Venkataraman, Fall 2021

Hi!

Page 2: CS 744: MARIUS

ADMINISTRIVIA

- Midterm grades today! Pick up papers in my office hours
- Office hours on Monday. Don't panic!
- Course Project: check in by Nov 30th
- In class on Tuesday: skip the paper discussion; midterm solution walk-through instead

Page 3: CS 744: MARIUS

Scalable Storage Systems

Datacenter Architecture

Resource Management

Computational Engines

Machine Learning | SQL | Streaming | Graph

Applications

Graph frameworks seen earlier: PowerGraph, GraphX

Page 4: CS 744: MARIUS

EXAMPLE: LINK PREDICTION

Task: Predict potential connections in a social network


[0.25, 0.45, 0.30]

[0.15, 0.85, 0.92]

Find K-nearest neighbors in the embedding space

Notes: one embedding per vertex; the nearest neighbors form a list of recommendations. The embeddings capture the structure of the graph.
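The lookup above can be sketched with NumPy. This is a toy table, not a trained model: the vertex count, the embedding values, and the helper name are all illustrative.

```python
import numpy as np

# Toy embedding table: one d-dimensional row per vertex.
# Values are illustrative; real embeddings come out of training.
emb = np.array([
    [0.25, 0.45, 0.30],   # vertex 0
    [0.15, 0.85, 0.92],   # vertex 1
    [0.20, 0.50, 0.35],   # vertex 2
    [0.90, 0.10, 0.05],   # vertex 3
])

def top_k_neighbors(emb, v, k):
    """Rank the other vertices by cosine similarity to vertex v."""
    norms = np.linalg.norm(emb, axis=1)
    sims = emb @ emb[v] / (norms * norms[v])
    sims[v] = -np.inf                     # exclude the query vertex
    return np.argsort(-sims)[:k]

print(top_k_neighbors(emb, 0, 2))         # nearest vertices to vertex 0
```

The top-K vertices returned here would be the predicted links (e.g. friend recommendations) for the query vertex.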

Page 5: CS 744: MARIUS

BACKGROUND: GRAPH EMBEDDING MODELS

Score function: captures the structure of the graph, given source and destination embeddings

Loss function: maximize the score for edges in the graph; minimize it for others (negative edges)

NEGATIVE SAMPLING

Sample from edges not in the graph!

Two options:
1. According to the data distribution
2. Uniformly

Notes: for a particular edge, the score (e.g. cosine similarity of the source and destination embeddings) is pushed up for the positive edge and pushed down for the sampled negative edges, with a regularization term added.
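One concrete instantiation of the score and loss, as a sketch: the dot-product score, the softmax-style loss, and the function names here are my choices for illustration, not necessarily the exact model on the slide.

```python
import numpy as np

def score(src, dst):
    # Hypothetical score function: dot product of source and
    # destination embeddings; cosine similarity is another option.
    return src @ dst

def edge_loss(src, dst, neg_dsts, lam=1e-4):
    """Push the positive edge's score above the sampled negatives
    (log-softmax over [positive, negatives]) plus L2 regularization."""
    logits = np.concatenate([[score(src, dst)], neg_dsts @ src])
    nll = np.log(np.exp(logits).sum()) - logits[0]
    reg = lam * (src @ src + dst @ dst)
    return nll + reg

rng = np.random.default_rng(0)
src, dst = rng.normal(size=3), rng.normal(size=3)
negs = rng.normal(size=(5, 3))    # 5 uniformly sampled negative destinations
print(edge_loss(src, dst, negs))
```

Minimizing this loss raises the positive edge's score relative to the negatives, which is exactly the maximize/minimize split described above.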

Page 6: CS 744: MARIUS

TRAINING ALGORITHM

SGD/AdaGrad optimizer

Sample positive, negative edges

Access source, dest embeddings for each edge in batch


for i in range(num_batches):
    B = getBatchEdges(i)
    E = getEmbeddingParams(B)
    G = computeGrad(E, B)
    updateEmbeddingParams(G)

Notes: each iteration takes a batch of edges and reads/updates only the embeddings those edges touch, not the full N x d table; over one epoch (= all positive edges) all the weights get used. The updated embedding parameters are written back at the end of each step.
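The four steps above can be made runnable. This is a sketch under my own assumptions (dot-product scores, logistic loss, uniform negative sampling, plain SGD rather than AdaGrad); the batching and the sparse read/update of touched rows are the point.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
N, d, lr = 100, 8, 0.1
emb = rng.normal(scale=0.1, size=(N, d))      # the N x d embedding table
edges = rng.integers(0, N, size=(500, 2))     # toy positive edges

def train_epoch(emb, edges, batch_size=50):
    for i in range(0, len(edges), batch_size):
        B = edges[i:i + batch_size]                      # getBatchEdges(i)
        neg = rng.integers(0, N, size=len(B))            # uniform negatives
        s, t, n = emb[B[:, 0]], emb[B[:, 1]], emb[neg]   # getEmbeddingParams(B)
        # computeGrad(E, B): gradients of the logistic loss on dot-product
        # scores; positive pairs are pulled together, negatives pushed apart
        gp = (sigmoid((s * t).sum(1)) - 1)[:, None]
        gn = sigmoid((s * n).sum(1))[:, None]
        # updateEmbeddingParams(G): sparse SGD on the touched rows only
        np.add.at(emb, B[:, 0], -lr * (gp * t + gn * n))
        np.add.at(emb, B[:, 1], -lr * gp * s)
        np.add.at(emb, neg,     -lr * gn * s)

def mean_pos_score(emb, edges):
    return sigmoid((emb[edges[:, 0]] * emb[edges[:, 1]]).sum(1)).mean()

before = mean_pos_score(emb, edges)
for _ in range(5):
    train_epoch(emb, edges)
print(before, "->", mean_pos_score(emb, edges))
```

`np.add.at` is used so that repeated vertices within a batch accumulate their updates instead of overwriting each other.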

Page 7: CS 744: MARIUS

CHALLENGE: LARGE GRAPHS

Large graphs → large model sizes

Example: 3 billion vertices, d = 400. Model size = 3 billion * 400 * 4 bytes = 4.8 TB!

Need to scale beyond GPU memory, CPU memory!

Notes: real graphs have billions of vertices, and d is typically 100 to 1000.
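The arithmetic on the slide, as a quick check (4 bytes per parameter assumes float32 storage):

```python
vertices = 3_000_000_000           # 3 billion vertices
d = 400                            # embedding dimension
bytes_per_param = 4                # float32
size_bytes = vertices * d * bytes_per_param
print(size_bytes / 1e12, "TB")     # 4.8 TB
```

That is far beyond both GPU memory (tens of GB) and typical CPU memory, which is what forces the out-of-memory design later in the lecture.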

Page 8: CS 744: MARIUS

CHALLENGE: DATA MOVEMENT

(a) Sample edges, embeddings from CPU memory (DGL-KE)

(b) Partition embeddings so that one partition fits on GPU memory. Load sequentially (Pytorch-BigGraph)


One epoch on the Freebase86m knowledge graph

Data movement overheads → low GPU utilization

Notes: roughly 12% GPU utilization; batches of embeddings and gradients shuttle between CPU memory and the GPU.

Page 9: CS 744: MARIUS

MARIUS

I/O efficient system for learning graph embeddings

Marius Design
- Pipelined training
- Partition ordering

Page 10: CS 744: MARIUS

PIPELINED TRAINING

Marius splits the training algorithm into 5 stages and pipelines them to keep both the CPU and the GPU utilized. The queue size controls staleness: if batch b2 shares embeddings with an in-flight batch b1, then b2 may read embeddings that b1 has not yet updated (stale data). Because the access patterns are sparse, there are not too many conflicts in practice.
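The pipelining idea can be sketched with threads and bounded queues. This is illustrative, not the Marius API: Marius uses 5 stages, only three are shown here, and the stage bodies are placeholders. The `maxsize` on each queue caps how many batches are in flight, which is what bounds staleness.

```python
import threading, queue

results = []   # stands in for the updated embedding table

def run_stage(fn, inq, outq):
    """Run one pipeline stage: pull items, process, push downstream."""
    while True:
        item = inq.get()
        if item is None:                  # shutdown marker
            if outq is not None:
                outq.put(None)            # propagate shutdown downstream
            return
        out = fn(item)
        if outq is not None:
            outq.put(out)

q_load, q_compute, q_update = (queue.Queue(maxsize=4) for _ in range(3))
stages = [
    (lambda b: ("embs", b),       q_load,    q_compute),  # load embeddings (CPU)
    (lambda b: ("grads", b),      q_compute, q_update),   # forward/backward (GPU)
    (lambda b: results.append(b), q_update,  None),       # apply updates (CPU)
]
threads = [threading.Thread(target=run_stage, args=s) for s in stages]
for t in threads:
    t.start()
for i in range(8):                        # feed 8 batches into the pipeline
    q_load.put(i)
q_load.put(None)
for t in threads:
    t.join()
print(len(results))                       # all 8 batches processed
```

With the queues bounded at 4, at most a few batches can be in flight at once, so an embedding read can be at most that many updates stale.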

Page 11: CS 744: MARIUS

OUT-OF-MEMORY TRAINING

Key idea: Maintain a cache of partitions in CPU memory

Questions: Order of partition traversal? How to perform eviction?

Notes: the node embedding parameters are split into partitions, which splits the adjacency matrix into a grid of (source partition, destination partition) buckets. A subset of partitions is cached in CPU memory, and one epoch must process all the buckets in the grid.

Page 12: CS 744: MARIUS

ELIMINATION ORDERING

Initialize cache with c partitions

Swap in partition that leads to highest number of unseen pairs

Achieved by fixing c-1 partitions and swapping the remaining ones in, in any order

Notes: the resulting number of swaps is close to the lower bound.
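The round-based scheme above can be sketched as follows. This is my reconstruction from the slide, not the Marius code: each round fixes c-1 partitions in the buffer and rotates every remaining partition through the free slot, so those c-1 partitions meet everyone and can be eliminated; the swap accounting is approximate.

```python
from itertools import combinations

def elimination_ordering(p, c):
    """Cover every (partition, partition) pair with a buffer of c
    out of p partitions, counting swaps. Sketch of an elimination-style
    ordering: fix c-1 partitions per round, rotate the rest through
    the remaining slot, then eliminate the fixed ones."""
    remaining = list(range(p))
    seen = set(combinations(remaining[:c], 2))   # pairs in the initial buffer
    swaps = 0
    while len(remaining) > c:
        fixed, rest = remaining[:c - 1], remaining[c - 1:]
        for q in rest[1:]:             # rotate q through the free slot
            swaps += 1
            seen |= {(f, q) for f in fixed}
        remaining = rest               # `fixed` never need to be reloaded
        if len(remaining) > c:
            swaps += c - 1             # load a fresh buffer for the next round
            seen |= set(combinations(remaining[:c], 2))
    seen |= set(combinations(remaining, 2))      # final buffer fits entirely
    return swaps, seen

swaps, seen = elimination_ordering(10, 4)
print(swaps, len(seen))
```

Every pair is covered: pairs inside a round's buffer are seen when it is loaded, pairs between the fixed set and everyone else are seen during the rotation, and the leftover pairs are handled by later rounds.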

Page 13: CS 744: MARIUS

SUMMARY

Graph Embeddings: Learn embeddings from graph data for ML

Marius: Efficient single-machine training
- Pipelining to use CPU, GPU
- Partition buffer, BETA ordering (minimizes the number of swaps)

Page 14: CS 744: MARIUS

DISCUSSION
https://forms.gle/LtoT8nEmw3oLvXuo9

Page 15: CS 744: MARIUS

If you were going to repeat the COST analysis for knowledge graph embedding training, what would you expect to find and why?

→ Distributed systems may need 4 to 8 GPUs to match a single machine (Marius roughly matches 3 GPUs of DGL)
→ Communication cost grows if we go to many GPUs
→ Is I/O the bigger bottleneck than compute for this kind of training? It is data dependent.

Page 16: CS 744: MARIUS

How does the partitioning scheme used in this paper differ from partitioning schemes used in PowerGraph and why?

→ PowerGraph's workload centers on a vertex and its neighbors (e.g. a PageRank job)
→ Here an edge drives the computation, so the partitioning must bring together the source and destination embeddings for each edge

Page 17: CS 744: MARIUS

NEXT STEPS

Next class: New module!
Project check-ins by Nov 30th