scalable wikipedia with erlang - onscale

39
Scalable Wikipedia with Erlang Thorsten Schütt 1 Thorsten Schütt, Florian Schintke , Alexander Reinefeld Zuse Institute Berlin (ZIB) onScale solutions

Upload: others

Post on 04-Feb-2022

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Scalable Wikipedia with Erlang - onScale

Scalable Wikipedia with Erlang

Thorsten Schütt 1

Thorsten Schütt, Florian Schintke , Alexander Reinefeld

Zuse Institute Berlin (ZIB)

onScale solutions

Page 2: Scalable Wikipedia with Erlang - onScale

Scaling

Thorsten Schütt 2

Web 2.0 Hosting

Page 3: Scalable Wikipedia with Erlang - onScale

1. Step

• Single Server

– Webserver

– DB Server

Clients

Thorsten Schütt 3

– DB Server

Page 4: Scalable Wikipedia with Erlang - onScale

2. Step

• Single Webserver

Clients

Thorsten Schütt 4

• Single DB server

Page 5: Scalable Wikipedia with Erlang - onScale

3. Step

• Load Balancer

• n Webservers

Clients

. . .

Thorsten Schütt 5

• Single DB server

Page 6: Scalable Wikipedia with Erlang - onScale

. . .

4. Step

• Load Balancer

• n Webservers

Clients

Thorsten Schütt 6

• Master DB server

• Slave DB servers

Page 7: Scalable Wikipedia with Erlang - onScale

. . .

5. Step

• Load Balancer

• n Webservers

Clients

Thorsten Schütt 7

• DB partitioning

• DB clusters

• n DB servers

• memcached

DB Directory

. . .

Page 8: Scalable Wikipedia with Erlang - onScale

What did others do?

• SimpleDB

• BigTable

• Dynamo

• MySQL Cluster

Thorsten Schütt 8

• MySQL Cluster

• MapReduce

• Google FS

Page 9: Scalable Wikipedia with Erlang - onScale

Our Approach: P2P in the Data Center

Clients

Key/Value Store

(simple DBMS)

Thorsten Schütt 9

P2P OverlayP2P Overlay

Transactions Transactions

ReplicationReplication

ACID

Page 10: Scalable Wikipedia with Erlang - onScale

Scalable Wikipedia with Erlang

• Chord#

• Transactional Layer

• Load Balancing

Thorsten Schütt 10

• Wikipedia

Page 11: Scalable Wikipedia with Erlang - onScale

Chord# = Chord \ Hash

Thorsten Schütt 11

Chord# = Chord \ Hash

Page 12: Scalable Wikipedia with Erlang - onScale

Chord#

• A dictionary has 3 ops:

• insert(key, value)

• delete(key)

• lookup(key)

• Chord# implements a distributed dictionary

Thorsten Schütt 12

• Chord implements a distributed dictionary

Key Value

Clarke 2007

Allen 2006

Bachman 1973

Thompson 1983

Knuth 1974

Codd 1981

node A

node D

node B

node C

Each node only

stores part of the

data

Page 13: Scalable Wikipedia with Erlang - onScale

Backus

Codd

Dijkstra

FlyodRivest

Wirth

Yao

Chord#

• Key Space is an arbitrary set

with a total order, e.g.

strings

• Every node is assigned a

Thorsten Schütt 13

Gray

Karp

Knuth

Milner

Minsky

Naur

Nygaard

Perlis

Ritchie

• Every node is assigned a

random key

• Nodes form logical ring

over the key space

Page 14: Scalable Wikipedia with Erlang - onScale

Backus

Distributed Dictionary

• Items are stored on their

successor, i.e. the first node

encountered in clockwise

direction

Thorsten Schütt 14

Backus

Codd

Ritchie

Rivest

Wirth

Yao

(Thompson, 1983)

(Clarke, 2007)

(Allen, 2006)

direction

– Thomson on Wirth

– Allen on Backus

– Bachman on Backus

– Clarke on Codd

– …

(Bachman, 1973)

Page 15: Scalable Wikipedia with Erlang - onScale

• Each node puts the successor and log2N exponentially spaced fingers in its

routing table

With each routing step using greedy routing,

Distributed Dictionary

Backus

CoddYao

Thorsten Schütt 15

• With each routing step using greedy routing,

the distance between the currrent node

and the destination is halved.

=> Yields O(log2N) hops.

Dijkstra

Flyod

Gray

Karp

Knuth

Milner

Minsky

Naur

Nygaard

Perlis

Ritchie

Rivest

Wirth

Page 16: Scalable Wikipedia with Erlang - onScale

Summary: Chord#

• DHT is a fully decentralized data structure

• DHTs self-organize as nodes

• Only one hop for calculating a routing pointer, Chord needs log(N) hops.

DHTs/Chord Chord#

Thorsten Schütt 16

• DHTs self-organize as nodes join, leave, and fail

• All operations only require local knowledge

• max. log (N) hops, while Chord guarantees this only with “high probability”.

• Can adapt to imbalances in the query load; Chord can’t.

• Supports range queries.

Page 17: Scalable Wikipedia with Erlang - onScale

Scalable Wikipedia with Erlang

• Chord#

• Transactional Layer

• Load Balancing

Thorsten Schütt 17

• Wikipedia

ACID

Page 18: Scalable Wikipedia with Erlang - onScale

Transactions on DHTs are Challenging

• high churn rate

• nodes may leave, join, or crash at any time

� changing responsibilities

• “crash stop” fault model

Thorsten Schütt 18

• “crash stop” fault model

• no perfect “failure detector”

• never know whether a node crashed or just slow network

Page 19: Scalable Wikipedia with Erlang - onScale

Transactions + Replicas

START

debit (a, 100);

START

debit (a1, 100);

debit (a2, 100);

debit (a3, 100);

Thorsten Schütt 19

deposit (b, 100);

COMMIT

3

deposit (b1, 100);

deposit (b2, 100);

deposit (b3, 100);

COMMIT

Page 20: Scalable Wikipedia with Erlang - onScale

Adapted Paxos Commit

Leader ItemsTransaction

Manager

1. Step

2. Step

GetTPInitTM

RegisterRTM

RegisterTP

TMs and TPs

• Optimistic CC with fallback option

• Fast Read Operation

– just 1 round

– reads a majority (of the last versions)

of all replicates

Thorsten Schütt 20

3. Step

4. Step

5. Step

6. Step

TMs and TPs

Prepare

Prepared

Ack Prepared

Commit

of all replicates

• Write Operation

– 3 rounds

– nonblocking (fallback)

• succeeds when > f/2 nodes alive

Page 21: Scalable Wikipedia with Erlang - onScale

Summary “Transactions“

• Consistent way to update several items and their replicas

Thorsten Schütt 21

• Mitigates some of the Overlay Oddities

• Node Failures

• Asynchronous programming model

Page 22: Scalable Wikipedia with Erlang - onScale

Scalable Wikipedia with Erlang

• Chord#

• Transactional Layer

• Load Balancing

Thorsten Schütt 22

• Wikipedia

Page 23: Scalable Wikipedia with Erlang - onScale

Backus

Codd

Dijkstra

FlyodRivest

Wirth

Yao1. Pick 2 random nodes („Naur“,

„Wirth“)

2. IF Load(„Wirth“) >> Load(„Naur“)

1. „Naur“ leaves system

2. „Naur“ joins as „Tarjan“

Load-Balancing

Overloaded

Thorsten Schütt 23

Gray

Karp

Knuth

Milner

Minsky

Naur

Nygaard

Perlis

RitchieLoad can be any metric

D. Karger, M. Ruhl. „Simple Efficient Load Balancing

Algorithms for Peer-to-Peer Systems“. IPTPS 2004.

Page 24: Scalable Wikipedia with Erlang - onScale

Backus

Codd

Dijkstra

FlyodTarjan

Wirth

Yao

1. Pick 2 random nodes („Naur“, „Wirth“)

2. IF Load(„Wirth“) >> Load(„Naur“)

1. „Naur“ leaves system

2. „Naur“ joins as „Tarjan“

Load-Balancing

Thorsten Schütt 24

Gray

Karp

Knuth

Milner

Minsky

Naur

Nygaard

Perlis

Ritchie

RivestLoad can be any metric

D. Karger, M. Ruhl. „Simple Efficient Load Balancing

Algorithms for Peer-to-Peer Systems“. IPTPS 2004.

Page 25: Scalable Wikipedia with Erlang - onScale

Multi Data-Center Scenario

• Multi-Data-Center Scenarios– Optimize for Latency

– Increase Availability

• Prefix Articles with– Language

Thorsten Schütt 25

– Language

– Replicas Number

• E.g. „de:Main Page“– 5 replicas

– 2 in Germany

– 1 in UK

– 1 in USA

– 1 in Asia

„0de:Main Page“, „1de:Main Page“, „2de:Main Page“, …

Page 26: Scalable Wikipedia with Erlang - onScale

Scalable Wikipedia with Erlang

• Chord#

• Transactional Layer

• Load Balancing

Thorsten Schütt 26

• Wikipedia

Page 27: Scalable Wikipedia with Erlang - onScale

Wikipedia

Top 10 Web sites

1. Yahoo!

2. Google

3. YouTube

4. Windows Live

Wikipedia is the top1 open Web Site

– source code is open

– architecture is open

Thorsten Schütt 27

4. Windows Live

5. MSN

6. Myspace

7. Wikipedia

8. Facebook

9. Blogger.com

10. Yahoo!カテゴリ

– architecture is open

– dumps available

Source: alexa.com

Page 28: Scalable Wikipedia with Erlang - onScale

Wikipedia

Top 10 Web sites

1. Yahoo!

2. Google

3. YouTube

4. Windows Live

50.000 requests/sec

– 95% are answered by squid proxies

– 2,000 req./sec hit the backend

Thorsten Schütt 28

4. Windows Live

5. MSN

6. Myspace

7. Wikipedia

8. Facebook

9. Blogger.com

10. Yahoo!カテゴリ

Page 29: Scalable Wikipedia with Erlang - onScale

The Wikipedia System Architecture

other

Thorsten Schütt 29

web servers

search servers

NFS

Page 30: Scalable Wikipedia with Erlang - onScale

Erlang Implementation

Thorsten Schütt 30

Erlang Implementation

Page 31: Scalable Wikipedia with Erlang - onScale

Overlay

• Implemented in Erlang

– Chord#

– Load-Balancing

– Transaction Framework

• OTP behaviours

– Supervisor

Thorsten Schütt 31

– Supervisor

– gen_server

• Distributed Erlang

– Security Problems

– Scalability Problems

-> Own Transport-Layer on top of TCP

Page 32: Scalable Wikipedia with Erlang - onScale

Data Model

Wikipedia

– SQL DB

Chord#

– Key-Value Store

Thorsten Schütt 32

Map Relations to Key-Value Pairs

– (Title, List of Versions)

– (CategoryName, List of Titles)

– (Title, List of Titles) //Backlinks

CREATE TABLE /*$wgDBprefix*/page (

page_id int unsigned NOT

NULL auto_increment,

page_namespace int NOT NULL,

...

Page 33: Scalable Wikipedia with Erlang - onScale

Data Model

void updatePage(string title, int oldVersion, string newText)

{

//new transaction

Transaction t = new Transaction();

//read old version

Page p = t.read(title);

//check for concurrent update

if(p.currentVersion != oldVersion)

Thorsten Schütt 33

t.abort();

else{

//write new text

t.write(p.add(newText));

//update categories

foreach(Category c in p)

t.write(t.read(c.name).add(title));

//commit

t.commit();

}

}

Page 34: Scalable Wikipedia with Erlang - onScale

Wiki

Database:

• Chord#

• Mapping

– Wiki -> Key-Value Store

Thorsten Schütt 34

Renderer:

• Java

– Tomcat

– Plog4u

• Jinterface

– Interface to Erlang

Page 35: Scalable Wikipedia with Erlang - onScale

IEEE Scale Challenge 2008

Live Demos

• Bavarian

• Simple English

• Browsing

• Deployments

– Planet-Lab

• 20 nodes

– Cluster

• 320 nodes in Berlin

Thorsten Schütt 35

• Browsing

• Editing with full History

• Category-Pages

• 320 nodes in Berlin

Page 36: Scalable Wikipedia with Erlang - onScale

Summary

• Previously, P2P was mainly used for file sharing (read only).

• We support consistent, distributed write operations.

DHT + Transactions = Scalable, Reliable, Efficient Key/Value Store

Thorsten Schütt 36

• We support consistent, distributed write operations.

• Numerous applications

• Internet databases, transactional online-services, …

Page 37: Scalable Wikipedia with Erlang - onScale

Team

• Thorsten Schütt,

• Florian Schintke,

• Monika Moser,

• Stefan Plantikow,

• Alexander Reinefeld,

Thorsten Schütt 37

• Alexander Reinefeld,

• Nico Kruber,

• Christian von Prollius,

• Seif Haridi (SICS),

• Ali Ghodsi (SICS),

• Tallat Shafaat (SICS)

Page 38: Scalable Wikipedia with Erlang - onScale

Detailed Descriptions

WikiS. Plantikow, A. Reinefeld, F. Schintke.

Transactions for Distributed Wikis on Structured Overlays.

DSOM, October 2007.

TransactionsM. Moser, S. Haridi.

Atomic Commitment in Transactional DHTs.

1st CoreGRID Symposium, August 2007.

Thorsten Schütt 38

1st CoreGRID Symposium, August 2007.

T. Shafaat, M. Moser, A. Ghodsi , S. Haridi, T. Schütt, A. Reinefeld.

Key-Based Consistency and Availability in Structured Overlay Networks.

Infoscale, June 2008.

DHTT. Schütt, F. Schintke, A. Reinefeld.

A Structured Overlay for Multi-dimensional Range Queries.

Euro-Par, August 2007.

T. Schütt, F. Schintke, A. Reinefeld.

Structured Overlay without Consistent Hashing: Empirical Results.

GP2PC, May 2006.

Page 39: Scalable Wikipedia with Erlang - onScale

Questions

Thorsten Schütt 39

Questions