BDM37 - Simon Grondin - Scaling an API Proxy in OCaml

Scaling UP: Why, how, and the lessons learned

Upload: big-data-montreal

Post on 15-Aug-2015

TRANSCRIPT

Why

• Isn't the holy grail of big data technologies to reach perfect horizontal scalability?

• Google famously buys tons of cheap hardware

Author
- Explain what horizontal scalability is
- What scaling up is

They work together

• The techniques that make a system scale UP well will make it scale OUT better

• Approaching the domain with an UP mindset will eliminate dead-end solutions from the start

• Death by complexity is real

Author
- Needs fewer layers
- Talk about the API Analytics v1 server
- C -> B -> A

Surprisingly, it's usually cheaper

• By reducing the number of interconnections, we reduce the overhead.

• We avoid paying the diminishing returns tax.

Hardware might be cheaper than code, but managing it is hardly free.

Author
- Talk about Python and the throw-more-hardware philosophy
- Talk about DevOps culture: stability, fewer moving parts

Most famous example: StackExchange (2014)

• 54th for traffic, 110 sites, 560M pageviews/month

• 25 servers, 3,000 req/sec

• Highly relational data, all in memory, all SQL

3 steps

1. Picking tools that encourage good design

2. Eliminating work

3. Eliminating synchronization

Good tools...

• Make the Good easy and rewarding

• Make the Exceptional hard, but possible

• Make the Bad impossible

Immutable & Idempotent data

• Each operation should be doable by acting on a message alone

• Each node should be able to process messages without help

• Messages should represent one step and have 3 states: New, In Progress, Completed
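A minimal sketch of the message model above, under assumptions: the names `Message`, `claim` and `complete` and the message fields are hypothetical, not from the talk. Messages are never mutated, and replaying a transition on a message that already passed it changes nothing.

```typescript
// Hypothetical message model: all names and fields here are illustrative.
type State = "New" | "InProgress" | "Completed";

interface Message {
  readonly id: string;
  readonly payload: string;
  readonly state: State;
  readonly result?: string;
}

// Claiming is idempotent: only a New message changes state.
// The input is never mutated; a new object is returned instead.
function claim(msg: Message): Message {
  return msg.state === "New" ? { ...msg, state: "InProgress" } : msg;
}

// Completing is idempotent too: a Completed message comes back unchanged,
// so a redelivered message never repeats the work.
function complete(msg: Message, work: (payload: string) => string): Message {
  if (msg.state === "Completed") return msg;
  return { ...msg, state: "Completed", result: work(msg.payload) };
}

// The message carries everything a node needs to process it without help.
const fresh: Message = { id: "m1", payload: "hello", state: "New" };
const done = complete(claim(fresh), (p) => p.toUpperCase());
```

Running `complete(claim(done), ...)` again returns `done` itself, which is what makes retries and duplicate deliveries safe.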

Eliminating work

• In the end, the only way to make a finite amount of hardware do its work faster is to make it do less.

Author
Talk about HARchiver here, describe the use case and the constraints.

Avoid expensive operations

• In VM languages, system calls are insidious

• System calls are deadly:
(Unix.Unix_error "Invalid argument" select "")
Fatal error: exception (Unix.Unix_error "Operation not permitted" send "")

• Especially opening sockets!
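One common way to avoid repeated socket opens is to reuse connections. The talk does not name this technique, so take the following as an illustration only: `Connection`, `ConnectionPool` and the `open` callback are hypothetical, and `open` stands in for whatever expensive call actually creates a socket.

```typescript
// Hypothetical connection type; a real one would wrap a socket.
interface Connection {
  readonly host: string;
  send(data: string): void;
}

class ConnectionPool {
  private readonly idle = new Map<string, Connection[]>();

  // `open` is the expensive operation we want to call as rarely as possible.
  constructor(private readonly open: (host: string) => Connection) {}

  // Reuse an idle connection to the host if one exists; open only as a last resort.
  acquire(host: string): Connection {
    const free = this.idle.get(host);
    const reused = free === undefined ? undefined : free.pop();
    return reused !== undefined ? reused : this.open(host);
  }

  // Hand the connection back so the next request can skip the open.
  release(conn: Connection): void {
    const free = this.idle.get(conn.host);
    if (free === undefined) {
      this.idle.set(conn.host, [conn]);
    } else {
      free.push(conn);
    }
  }
}
```

A request handler would call `acquire`, send, then `release`; after the first request to a host, later ones pay no open cost as long as a connection is idle.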

Bulk the rest...

• Pipelining

• Bulk operations

• ...All easier because there are no interactions between tasks

Often, the overhead is greater than the actual work
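A small sketch of bulking, assuming a hypothetical `Batcher` class and a `flushFn` callback that stands in for one expensive operation (a write, a network send). Items are buffered and handed over in groups, so the fixed per-call overhead is paid once per batch rather than once per item.

```typescript
class Batcher<T> {
  private buffer: T[] = [];

  constructor(
    private readonly maxSize: number,
    // Hypothetical sink: called once per batch instead of once per item.
    private readonly flushFn: (items: T[]) => void
  ) {}

  // Buffer the item; flush automatically once the batch is full.
  add(item: T): void {
    this.buffer.push(item);
    if (this.buffer.length >= this.maxSize) this.flush();
  }

  // Send whatever is buffered. Does nothing when the buffer is empty.
  flush(): void {
    if (this.buffer.length === 0) return;
    const items = this.buffer;
    this.buffer = [];
    this.flushFn(items);
  }
}
```

With `maxSize` set to 3, adding 7 items and then calling `flush()` results in 3 calls to `flushFn` instead of 7. A real system would likely also flush on a timer so that a partial batch does not wait forever.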

But don't block!

• Long tasks should yield frequently to let short tasks complete immediately

What is wrong in this snippet?

The non-blocking version…
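The snippets shown on these two slides are not part of the transcript. As a separate illustration (not a reconstruction of them), here is a sketch of a long task that yields between chunks. The names `yieldToEventLoop` and `sumInChunks` and the chunk size are assumptions.

```typescript
// Give control back to the event loop so timers and I/O callbacks can run.
const yieldToEventLoop = (): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, 0));

// A long computation split into chunks. Without the `await`, the loop would
// hold the thread until the whole array was processed.
async function sumInChunks(values: number[], chunkSize: number): Promise<number> {
  let total = 0;
  for (let start = 0; start < values.length; start += chunkSize) {
    const end = Math.min(start + chunkSize, values.length);
    for (let i = start; i < end; i++) total += values[i];
    // Yield after every chunk so short tasks are not stuck behind this one.
    await yieldToEventLoop();
  }
  return total;
}
```

The chunk size is the trade-off: smaller chunks give short tasks lower latency, larger chunks spend less time on scheduling overhead.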

Eliminating synchronization

• Make pipelines instead of stars and webs

• Avoid shared state, even at the cost of higher load
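A minimal sketch of a pipeline, assuming hypothetical stages `tag` and `render` and an arbitrary 100 ms threshold. Each stage receives everything it needs from its predecessor and hands its output to exactly one successor, so no stage reads or writes shared state.

```typescript
// A stage is a plain function from one message type to the next.
type Stage<A, B> = (input: A) => B;

// Connect two stages in a line: the only link is the value passed along.
function pipe<A, B, C>(first: Stage<A, B>, second: Stage<B, C>): Stage<A, C> {
  return (input) => second(first(input));
}

// Hypothetical message shapes for an API proxy log entry.
interface Entry {
  readonly method: string;
  readonly path: string;
  readonly ms: number;
}

interface Tagged extends Entry {
  readonly slow: boolean;
}

// Stage 1: derive a flag from the entry itself (threshold is an assumption).
const tag: Stage<Entry, Tagged> = (e) => ({ ...e, slow: e.ms > 100 });

// Stage 2: render the tagged entry; it never looks back at stage 1's internals.
const render: Stage<Tagged, string> = (t) =>
  `${t.method} ${t.path} ${t.slow ? "SLOW" : "ok"}`;

const handle = pipe(tag, render);
```

Because the stages share nothing, any of them can be moved to another process or machine by replacing the function call with a message send.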

Caching

• Less efficient as the number of machines increases

• And now the cache needs to scale along with the system

• Unless prohibitive, move the cache back into the node

Cold cache

1. Check if the value is available; if yes, return it

2. No? Then check if it is being fetched; if yes, register for it and return

3. No? Then go get it and create a registration point

4. Wake everyone up with the result

5. If successful, save it
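The steps above can be sketched with promises, where the in-flight promise plays the role of the registration point. This is an illustration under assumptions, not the code of any particular library: the class name `ColdCache` and the `load` callback are hypothetical.

```typescript
class ColdCache<V> {
  private readonly values = new Map<string, V>();
  private readonly pending = new Map<string, Promise<V>>();

  // `load` is the slow operation (hypothetical), e.g. a database or HTTP call.
  constructor(private readonly load: (key: string) => Promise<V>) {}

  get(key: string): Promise<V> {
    // 1. Available? Return it.
    if (this.values.has(key)) return Promise.resolve(this.values.get(key) as V);

    // 2. Being fetched? Register for it by sharing the in-flight promise.
    const inFlight = this.pending.get(key);
    if (inFlight !== undefined) return inFlight;

    // 3. Go get it and create the registration point.
    const request = this.load(key).then(
      (value) => {
        // 5. Save only on success. In this sketch the save runs just before
        //    the waiters resume, so they can already see the cached value.
        this.values.set(key, value);
        this.pending.delete(key);
        return value;
      },
      (error) => {
        // A failure is not cached, so the next call tries again.
        this.pending.delete(key);
        throw error;
      }
    );
    this.pending.set(key, request);

    // 4. Everyone holding `request` wakes up when it settles.
    return request;
  }
}
```

Two concurrent `get("k")` calls on a cold cache trigger a single `load("k")`. This sketch has no eviction or expiry, which a real cache would need.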

< Quick plug >

npm install hotcache