![Page 1: Firehose Engineering - SCALE · views 500GB fixed size. ... 1. maximize power generation 2. make sure turbine isn't damaged. dealing with volume each turbine: 90 to 700 facts/second](https://reader034.vdocuments.us/reader034/viewer/2022042811/5fa157b35d1eb37ee06aa080/html5/thumbnails/1.jpg)
Firehose Engineering
designing high-volume data collection systems
Josh Berkus
HiLoad++, Moscow
October 2011
![Page 2: Firehose Engineering - SCALE · views 500GB fixed size. ... 1. maximize power generation 2. make sure turbine isn't damaged. dealing with volume each turbine: 90 to 700 facts/second](https://reader034.vdocuments.us/reader034/viewer/2022042811/5fa157b35d1eb37ee06aa080/html5/thumbnails/2.jpg)
Firehose Database Applications (FDA)
(1) very high volume of data input from many automated producers
(2) continuous processing of incoming data
![Page 3: Firehose Engineering - SCALE · views 500GB fixed size. ... 1. maximize power generation 2. make sure turbine isn't damaged. dealing with volume each turbine: 90 to 700 facts/second](https://reader034.vdocuments.us/reader034/viewer/2022042811/5fa157b35d1eb37ee06aa080/html5/thumbnails/3.jpg)
Mozilla Socorro
![Page 4: Firehose Engineering - SCALE · views 500GB fixed size. ... 1. maximize power generation 2. make sure turbine isn't damaged. dealing with volume each turbine: 90 to 700 facts/second](https://reader034.vdocuments.us/reader034/viewer/2022042811/5fa157b35d1eb37ee06aa080/html5/thumbnails/4.jpg)
Upwind
![Page 5: Firehose Engineering - SCALE · views 500GB fixed size. ... 1. maximize power generation 2. make sure turbine isn't damaged. dealing with volume each turbine: 90 to 700 facts/second](https://reader034.vdocuments.us/reader034/viewer/2022042811/5fa157b35d1eb37ee06aa080/html5/thumbnails/5.jpg)
Fraud Detection System
![Page 6: Firehose Engineering - SCALE · views 500GB fixed size. ... 1. maximize power generation 2. make sure turbine isn't damaged. dealing with volume each turbine: 90 to 700 facts/second](https://reader034.vdocuments.us/reader034/viewer/2022042811/5fa157b35d1eb37ee06aa080/html5/thumbnails/6.jpg)
Firehose Challenges
![Page 7: Firehose Engineering - SCALE · views 500GB fixed size. ... 1. maximize power generation 2. make sure turbine isn't damaged. dealing with volume each turbine: 90 to 700 facts/second](https://reader034.vdocuments.us/reader034/viewer/2022042811/5fa157b35d1eb37ee06aa080/html5/thumbnails/7.jpg)
1. Volume
● 100's to 1000's of facts/second
● GB/hour
![Page 8: Firehose Engineering - SCALE · views 500GB fixed size. ... 1. maximize power generation 2. make sure turbine isn't damaged. dealing with volume each turbine: 90 to 700 facts/second](https://reader034.vdocuments.us/reader034/viewer/2022042811/5fa157b35d1eb37ee06aa080/html5/thumbnails/8.jpg)
1. Volume
● spikes in volume
● multiple uncoordinated sources
![Page 9: Firehose Engineering - SCALE · views 500GB fixed size. ... 1. maximize power generation 2. make sure turbine isn't damaged. dealing with volume each turbine: 90 to 700 facts/second](https://reader034.vdocuments.us/reader034/viewer/2022042811/5fa157b35d1eb37ee06aa080/html5/thumbnails/9.jpg)
1. Volume
volume always grows over time
![Page 10: Firehose Engineering - SCALE · views 500GB fixed size. ... 1. maximize power generation 2. make sure turbine isn't damaged. dealing with volume each turbine: 90 to 700 facts/second](https://reader034.vdocuments.us/reader034/viewer/2022042811/5fa157b35d1eb37ee06aa080/html5/thumbnails/10.jpg)
2. Constant flow
since data arrives 24/7 …
while the user interface can be down, data collection can never be down
![Page 11: Firehose Engineering - SCALE · views 500GB fixed size. ... 1. maximize power generation 2. make sure turbine isn't damaged. dealing with volume each turbine: 90 to 700 facts/second](https://reader034.vdocuments.us/reader034/viewer/2022042811/5fa157b35d1eb37ee06aa080/html5/thumbnails/11.jpg)
2. Constant flow
ETL
● can't stop receiving to process
● data can arrive out of order
![Page 12: Firehose Engineering - SCALE · views 500GB fixed size. ... 1. maximize power generation 2. make sure turbine isn't damaged. dealing with volume each turbine: 90 to 700 facts/second](https://reader034.vdocuments.us/reader034/viewer/2022042811/5fa157b35d1eb37ee06aa080/html5/thumbnails/12.jpg)
3. Database size
● terabytes to petabytes
● lots of hardware
● single-node DBMSes aren't enough
● difficult backups, redundancy, migration
● analytics are resource-consumptive
![Page 13: Firehose Engineering - SCALE · views 500GB fixed size. ... 1. maximize power generation 2. make sure turbine isn't damaged. dealing with volume each turbine: 90 to 700 facts/second](https://reader034.vdocuments.us/reader034/viewer/2022042811/5fa157b35d1eb37ee06aa080/html5/thumbnails/13.jpg)
3. Database size
● database growth
● size grows quickly
● need to expand storage
● estimate target data size
● create data ageing policies
![Page 14: Firehose Engineering - SCALE · views 500GB fixed size. ... 1. maximize power generation 2. make sure turbine isn't damaged. dealing with volume each turbine: 90 to 700 facts/second](https://reader034.vdocuments.us/reader034/viewer/2022042811/5fa157b35d1eb37ee06aa080/html5/thumbnails/14.jpg)
3. Database size
“We will decide on a data retention policy when we run out of disk space.”
– every business user everywhere
![Page 15: Firehose Engineering - SCALE · views 500GB fixed size. ... 1. maximize power generation 2. make sure turbine isn't damaged. dealing with volume each turbine: 90 to 700 facts/second](https://reader034.vdocuments.us/reader034/viewer/2022042811/5fa157b35d1eb37ee06aa080/html5/thumbnails/15.jpg)
4.
![Page 16: Firehose Engineering - SCALE · views 500GB fixed size. ... 1. maximize power generation 2. make sure turbine isn't damaged. dealing with volume each turbine: 90 to 700 facts/second](https://reader034.vdocuments.us/reader034/viewer/2022042811/5fa157b35d1eb37ee06aa080/html5/thumbnails/16.jpg)
many components = many failures
![Page 17: Firehose Engineering - SCALE · views 500GB fixed size. ... 1. maximize power generation 2. make sure turbine isn't damaged. dealing with volume each turbine: 90 to 700 facts/second](https://reader034.vdocuments.us/reader034/viewer/2022042811/5fa157b35d1eb37ee06aa080/html5/thumbnails/17.jpg)
4. Component failure
● all components fail
● or need scheduled downtime
● including the network
● collection must continue
● collection & processing must recover
![Page 18: Firehose Engineering - SCALE · views 500GB fixed size. ... 1. maximize power generation 2. make sure turbine isn't damaged. dealing with volume each turbine: 90 to 700 facts/second](https://reader034.vdocuments.us/reader034/viewer/2022042811/5fa157b35d1eb37ee06aa080/html5/thumbnails/18.jpg)
solving firehose problems
![Page 19: Firehose Engineering - SCALE · views 500GB fixed size. ... 1. maximize power generation 2. make sure turbine isn't damaged. dealing with volume each turbine: 90 to 700 facts/second](https://reader034.vdocuments.us/reader034/viewer/2022042811/5fa157b35d1eb37ee06aa080/html5/thumbnails/19.jpg)
socorro project
![Page 20: Firehose Engineering - SCALE · views 500GB fixed size. ... 1. maximize power generation 2. make sure turbine isn't damaged. dealing with volume each turbine: 90 to 700 facts/second](https://reader034.vdocuments.us/reader034/viewer/2022042811/5fa157b35d1eb37ee06aa080/html5/thumbnails/20.jpg)
http://crash-stats.mozilla.com
![Page 21: Firehose Engineering - SCALE · views 500GB fixed size. ... 1. maximize power generation 2. make sure turbine isn't damaged. dealing with volume each turbine: 90 to 700 facts/second](https://reader034.vdocuments.us/reader034/viewer/2022042811/5fa157b35d1eb37ee06aa080/html5/thumbnails/21.jpg)
![Page 22: Firehose Engineering - SCALE · views 500GB fixed size. ... 1. maximize power generation 2. make sure turbine isn't damaged. dealing with volume each turbine: 90 to 700 facts/second](https://reader034.vdocuments.us/reader034/viewer/2022042811/5fa157b35d1eb37ee06aa080/html5/thumbnails/22.jpg)
Mozilla Socorro
collectors
processors
webservers
reports
![Page 23: Firehose Engineering - SCALE · views 500GB fixed size. ... 1. maximize power generation 2. make sure turbine isn't damaged. dealing with volume each turbine: 90 to 700 facts/second](https://reader034.vdocuments.us/reader034/viewer/2022042811/5fa157b35d1eb37ee06aa080/html5/thumbnails/23.jpg)
![Page 24: Firehose Engineering - SCALE · views 500GB fixed size. ... 1. maximize power generation 2. make sure turbine isn't damaged. dealing with volume each turbine: 90 to 700 facts/second](https://reader034.vdocuments.us/reader034/viewer/2022042811/5fa157b35d1eb37ee06aa080/html5/thumbnails/24.jpg)
socorro data volume
● 3000 crashes/minute
● avg. size 150K
● 40TB accumulated raw data
● 500GB accumulated metadata / reports
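A quick back-of-envelope check of what these figures imply for ingest bandwidth. This is only a sketch: it assumes "150K" means 150 kilobytes (with 1 KB = 1000 bytes) and that the 3000 crashes/minute rate is sustained around the clock. Neither assumption comes from the slides, and the function name is made up for illustration.

```python
# Back-of-envelope ingest estimate from the slide's figures.
CRASHES_PER_MINUTE = 3000
AVG_CRASH_BYTES = 150_000  # assumption: "150K" = 150 kB, 1 kB = 1000 bytes


def ingest_bytes_per_hour(crashes_per_minute: int, avg_bytes: int) -> int:
    """Raw bytes arriving per hour at a sustained crash rate."""
    return crashes_per_minute * 60 * avg_bytes


per_hour = ingest_bytes_per_hour(CRASHES_PER_MINUTE, AVG_CRASH_BYTES)
per_day = per_hour * 24
print(f"{per_hour / 1e9:.0f} GB/hour, {per_day / 1e9:.0f} GB/day")
# prints "27 GB/hour, 648 GB/day"
```

Under those assumptions the collectors would see roughly 27 GB/hour, which matches the "GB/hour" order of magnitude given earlier under Volume.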
![Page 25: Firehose Engineering - SCALE · views 500GB fixed size. ... 1. maximize power generation 2. make sure turbine isn't damaged. dealing with volume each turbine: 90 to 700 facts/second](https://reader034.vdocuments.us/reader034/viewer/2022042811/5fa157b35d1eb37ee06aa080/html5/thumbnails/25.jpg)
dealing with volume
load balancers collectors
![Page 26: Firehose Engineering - SCALE · views 500GB fixed size. ... 1. maximize power generation 2. make sure turbine isn't damaged. dealing with volume each turbine: 90 to 700 facts/second](https://reader034.vdocuments.us/reader034/viewer/2022042811/5fa157b35d1eb37ee06aa080/html5/thumbnails/26.jpg)
dealing with volume
monitor processors
![Page 27: Firehose Engineering - SCALE · views 500GB fixed size. ... 1. maximize power generation 2. make sure turbine isn't damaged. dealing with volume each turbine: 90 to 700 facts/second](https://reader034.vdocuments.us/reader034/viewer/2022042811/5fa157b35d1eb37ee06aa080/html5/thumbnails/27.jpg)
dealing with size
data: 40TB, expandable
metadata, views: 500GB, fixed size
![Page 28: Firehose Engineering - SCALE · views 500GB fixed size. ... 1. maximize power generation 2. make sure turbine isn't damaged. dealing with volume each turbine: 90 to 700 facts/second](https://reader034.vdocuments.us/reader034/viewer/2022042811/5fa157b35d1eb37ee06aa080/html5/thumbnails/28.jpg)
dealing with component failure
Lots of hardware ...
● 30 Hbase nodes
● 2 PostgreSQL servers
● 6 load balancers
● 3 ES servers
● 6 collectors
● 12 processors
● 8 middleware & web servers
… lots of failures
![Page 29: Firehose Engineering - SCALE · views 500GB fixed size. ... 1. maximize power generation 2. make sure turbine isn't damaged. dealing with volume each turbine: 90 to 700 facts/second](https://reader034.vdocuments.us/reader034/viewer/2022042811/5fa157b35d1eb37ee06aa080/html5/thumbnails/29.jpg)
load balancing & redundancy
load balancers collectors
![Page 30: Firehose Engineering - SCALE · views 500GB fixed size. ... 1. maximize power generation 2. make sure turbine isn't damaged. dealing with volume each turbine: 90 to 700 facts/second](https://reader034.vdocuments.us/reader034/viewer/2022042811/5fa157b35d1eb37ee06aa080/html5/thumbnails/30.jpg)
elastic connections
● components queue their data
● retain it if other nodes are down
● components resume work automatically
● when other nodes come back up
![Page 31: Firehose Engineering - SCALE · views 500GB fixed size. ... 1. maximize power generation 2. make sure turbine isn't damaged. dealing with volume each turbine: 90 to 700 facts/second](https://reader034.vdocuments.us/reader034/viewer/2022042811/5fa157b35d1eb37ee06aa080/html5/thumbnails/31.jpg)
elastic connections
collector
receiver
local file queue
crash mover
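The receiver / local file queue / crash mover pattern can be sketched roughly as below. This is not Socorro's actual code: the class name `LocalFileQueue`, the function `move_pending`, the file suffixes, and the use of `ConnectionError` to signal "downstream is down" are all assumptions made for illustration.

```python
import os
import tempfile
import time
import uuid
from pathlib import Path
from typing import Callable, List


class LocalFileQueue:
    """Hypothetical on-disk queue: one file per received crash."""

    def __init__(self, directory: str) -> None:
        self.dir = Path(directory)
        self.dir.mkdir(parents=True, exist_ok=True)

    def put(self, payload: bytes) -> Path:
        # Write under a temporary name, then rename, so the mover
        # never picks up a half-written file.
        name = f"{time.time_ns():020d}-{uuid.uuid4().hex}"
        tmp = self.dir / (name + ".tmp")
        tmp.write_bytes(payload)
        final = self.dir / (name + ".crash")
        os.replace(tmp, final)
        return final

    def pending(self) -> List[Path]:
        return sorted(self.dir.glob("*.crash"))


def move_pending(queue: LocalFileQueue, send: Callable[[bytes], object]) -> int:
    """Push queued files downstream; stop and keep them if it is down."""
    moved = 0
    for path in queue.pending():
        try:
            send(path.read_bytes())
        except ConnectionError:
            break  # downstream unavailable: retain the file, retry later
        path.unlink()  # delete only after a successful send
        moved += 1
    return moved


# Usage: the receiver keeps accepting while storage is down,
# and the mover drains the queue once storage is back.
with tempfile.TemporaryDirectory() as d:
    queue = LocalFileQueue(d)
    queue.put(b"crash-1")
    queue.put(b"crash-2")

    def storage_down(_payload: bytes) -> None:
        raise ConnectionError("storage unreachable")

    moved_while_down = move_pending(queue, storage_down)
    delivered: List[bytes] = []
    moved_after_recovery = move_pending(queue, delivered.append)
    left_over = len(queue.pending())
```

The design choice this illustrates is that receiving and forwarding are decoupled by the local disk: the receiver only needs the local filesystem to be up, so a downstream outage delays data instead of losing it.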
![Page 32: Firehose Engineering - SCALE · views 500GB fixed size. ... 1. maximize power generation 2. make sure turbine isn't damaged. dealing with volume each turbine: 90 to 700 facts/second](https://reader034.vdocuments.us/reader034/viewer/2022042811/5fa157b35d1eb37ee06aa080/html5/thumbnails/32.jpg)
server management
● puppet
● controls configuration of all servers
● makes sure servers recover
● allows rapid deployment of replacement nodes
![Page 33: Firehose Engineering - SCALE · views 500GB fixed size. ... 1. maximize power generation 2. make sure turbine isn't damaged. dealing with volume each turbine: 90 to 700 facts/second](https://reader034.vdocuments.us/reader034/viewer/2022042811/5fa157b35d1eb37ee06aa080/html5/thumbnails/33.jpg)
Upwind
![Page 34: Firehose Engineering - SCALE · views 500GB fixed size. ... 1. maximize power generation 2. make sure turbine isn't damaged. dealing with volume each turbine: 90 to 700 facts/second](https://reader034.vdocuments.us/reader034/viewer/2022042811/5fa157b35d1eb37ee06aa080/html5/thumbnails/34.jpg)
Upwind
● speed
● wind speed
● heat
● vibration
● noise
● direction
![Page 35: Firehose Engineering - SCALE · views 500GB fixed size. ... 1. maximize power generation 2. make sure turbine isn't damaged. dealing with volume each turbine: 90 to 700 facts/second](https://reader034.vdocuments.us/reader034/viewer/2022042811/5fa157b35d1eb37ee06aa080/html5/thumbnails/35.jpg)
Upwind
1. maximize power generation
2. make sure turbine isn't damaged
![Page 36: Firehose Engineering - SCALE · views 500GB fixed size. ... 1. maximize power generation 2. make sure turbine isn't damaged. dealing with volume each turbine: 90 to 700 facts/second](https://reader034.vdocuments.us/reader034/viewer/2022042811/5fa157b35d1eb37ee06aa080/html5/thumbnails/36.jpg)
dealing with volume
each turbine:
90 to 700 facts/second
windmills per farm: up to 100
number of farms: 40+
est. total: 300,000 facts/second
(will grow)
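As a sanity check on these numbers, the sketch below computes the range the per-turbine figures would give if every one of 40 farms had the full 100 turbines. The "every farm is full" case is an assumption used only to bound the total, and the function name is made up for illustration.

```python
# Bounds on fleet-wide volume from the slide's per-turbine figures.
FACTS_PER_TURBINE_MIN = 90
FACTS_PER_TURBINE_MAX = 700
MAX_TURBINES_PER_FARM = 100  # slide says "up to 100"
FARMS = 40                   # slide says "40+"


def fleet_bounds(farms: int, turbines_per_farm: int, lo: int, hi: int):
    """Return (min, max) facts/second for a fleet of identical farms."""
    turbines = farms * turbines_per_farm
    return turbines * lo, turbines * hi


low, high = fleet_bounds(
    FARMS, MAX_TURBINES_PER_FARM, FACTS_PER_TURBINE_MIN, FACTS_PER_TURBINE_MAX
)
print(f"{low:,} to {high:,} facts/second if every farm were full")
# prints "360,000 to 2,800,000 facts/second if every farm were full"
```

The slide's estimate of 300,000 facts/second sits below this range, which is consistent with most farms having fewer than 100 turbines. That reading is an inference, not something the talk states.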
![Page 37: Firehose Engineering - SCALE · views 500GB fixed size. ... 1. maximize power generation 2. make sure turbine isn't damaged. dealing with volume each turbine: 90 to 700 facts/second](https://reader034.vdocuments.us/reader034/viewer/2022042811/5fa157b35d1eb37ee06aa080/html5/thumbnails/37.jpg)
dealing with volume
local storage
historian
analytic database
reports
local storage
historian
analytic database
reports
![Page 38: Firehose Engineering - SCALE · views 500GB fixed size. ... 1. maximize power generation 2. make sure turbine isn't damaged. dealing with volume each turbine: 90 to 700 facts/second](https://reader034.vdocuments.us/reader034/viewer/2022042811/5fa157b35d1eb37ee06aa080/html5/thumbnails/38.jpg)
dealing with volume
local storage
historian
analytic database
local storage
historian
analytic database
master database
![Page 39: Firehose Engineering - SCALE · views 500GB fixed size. ... 1. maximize power generation 2. make sure turbine isn't damaged. dealing with volume each turbine: 90 to 700 facts/second](https://reader034.vdocuments.us/reader034/viewer/2022042811/5fa157b35d1eb37ee06aa080/html5/thumbnails/39.jpg)
multi-tenant partitioning
● partition the whole application
● each customer gets their own toolchain
● allows scaling with the number of customers
● lowers efficiency
● more efficient with virtualization
![Page 40: Firehose Engineering - SCALE · views 500GB fixed size. ... 1. maximize power generation 2. make sure turbine isn't damaged. dealing with volume each turbine: 90 to 700 facts/second](https://reader034.vdocuments.us/reader034/viewer/2022042811/5fa157b35d1eb37ee06aa080/html5/thumbnails/40.jpg)
dealing with: constant flow and size
historian
minute buffer
hours table
days table
months table
years table
historian
minute buffer
hours table
days table
months table
historian
minute buffer
hours table
days table
![Page 41: Firehose Engineering - SCALE · views 500GB fixed size. ... 1. maximize power generation 2. make sure turbine isn't damaged. dealing with volume each turbine: 90 to 700 facts/second](https://reader034.vdocuments.us/reader034/viewer/2022042811/5fa157b35d1eb37ee06aa080/html5/thumbnails/41.jpg)
time-based rollups
● continuously accumulate levels of rollup
● each is based on the level below it
● data is always appended, never updated
● small windows == small resources
![Page 42: Firehose Engineering - SCALE · views 500GB fixed size. ... 1. maximize power generation 2. make sure turbine isn't damaged. dealing with volume each turbine: 90 to 700 facts/second](https://reader034.vdocuments.us/reader034/viewer/2022042811/5fa157b35d1eb37ee06aa080/html5/thumbnails/42.jpg)
time-based rollups
● allows:
● very rapid summary reports for different windows
● retaining different summaries for different levels of time
● batch/out-of-order processing
● summarization in parallel
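A minimal sketch of the rollup idea, where each level is built only from the level below it. The talk does not show its implementation, so the `Summary` fields (count, total, low, high), the function names, and the sample readings are all assumptions for illustration.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, Dict, Iterable, Tuple


@dataclass(frozen=True)
class Summary:
    """Hypothetical per-window summary; merging two gives the combined window."""
    count: int
    total: float
    low: float
    high: float

    def merge(self, other: "Summary") -> "Summary":
        return Summary(
            self.count + other.count,
            self.total + other.total,
            min(self.low, other.low),
            max(self.high, other.high),
        )


def summarize_facts(
    facts: Iterable[Tuple[datetime, float]]
) -> Dict[datetime, Summary]:
    """Roll raw (timestamp, value) facts from the buffer up to hour buckets."""
    hours: Dict[datetime, Summary] = {}
    for ts, value in facts:
        key = ts.replace(minute=0, second=0, microsecond=0)
        one = Summary(1, value, value, value)
        hours[key] = hours[key].merge(one) if key in hours else one
    return hours


def roll_up(
    lower: Dict[datetime, Summary], truncate: Callable[[datetime], datetime]
) -> Dict[datetime, Summary]:
    """Build the next level purely from the summaries of the level below."""
    upper: Dict[datetime, Summary] = {}
    for key, summary in lower.items():
        k = truncate(key)
        upper[k] = upper[k].merge(summary) if k in upper else summary
    return upper


def to_day(ts: datetime) -> datetime:
    return ts.replace(hour=0, minute=0, second=0, microsecond=0)


def to_month(ts: datetime) -> datetime:
    return to_day(ts).replace(day=1)


# Made-up sample readings, deliberately not in time order.
facts = [
    (datetime(2011, 10, 3, 17, 5), 20.0),
    (datetime(2011, 10, 3, 9, 15), 12.0),
    (datetime(2011, 10, 4, 1, 0), 4.0),
    (datetime(2011, 10, 3, 9, 45), 8.0),
]
hours = summarize_facts(facts)
days = roll_up(hours, to_day)
months = roll_up(days, to_month)
```

Because `merge` does not depend on the order in which summaries are combined, facts can arrive out of order or be summarized in separate batches and merged afterwards, which is what makes batch and parallel summarization possible in a scheme like this.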
![Page 43: Firehose Engineering - SCALE · views 500GB fixed size. ... 1. maximize power generation 2. make sure turbine isn't damaged. dealing with volume each turbine: 90 to 700 facts/second](https://reader034.vdocuments.us/reader034/viewer/2022042811/5fa157b35d1eb37ee06aa080/html5/thumbnails/43.jpg)
firehose tips
![Page 44: Firehose Engineering - SCALE · views 500GB fixed size. ... 1. maximize power generation 2. make sure turbine isn't damaged. dealing with volume each turbine: 90 to 700 facts/second](https://reader034.vdocuments.us/reader034/viewer/2022042811/5fa157b35d1eb37ee06aa080/html5/thumbnails/44.jpg)
data collection must be:
● continuous
● parallel
● fault-tolerant
![Page 45: Firehose Engineering - SCALE · views 500GB fixed size. ... 1. maximize power generation 2. make sure turbine isn't damaged. dealing with volume each turbine: 90 to 700 facts/second](https://reader034.vdocuments.us/reader034/viewer/2022042811/5fa157b35d1eb37ee06aa080/html5/thumbnails/45.jpg)
data processing must be:
● continuous
● parallel
● fault-tolerant
![Page 46: Firehose Engineering - SCALE · views 500GB fixed size. ... 1. maximize power generation 2. make sure turbine isn't damaged. dealing with volume each turbine: 90 to 700 facts/second](https://reader034.vdocuments.us/reader034/viewer/2022042811/5fa157b35d1eb37ee06aa080/html5/thumbnails/46.jpg)
every component must be able to fail
● including the network
● without too much data loss
● other components must continue
![Page 47: Firehose Engineering - SCALE · views 500GB fixed size. ... 1. maximize power generation 2. make sure turbine isn't damaged. dealing with volume each turbine: 90 to 700 facts/second](https://reader034.vdocuments.us/reader034/viewer/2022042811/5fa157b35d1eb37ee06aa080/html5/thumbnails/47.jpg)
5 tools to use
1. queueing software
2. buffering techniques
3. materialized views
4. configuration management
5. comprehensive monitoring
![Page 48: Firehose Engineering - SCALE · views 500GB fixed size. ... 1. maximize power generation 2. make sure turbine isn't damaged. dealing with volume each turbine: 90 to 700 facts/second](https://reader034.vdocuments.us/reader034/viewer/2022042811/5fa157b35d1eb37ee06aa080/html5/thumbnails/48.jpg)
4 don'ts
1. use cutting-edge technology
2. use untested hardware
3. run components to capacity
4. do hot patching
![Page 49: Firehose Engineering - SCALE · views 500GB fixed size. ... 1. maximize power generation 2. make sure turbine isn't damaged. dealing with volume each turbine: 90 to 700 facts/second](https://reader034.vdocuments.us/reader034/viewer/2022042811/5fa157b35d1eb37ee06aa080/html5/thumbnails/49.jpg)
firehose mastered?
![Page 50: Firehose Engineering - SCALE · views 500GB fixed size. ... 1. maximize power generation 2. make sure turbine isn't damaged. dealing with volume each turbine: 90 to 700 facts/second](https://reader034.vdocuments.us/reader034/viewer/2022042811/5fa157b35d1eb37ee06aa080/html5/thumbnails/50.jpg)
Contact
● Josh Berkus: [email protected]
● blog: blogs.ittoolbox.com/database/soup
● PostgreSQL: www.postgresql.org
● pgexperts: www.pgexperts.com
Upcoming Events
● PostgreSQL Europe: http://2011.pgconf.eu/
● PostgreSQL Italy: http://2011.pgday.it/
The text and diagrams in this talk are copyright 2011 Josh Berkus and are licensed under the Creative Commons Attribution License. Title slide image is licensed from iStockPhoto and may not be reproduced or redistributed. Socorro images are copyright 2011 Mozilla Inc.