PyCon Ukraine 2016: Maintaining a high load Python project for newcomers

Maintaining a high load Python project for newcomers Viacheslav Kakovskyi PyCon Ukraine 2016


Posted on 06-Jan-2017

Page 1: PyCon Ukraine 2016: Maintaining a high load Python project for newcomers

Maintaining a high load Python project for newcomers

Viacheslav Kakovskyi PyCon Ukraine 2016

Page 2

Me! @kakovskyi

● Python Developer at SoftServe
● Contributor to Atlassian HipChat: Python 2, Twisted
● Maintainer of KPIdata: Python 3, asyncio

Page 3

Agenda
● What project is `high load`?
● High-load projects from my experience
● Case study: show the last 5 feedbacks for a university course
● Developer's checklist
● Tools that help to make customers happy
● Summary
● Further reading

Page 4

What project is `high load`?

Page 5

What project is `high load`?
● 2+ nodes?
● 10 000 connections?
● 200 000 RPS?
● 1 000 000 daily active users?
● monitoring?
● scalability?
● continuous deployment?
● disaster recovery?
● sharding?
● clustering?
● ???

Page 6

What project is `high load`?

A project where an inefficient solution or a tiny bug has a huge impact on your business (due to a lack of resources)
→ which causes increased costs ($$$) or a loss of reputation (due to performance degradation)

Page 7

High-load Python projects from my experience
● Instant messenger:
  ○ 100 000+ connected users
  ○ 100+ nodes
  ○ 100+ developers
● Embedded system for traffic analysis:
  ○ scaling and upgrade options are unavailable

Page 8

Some examples of issues from my experience
● usage of a less efficient library: json vs ujson
● usage of a more complex serialization format: XML vs JSON
● usage of a wrong data format for a certain case: JPEG vs BMP
● usage of a wrong protocol: TCP vs UDP
● usage of legacy code without understanding how it works under the hood: 100 PostgreSQL queries instead of 1
● spawning a lot of objects that aren't destroyed by the garbage collector
● ...
● deployment of a new feature which does not fit well with the load on your production environment
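
The "100 PostgreSQL queries instead of 1" item is the classic N+1 query pattern. A minimal sketch of it, assuming a hypothetical course/feedback schema and using an in-memory sqlite3 database as a stand-in for PostgreSQL; the `run` helper exists only to count round trips:

```python
import sqlite3

# In-memory SQLite stands in for PostgreSQL; the schema below is hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript(
    "CREATE TABLE course (id INTEGER PRIMARY KEY, title TEXT);"
    "CREATE TABLE feedback (id INTEGER PRIMARY KEY, course_id INTEGER, body TEXT);"
)
conn.executemany("INSERT INTO course VALUES (?, ?)",
                 [(i, f"Course {i}") for i in range(100)])
conn.executemany("INSERT INTO feedback (course_id, body) VALUES (?, ?)",
                 [(i, f"Feedback {i}") for i in range(100)])

stats = {"queries": 0}


def run(sql, params=()):
    """Execute one statement and count it as one database round trip."""
    stats["queries"] += 1
    return conn.execute(sql, params).fetchall()


def course_titles_n_plus_one():
    rows = run("SELECT course_id FROM feedback ORDER BY id")        # 1 query...
    return [run("SELECT title FROM course WHERE id = ?", (cid,))[0][0]
            for (cid,) in rows]                                     # ...plus 1 per row


def course_titles_joined():
    rows = run("SELECT c.title FROM feedback f "
               "JOIN course c ON c.id = f.course_id ORDER BY f.id")  # 1 query in total
    return [title for (title,) in rows]
```

Both functions return the same titles; the first issues 101 statements, the second one. Against a networked database every extra statement also pays a client/server round trip, which is where the cost shows up under load.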

Page 9

Terms
● Elasticsearch - a search server that provides a full-text search engine
● Redis - an in-memory data structure server
● Capacity planning - a process aimed at determining the amount of resources that will be needed over some future period of time
● StatsD - a daemon for stats aggregation
● Feature flag - the ability to turn some functionality of an application on or off without a deployment

Page 10

Case study
Let's imagine an application for assessing the quality of higher education
● A university has faculties
● A faculty has departments
● A department has directions
● A direction has groups
● A group has students
● A student takes courses

Page 11

Case study
● A student leaves feedback about courses
● Feedbacks are stored in Elasticsearch for full-text search

A feedback looks like this:

Introduction to Software Engineering. Faculty of Applied Math

Good for ones who don't have any previous experience with programming and algorithms. Optional for prepared folks. They should request additional tasks to stay in a good shape.

Page 12

Case study: show the 5 most recent feedbacks for the course

[Mockup of a course page: "INTRODUCTION TO SOFTWARE ENGINEERING", 100500, a "Recent feedbacks" block and a "Faculties" menu. Sample feedback: "Software engineering is about teams and it is about quality. The problems to solve are so complex or large, that a single developer cannot solve them anymore." See https://en.wikibooks.org/wiki/Introduction_to_Software_Engineering]

Page 13

Case study: obvious solution
Request the last 5 feedbacks directly from Elasticsearch

    from elasticsearch import Elasticsearch

    es = Elasticsearch()

    def fetch_feedback(es, course_id, amount):
        query = _build_es_filter_query(doc_type='course', id=course_id, amount=amount)
        # blocking call to Elasticsearch
        entries = es.search(index='kpi', body=query)
        result = _validate_and_adapt(entries)
        return result
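
The slide leaves `_build_es_filter_query` undefined. One possible shape for such a helper, as a hypothetical sketch (not the author's code): it assumes every feedback document stores a `course_id` and a `created_at` field (both names are assumptions) and uses the `bool`/`filter` query DSL of Elasticsearch 2.x and later.

```python
def build_es_filter_query(course_id, amount):
    """Hypothetical stand-in for the slides' _build_es_filter_query helper.

    Assumes feedback documents carry `course_id` and `created_at` fields.
    """
    return {
        "query": {
            "bool": {
                # filter context: exact match, no relevance scoring, cacheable
                "filter": [{"term": {"course_id": course_id}}],
            }
        },
        "sort": [{"created_at": {"order": "desc"}}],  # newest feedbacks first
        "size": amount,                                # only what the page shows
    }


if __name__ == "__main__":
    import json
    print(json.dumps(build_es_filter_query(100500, 5), indent=2))
```

The returned dict would be passed as `body=` to `es.search(index='kpi', ...)`. On Elasticsearch 1.x the same filter would be written with the older `filtered` query instead.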

Page 14

Case study

OK, just implement the solution, test on staging, and deploy to production.

Page 15

[Meme: YOUR OPS KNOWS WHERE YOU LIVE]

Page 16

EsRejectedExecutionException[rejected execution (queue capacity 1000) on org.elasticsearch.search.action.SearchServiceTransportAction]

Page 17

Case study: optimization
Hypotheses:
● configure Elasticsearch properly for the case
● cache responses from Elasticsearch for some time
● use double writes:
  ○ write a feedback to Elasticsearch and to a Redis queue with a limited size
  ○ fetch from Redis first
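
The second hypothesis, caching Elasticsearch responses for some time, could look like the minimal in-process TTL cache below. It is a sketch under assumptions: a single process, no size limit, and a cache key chosen by the caller (for example `(course_id, amount)`); with many nodes a shared cache would be needed. The injectable `clock` only exists so expiry can be tested without sleeping.

```python
import time


class TTLCache:
    """Minimal in-process cache whose entries expire `ttl` seconds after loading.

    Sketch only: single process, not thread-safe, no size limit.
    """

    def __init__(self, ttl, clock=time.monotonic):
        self.ttl = ttl
        self.clock = clock      # injectable so expiry can be tested
        self._entries = {}      # key -> (expires_at, value)

    def get_or_load(self, key, loader):
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]     # fresh hit: the backend is not called
        value = loader()        # e.g. the blocking Elasticsearch query
        self._entries[key] = (now + self.ttl, value)
        return value


if __name__ == "__main__":
    cache = TTLCache(ttl=30)
    # hypothetical loader standing in for fetch_feedback(es, 100500, 5)
    print(cache.get_or_load((100500, 5), lambda: ["feedback 1", "feedback 2"]))
```

The trade-off: a new feedback can stay invisible for up to `ttl` seconds, and when a popular key expires, concurrent misses still reach Elasticsearch together.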

Page 18

Case study: prerequisites from our domain*
● up to 1000 characters allowed for a feedback
● 50 000 feedbacks expected just for Kyiv Polytechnic Institute every year
● 300 000+ applicants in 2016
● 100+ universities in Ukraine if we decide to scale

* It's just an assumption for the example case study

Page 19

Case study: let's measure the current load on production

Operations:
● add a feedback
● retrieve the last 5 feedbacks
● find a feedback by a phrase

Page 20

Case study: let's measure the current load on production
Application metrics:
● add a feedback
  ○ stats.count.feedback.course.added.es
  ○ stats.timing.feedback.course.added.es
● retrieve the latest 5 feedbacks
  ○ stats.count.feedback.course.fetched.es
  ○ stats.timing.feedback.course.fetched.es
● find a feedback by a phrase
  ○ stats.count.feedback.course.found.es
  ○ stats.timing.feedback.course.found.es

Page 21

Case study: how to add a metric to your code

    from elasticsearch import Elasticsearch
    from statsd import StatsClient

    es = Elasticsearch()
    statsd = StatsClient()

    def fetch_feedback(statsd, es, course_id, amount):
        statsd.incr('feedback.course.fetched.es')

        # don't perform any query yet, just collect stats about how often it would be called
        return

        # unreachable until the early return above is removed
        query = _build_es_filter_query(doc_type='course', id=course_id, amount=amount)
        with statsd.timer('feedback.course.fetched.es'):
            # blocking call to Elasticsearch
            entries = es.search(index='kpi', body=query)
        result = _validate_and_adapt(entries)
        return result

Page 22

Case study: how to add a metric to your code

    def write_feedback_to_elasticsearch(statsd, es, course_id, doc):
        statsd.incr('feedback.course.added.es')
        with statsd.timer('feedback.course.added.es'):
            # blocking call to Elasticsearch; no explicit id, so Elasticsearch generates one
            # (id=course_id would overwrite the course's previous feedback on every insert)
            result = es.index(index='kpi', doc_type='course', body=dict(doc, course_id=course_id))
        return result

    def find_feedback(statsd, es, phrase, course_id=None):
        statsd.incr('feedback.course.found.es')
        query = _build_es_search_query(doc_type='course', id=course_id, phrase=phrase)
        with statsd.timer('feedback.course.found.es'):
            # blocking call to Elasticsearch
            entries = es.search(index='kpi', body=query)
        result = _validate_and_adapt(entries)
        return result
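
The `incr` and `timer` calls above end up as small plain-text UDP datagrams sent to the StatsD daemon, by default on port 8125, in the form `<name>:<value>|<type>` (`c` for counters, `ms` for timings, `g` for gauges), optionally followed by `|@<sample_rate>`. A sketch of that wire format for illustration only; in application code the `statsd` client library above is the better choice, and the function names here are hypothetical:

```python
import socket


def format_metric(name, value, metric_type, sample_rate=1.0):
    """Build one StatsD line: <name>:<value>|<type>[|@<sample_rate>]."""
    line = f"{name}:{value}|{metric_type}"
    if sample_rate < 1.0:
        line += f"|@{sample_rate}"
    return line


def send_metric(line, host="127.0.0.1", port=8125):
    """UDP send with no reply expected, so the request path never waits for the daemon."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line.encode("utf-8"), (host, port))
```

For example, `send_metric(format_metric("feedback.course.fetched.es", 1, "c"))` sends the counter `feedback.course.fetched.es:1|c`. Because it is UDP, a stopped StatsD daemon loses metrics but does not break requests.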

Page 23

Visualize metrics: RPS, feature-related operations
[Chart series: Add feedback, Find feedback, Fetch feedback]

Page 24

Visualize metrics: Course feedback request performance
[Chart series: Add feedback, Find feedback, Fetch feedback]

Page 25

Case study: visualize collected metrics
Outcomes:
● we know the frequency of operations
● we know the timing of operations
● we know what to optimize
● we can perform capacity planning for a new flow

Page 26

Optimization: double writes
● continue using Elasticsearch as the storage for feedbacks
● duplicate writing of a feedback to Elasticsearch and Redis
● store the last 5 feedbacks in Redis for faster retrieval
● use Elasticsearch for custom queries and full-text search

Page 27

Optimization

    from elasticsearch import Elasticsearch

    REDIS_FEEDBACK_QUEUE_SIZE = 5  # how many recent feedbacks per course are kept in Redis

    es = Elasticsearch()

    def fetch_feedback(es, redis, course_id, amount):
        result = None

        if amount <= REDIS_FEEDBACK_QUEUE_SIZE:
            result = _fetch_feedback_from_redis(redis, course_id, amount)

        if not result:
            # Redis can't serve the request (too many items requested or nothing stored)
            result = _fetch_feedback_from_elasticsearch(es, course_id, amount)

        return result

Page 28

Optimization

    def _fetch_feedback_from_elasticsearch(es, course_id, amount):
        query = _build_es_filter_query(doc_type='course', id=course_id, amount=amount)
        # blocking call to Elasticsearch
        entries = es.search(index='kpi', body=query)
        result = _validate_and_adapt(entries)
        return result

    def _fetch_feedback_from_redis(redis, course_id, amount):
        queue = redis.get_queue(entity='course', id=course_id)
        # blocking call to Redis
        result = queue.get(amount)
        return result

Page 29

Optimization

    def add_feedback(es, redis, course_id, doc):
        _write_feedback_to_redis(redis, course_id, doc)
        _write_feedback_to_elasticsearch(es, course_id, doc)

    def _write_feedback_to_elasticsearch(es, course_id, doc):
        # blocking call to Elasticsearch; no explicit id, so Elasticsearch generates one
        # (id=course_id would overwrite the course's previous feedback on every insert)
        result = es.index(index='kpi', doc_type='course', body=dict(doc, course_id=course_id))
        return result

    def _write_feedback_to_redis(redis, course_id, doc):
        queue = redis.get_queue(entity='course', id=course_id)
        # blocking call to Redis
        queue.push(doc)

Page 30

Optimization: potential impact on production
● Increased:
  ○ Insert feedback time
  ○ Redis capacity
  ○ Network traffic for Redis
● Reduced:
  ○ Fetch feedback time
  ○ Elasticsearch capacity
  ○ Network traffic for Elasticsearch

Page 31

MEASURE ALL THE THINGS

Page 32

Measure: timing of insert and fetch operations

    def _fetch_feedback_from_elasticsearch(statsd, es, course_id, amount):
        statsd.incr('feedback.course.fetched.es')
        query = _build_es_filter_query(doc_type='course', id=course_id, amount=amount)
        with statsd.timer('feedback.course.fetched.es'):
            # blocking call to Elasticsearch
            entries = es.search(index='kpi', body=query)
        result = _validate_and_adapt(entries)
        return result

    def _fetch_feedback_from_redis(statsd, redis, course_id, amount):
        statsd.incr('feedback.course.fetched.redis')
        queue = redis.get_queue(entity='course', id=course_id)
        with statsd.timer('feedback.course.fetched.redis'):
            # blocking call to Redis
            result = queue.get(amount)
        return result

Page 33

Measure: timing of insert and fetch operations

    def _write_feedback_to_elasticsearch(statsd, es, course_id, doc):
        statsd.incr('feedback.course.added.es')
        with statsd.timer('feedback.course.added.es'):
            # blocking call to Elasticsearch; no explicit id, so Elasticsearch generates one
            # (id=course_id would overwrite the course's previous feedback on every insert)
            result = es.index(index='kpi', doc_type='course', body=dict(doc, course_id=course_id))
        return result

    def _write_feedback_to_redis(statsd, redis, course_id, doc):
        statsd.incr('feedback.course.added.redis')
        queue = redis.get_queue(entity='course', id=course_id)
        with statsd.timer('feedback.course.added.redis'):
            # blocking call to Redis
            queue.push(doc)

Page 34

Measure: Redis capacity
● A feedback - up to 1000 characters
● Redis is used for storing 5 feedbacks per course
● 10 000 courses for Kyiv Polytechnic Institute
● Key: feedback:course:<course_id>
● Data structure: List
● Commands:
  ○ LPUSH - O(1)
  ○ LRANGE - O(S+N), S=0, N=5
  ○ LTRIM - O(N)
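
Together, LPUSH, LTRIM and LRANGE form a capped list: push the newest feedback to the head, trim everything past the fifth element, read the first N. A sketch assuming JSON-encoded feedback documents; `FakeRedis` is a hypothetical in-memory stand-in exposing the same method names as a redis-py client (non-negative indexes only), so the example runs without a server, and a real `redis.Redis()` client could be passed instead.

```python
import json


class FakeRedis:
    """In-memory stand-in for the three redis-py list methods used below."""

    def __init__(self):
        self.lists = {}

    def lpush(self, name, *values):
        lst = self.lists.setdefault(name, [])
        for value in values:
            lst.insert(0, value)          # each value goes to the head, like LPUSH
        return len(lst)

    def ltrim(self, name, start, end):
        self.lists[name] = self.lists.get(name, [])[start:end + 1]  # end is inclusive
        return True

    def lrange(self, name, start, end):
        return self.lists.get(name, [])[start:end + 1]


QUEUE_SIZE = 5


def feedback_key(course_id):
    return f"feedback:course:{course_id}"


def push_feedback(client, course_id, doc, size=QUEUE_SIZE):
    key = feedback_key(course_id)
    client.lpush(key, json.dumps(doc))    # O(1): newest feedback at index 0
    client.ltrim(key, 0, size - 1)        # keep only the newest `size` items


def last_feedbacks(client, course_id, amount=QUEUE_SIZE):
    raw = client.lrange(feedback_key(course_id), 0, amount - 1)  # O(S+N) with S=0
    return [json.loads(item) for item in raw]  # json.loads also accepts bytes
```

In production, LPUSH and LTRIM could be sent together through a redis-py `pipeline()`, so they cost one round trip and run inside one MULTI/EXEC transaction.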

Page 35

Measure: Redis capacity
● Don't trust benchmarks from the internet
● Run a benchmark in a production-like environment with your sample data
● Example:
  ○ FLUSHALL
  ○ define a sample feedback (a string of up to 1000 characters)
  ○ create N=10 000 lists with M=5 sample feedbacks
  ○ measure allocated memory
● You can also run an approximate benchmark and calculate the expected memory size
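
The benchmark steps above, sketched for a redis-py style client (`flushall`, `lpush` and `info("memory")["used_memory"]` are redis-py client calls). Assumptions: a disposable Redis instance, because FLUSHALL deletes every key on the server; one LPUSH per course without pipelining; and a placeholder sample feedback that should be replaced with realistic text.

```python
def benchmark_list_memory(client, n_courses=10_000, per_course=5, feedback="x" * 1000):
    """Fill a *disposable* Redis with sample lists and return the bytes they add.

    WARNING: FLUSHALL wipes every key on the server the client points to.
    """
    client.flushall()
    baseline = client.info("memory")["used_memory"]
    for course_id in range(n_courses):
        # one list per course, holding `per_course` copies of the sample feedback
        client.lpush(f"feedback:course:{course_id}", *([feedback] * per_course))
    return client.info("memory")["used_memory"] - baseline
```

Against a benchmark instance this would be called as `benchmark_list_memory(redis.Redis(host="localhost", port=6379))`. The result is approximate, since `used_memory` also reflects allocator overhead.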

Page 36

Measure: Redis capacity
● 76.3 MB for 10000 courses, Kyiv Polytechnic Institute
● 7GB for 100 Ukrainian universities
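
These numbers can be cross-checked with a few lines of arithmetic. The sketch below extrapolates linearly, assuming every university is roughly the size of KPI (an assumption), and computes the raw payload per university. 100 × 76.3 MB is about 7.6 GB, roughly the 7GB quoted above. It also shows why the sample text matters: Ukrainian text takes 2 bytes per Cyrillic letter in UTF-8, so 10 000 courses × 5 × 1000 Cyrillic characters is already about 100 MB of raw payload. Which sample text produced the 76.3 MB figure is not stated, so benchmark with realistic feedbacks.

```python
MB = 1000 * 1000

COURSES = 10_000              # Kyiv Polytechnic Institute, from the slides
FEEDBACKS_PER_COURSE = 5
MAX_CHARS = 1000
measured_kpi_bytes = 76.3 * MB  # benchmark result quoted on the slide
universities = 100

# Linear extrapolation; assumes every university is roughly KPI-sized
projected_bytes = measured_kpi_bytes * universities

# Raw payload for one university, before any Redis overhead
sample = "відгук"  # "feedback" in Ukrainian
bytes_per_cyrillic_char = len(sample.encode("utf-8")) // len(sample)  # 2 in UTF-8
raw_ascii = COURSES * FEEDBACKS_PER_COURSE * MAX_CHARS
raw_cyrillic = raw_ascii * bytes_per_cyrillic_char

print(f"projected: {projected_bytes / 1e9:.2f} GB")
print(f"raw payload: {raw_ascii / MB:.0f} MB (ASCII), {raw_cyrillic / MB:.0f} MB (Cyrillic)")
```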

Page 37

Measure: Network traffic for Redis
● Measure network traffic for send/receive operations:
  ○ add_feedback → LPUSH
  ○ fetch_feedback → LRANGE
● Review the Redis protocol (RESP)
● Calculate the expected sent/received data for the new Redis operations:
  ○ How much data is sent for LPUSH
  ○ How much data is received for LRANGE

Page 38

Measure: Network traffic for Redis

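    from aioredis.util import encode_command

    # every argument becomes its own RESP bulk string;
    # add_feedback holds the number of bytes sent for one LPUSH
    add_feedback = len(encode_command(b'LPUSH', b'feedback:course:100500', b'MY_AWESOME_FEEDBACK'))

https://github.com/aio-libs/aioredis/blob/master/aioredis/util.py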
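
Without aioredis at hand, the same sizes can be computed from the RESP framing itself: a command is an array header `*<count>\r\n` followed by one bulk string `$<length>\r\n<bytes>\r\n` per argument, and an LRANGE reply uses the same framing for the returned items. A stdlib-only sketch (the function names are hypothetical); it counts application bytes only, not TCP/IP overhead:

```python
def encode_resp_command(*args):
    """Encode a command as a RESP array of bulk strings (what a client sends)."""
    out = bytearray(b"*%d\r\n" % len(args))
    for arg in args:
        if isinstance(arg, str):
            arg = arg.encode("utf-8")
        out += b"$%d\r\n" % len(arg) + arg + b"\r\n"
    return bytes(out)


def lrange_reply_size(item_sizes):
    """Bytes of an LRANGE reply returning bulk strings of the given sizes."""
    size = len(b"*%d\r\n" % len(item_sizes))
    for n in item_sizes:
        size += len(b"$%d\r\n" % n) + n + 2   # header + payload + trailing CRLF
    return size


if __name__ == "__main__":
    lpush = encode_resp_command("LPUSH", "feedback:course:100500", "x" * 1000)
    print(len(lpush), lrange_reply_size([1000] * 5))
```

For a 1000-character ASCII feedback, the LPUSH is 1053 bytes and an LRANGE reply with five such feedbacks is 5049 bytes; Cyrillic text roughly doubles the payload part.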

Page 39

Measure: Network traffic for Redis *
● MAX add_feedback_traffic = 1.5 Mbps
● AVG add_feedback_traffic = 0.8 Mbps
● MAX fetch_feedback_traffic = 30 Mbps
● AVG fetch_feedback_traffic = 10 Mbps

* This step is optional and depends on your architecture
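
Turning per-operation sizes into Mbps figures like these is one multiplication: operations per second × bytes per operation × 8 bits. A sketch; the 100 operations per second in the example is a made-up rate, not a number from the talk, and TCP/IP headers are ignored:

```python
# RESP reply for LRANGE returning 5 feedbacks of 1000 bytes each:
# "*5\r\n" (4 bytes) + 5 x ("$1000\r\n" (7) + payload (1000) + "\r\n" (2))
LRANGE_REPLY_BYTES = 4 + 5 * (7 + 1000 + 2)  # 5049


def mbps(ops_per_second, bytes_per_op):
    """Application-level bandwidth in megabits per second."""
    return ops_per_second * bytes_per_op * 8 / 1_000_000


def max_ops_per_second(link_mbps, bytes_per_op):
    """How many operations per second fit into a given bandwidth budget."""
    return link_mbps * 1_000_000 / (bytes_per_op * 8)


if __name__ == "__main__":
    # 100 fetches per second is a hypothetical rate
    print(f"{mbps(100, LRANGE_REPLY_BYTES):.2f} Mbps")
```

Measured peak and average RPS from the StatsD counters would replace the hypothetical rate to get MAX and AVG figures.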

Page 40

Summary of the investigation around double writes
● 90% of fetch feedback requests could be processed by Redis
● The initial issue, Elasticsearch running out of queue capacity, should be avoided

Page 41

Summary of the investigation
● Fetch feedback time is reduced
  ■ 2 ms per fetch for 90% of cases
● Increased:
  ○ Insert feedback time
    ■ 16 ms per insert
  ○ Redis capacity
    ■ 76.3 MB for 10000 courses, Kyiv Polytechnic Institute
    ■ 7GB for 100 Ukrainian universities
  ○ Network traffic for Redis
    ■ 11 Mbps

Page 42

Making a decision
● Implement a prototype
● Discuss collected stats with Ops
● And with Business guys
● Implement the solution
● Deploy under a feature flag

Page 43

Adding a feature flag

    from feature import Feature

    feature = Feature()

    def fetch_feedback(feature, statsd, es, redis, course_id, amount):
        result = None

        # REDIS_FEEDBACK_QUEUE_SIZE = 5 feedbacks kept in the Redis queue per course
        if feature.is_enabled('fetch_feedback_from_redis') and amount <= REDIS_FEEDBACK_QUEUE_SIZE:
            result = _fetch_feedback_from_redis(statsd, redis, course_id, amount)

        if feature.is_enabled('fetch_feedback_from_elasticsearch') and not result:
            result = _fetch_feedback_from_elasticsearch(statsd, es, course_id, amount)

        return result

Page 44

Rolling the feature only for a subset of users

Page 45

Rolling the feature only for a subset of users
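
How the 1% of users is picked isn't shown here. A common approach, sketched under assumptions (the class, its constructor and the `user_id` argument are hypothetical, not the API of the `feature` package used on the slides), is to hash the flag name together with the user id into one of 100 buckets and enable the flag for buckets below the rollout percentage:

```python
import hashlib


class PercentageFeature:
    """Hypothetical feature-flag store with per-flag percentage rollout."""

    def __init__(self, rollout):
        self.rollout = dict(rollout)  # flag name -> percent of users (0..100)

    @staticmethod
    def _bucket(name, user_id):
        # Stable across processes and nodes, unlike the salted built-in hash()
        digest = hashlib.sha256(f"{name}:{user_id}".encode("utf-8")).hexdigest()
        return int(digest, 16) % 100

    def is_enabled(self, name, user_id):
        return self._bucket(name, user_id) < self.rollout.get(name, 0)
```

For example, `PercentageFeature({"fetch_feedback_from_redis": 1}).is_enabled("fetch_feedback_from_redis", user_id)` is true for about 1% of users, always the same ones. Raising the percentage only adds users (a bucket below 1 is also below 100), so nobody flips back and forth during an incremental rollout.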

Page 46

RPS. Feature "Fetch last 5 feedbacks about a course". Rolled out for 1% of users.
[Chart series: Fetch from Elasticsearch, Fetch from Redis]

Page 47

Incremental rollout prevented the incident

EsRejectedExecutionException[rejected execution (queue capacity 1000) on org.elasticsearch.search.action.SearchServiceTransportAction]

Page 48

Investigation
● Disable the feature
● Run an investigation
  ○ Only recent feedbacks are retrieved from Redis
  ○ Legacy feedbacks are fetched directly from Elasticsearch
● Solution
  ○ Write legacy feedbacks to Redis using a background job

Page 49

Fixing missed data in Redis

    def fetch_feedback(feature, statsd, es, redis, course_id, amount):
        redis_was_empty, result = False, None

        if feature.is_enabled('fetch_feedback_from_redis') and amount <= REDIS_FEEDBACK_QUEUE_SIZE:
            result = _fetch_feedback_from_redis(statsd, redis, course_id, amount)
            redis_was_empty = not result

        if feature.is_enabled('fetch_feedback_from_elasticsearch') and not result:
            result = _fetch_feedback_from_elasticsearch(statsd, es, course_id, amount)

        if redis_was_empty and result:
            # Redis had nothing for the course: backfill it with what Elasticsearch returned
            fill_redis(redis, course_id, result, amount=REDIS_FEEDBACK_QUEUE_SIZE)

        return result
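
`fill_redis` itself isn't defined on the slide. One possible shape, as a hypothetical sketch: it takes the course id as well, because the Redis key is derived from it, and it assumes JSON-encoded feedbacks stored newest first in a list keyed `feedback:course:<course_id>`. The background job from the previous slide could call the same helper once per course.

```python
import json


def fill_redis(client, course_id, entries, amount=5):
    """Hypothetical helper: seed a course's Redis list from Elasticsearch results.

    client:  redis-py style client (lpush, ltrim)
    entries: feedback dicts, newest first, as returned by Elasticsearch
    """
    if not entries:
        return
    key = f"feedback:course:{course_id}"
    # LPUSH inserts each value at the head, so push the oldest first:
    # the newest feedback then ends up at index 0, matching live double writes
    client.lpush(key, *(json.dumps(entry) for entry in reversed(entries)))
    client.ltrim(key, 0, amount - 1)  # keep the list capped like live writes do
```

Two caveats: if Elasticsearch was asked for fewer than five entries, the list starts short until new feedbacks arrive, and a feedback written concurrently between the empty Redis read and the fill can end up duplicated or out of order.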

Page 50

RPS. Feature "Fetch last 5 feedbacks about a course". Fixed and rolled out for 1% of users.
[Chart series: Fetch from Elasticsearch, Fetch from Redis]

Page 51

RPS. Feature "Fetch last 5 feedbacks about a course". Fixed and rolled out for 100% of users.
[Chart series: Fetch from Elasticsearch, Fetch from Redis]

Page 52

The feature has been deployed for 100% of users

Page 53

Developer's checklist for adding a feature to a high-load project
● discover which services are hit by the feature
  ○ database
  ○ cache
  ○ storage
  ○ whatever
● measure the impact of the feature on the existing environment
  ○ call frequency
  ○ amount of memory
  ○ traffic
  ○ latency

Page 54

Developer's checklist for adding a feature to a high-load project (2)
● calculate the allowed load for the feature
  ○ requests per second for the existing environment
  ○ request processing time
● calculate the additional load from the feature
  ○ latency of additional requests
  ○ how to deal with a lack of resources
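
Request rate and processing time can be tied together with Little's law (average requests in flight = arrival rate × average time in the system), one standard way to turn them into a concurrency number that has to fit into worker pools and queues, such as the Elasticsearch search queue from the incident. A small sketch; the numbers in the note below are made up:

```python
def required_concurrency(requests_per_second, latency_seconds):
    """Little's law: average number of requests in flight."""
    return requests_per_second * latency_seconds


def max_sustainable_rps(concurrency_limit, latency_seconds):
    """Highest steady request rate that `concurrency_limit` slots can absorb."""
    return concurrency_limit / latency_seconds
```

For example, 2000 requests per second at 50 ms each keep about 100 requests in flight; if latency doubles under load, the same rate needs about 200, which is how a slower backend ends up filling its queues.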

Page 55

Developer's checklist for adding a feature to a high-load project (3)
● discuss the acceptability of the solution
  ○ with peers
  ○ with Ops
  ○ with business owners
● consider alternatives if needed
● perform load testing on staging
● roll out the feature to production incrementally

Page 59

Summary

Page 60

Summary
● Be careful with calls to external services
● Collect metrics about the state of your production environment
● Perform capacity planning for "serious" changes
● Use application metrics and measure the potential load
● Roll out new code incrementally with feature flags
● Set up proper monitoring; it can prevent the majority of incidents
● Use the tools; it's really easy
● Be ready to roll back fast

Page 61

To be continued
● asynchronous programming
● infrastructure as a service
● testing
● monitoring and alerting
● dealing with bursty traffic
● OS and hardware metrics
● scaling
● distributed applications
● continuous integration

Page 62

Further reading
● How HipChat Stores and Indexes Billions of Messages Using
● Continuous Deployment at Instagram
● How Twitter Uses Redis To Scale
● Why Leading Companies Dark Launch - LaunchDarkly Blog
● Lessons Learned From A Year Of Elasticsearch ... - Tech blog
● Notes on Redis Memory Usage
● Using New Relic to Understand Redis Performance: The 7 Key Metrics
● A guide to analyzing Python performance

Page 63

Questions?

Viacheslav Kakovskyi

@kakovskyi