PiterPy 2016: Parallelization, Aggregation and Validation of API in Python


Parallelization, aggregation and validation of APIs with Python

Max Klymyshyn
CTO at CartFresh

@maxmaxmaxmax

‣ 12+ years of experience, 7 years with Python, 6 with JS

‣ Was part of oDesk, Helios, 42cc.

‣ Co-organizer of PyCon Ukraine, KyivJS, Papers We Love

‣ CTO at CartFresh

‣ Challenging myself with a talk in English. It's not my first language, so bear with me

About

‣ Grocery Delivery startup

‣ Operating as CartFresh (Boston, US) and ZAKAZ.UA (Kiev, Dnepropetrovsk, Kharkiv, Ukraine)

‣ Apache CouchDB, Apache Solr, Redis

‣ Heavy Python on the back-end

CartFresh

‣ Quick overview

‣ Some background on the context

‣ Tools for Python

Table of contents

Why API again?

The world is changing very quickly:

‣ Mobile apps

‣ Internet of Things

‣ Microservices

‣ Isomorphic apps

Why API again?

A good API is hard when all your pieces need to work well together

‣ Validation

‣ Reusability

‣ Consistency

‣ Maintainability

‣ Scalability

It’s challenging

A good API makes it easier to develop a service

Divide and conquer (D&C)

‣ An API expresses a software component in terms of its operations, inputs, outputs, and underlying types

‣ An API helps create reusable building blocks and lets system components communicate

‣ It opens new opportunities to build new systems on top of your product

Overview

Moving parts

[diagram: INPUT → VALIDATION → BL (business logic) → OUTPUT]

‣ input – needs to be validated for type correctness

‣ validation – input should be constrained by domain-specific business rules

‣ business logic – obviously the most useful part of the system

‣ output – the data model, serialized into a specific format (a sketch of all four parts follows below)

Moving parts
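To make the pipeline concrete, here is a minimal sketch of the four moving parts as plain Python functions (the function names and the quantity field are illustrative, not from the talk):

import json

def parse_input(raw):
    # input: check type correctness
    payload = json.loads(raw)
    if not isinstance(payload.get('quantity'), int):
        raise TypeError('quantity must be an integer')
    return payload

def validate(payload):
    # validation: domain-specific business rules
    if payload['quantity'] <= 0:
        raise ValueError('quantity must be positive')
    return payload

def business_logic(payload):
    # business logic: the actually useful part
    return {'total_cents': payload['quantity'] * 100}

def render_output(model):
    # output: serialize the data model into a specific format
    return json.dumps(model)

print(render_output(business_logic(validate(parse_input('{"quantity": 2}')))))
# {"total_cents": 200}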

API creation becomes trivial with a good understanding and the right tools

The real challenges lie behind it: how to make it simple, how to make it maintainable, and how to keep API users updated

Trends during the past few years

‣ RESTification

‣ Data Query Languages

‣ Microservices architecture

Trends

REST

‣ A unified interface and communication protocol between client and API

‣ Built on top of HTTP

‣ Simple (a minimal endpoint sketch follows below)
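As an illustration only (Flask is my choice here, not a tool named in the talk), a REST-style endpoint can be this small:

from flask import Flask, jsonify

app = Flask(__name__)

PRODUCTS = [{'id': 1, 'name': 'Product One'}]

@app.route('/products', methods=['GET'])
def list_products():
    # the HTTP verb plus the resource URL form the unified interface:
    # GET /products reads the collection, POST /products would create one
    return jsonify(products=PRODUCTS)

if __name__ == '__main__':
    app.run()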

Data Query Languages

‣ GraphQL

‣ Falcor

‣ Datalog

‣ Datomic

etc.

Data Query Languages

The main point of a DQL is to compose queries declaratively over simple data structures and represent the result as a single data structure

Common case

Monolithic service

Monolith

More realistic case

Microservices

[diagram: monolith vs. microservices]

Difference

‣ New layer of complexity in terms of input validation

‣ New unreliable layer (network)

‣ Additional protocol overhead

‣ Communication latency

Seriously

‣ You'll get a chance to improve each piece of code separately without breaking other parts of the system (D&C!)

‣ You can split development of microservices between different dev teams

‣ You'll have a lot of fun!

But let’s be optimistic

Tools

‣ SWAGGER – a simple representation of your RESTful API (OpenAPI Initiative); FLEX for Python

‣ RAML – the RESTful API Modeling Language

‣ APIDOC – documentation generated from API annotations in your source code

‣ api-blueprint, RESTUnite, apiary etc.

API Frameworks

paths:
  /products:
    get:
      summary: Product Types
      description: |
        The Products endpoint returns information about the *Uber* products
        offered at a given location. The response includes the display name
        and other details about each product, and lists the products in the
        proper display order.
      parameters:
        - name: latitude
          in: query
          description: Latitude component of location.
          required: true
          type: number
          format: double
        - name: longitude
          in: query
          description: Longitude component of location.
          required: true
          type: number
          format: double
      tags:
        - Products
      responses:
        200:
          description: An array of products
          schema:
            type: array
            items:
              $ref: '#/definitions/Product'

Swagger spec example

/products:
  uriParameters:
  displayName: Products
  description: A collection of products
  post:
    description: Create a product
    #Post body media type support
    #text/xml: !!null # media type text, xml support
    #application/json: !!null #media type json support
    body:
      application/json:
        schema: |
          {
            "$schema": "http://json-schema.org/draft-03/schema",
            "product": {
              "name": { "required": true, "type": "string" },
              "description": { "required": true, "type": "string" }

RAML spec example

        example: |
          {
            "product": {
              "id": "1",
              "name": "Product One",
              ...
            }
          }
  get:
    description: Get a list of products
    queryParameters:
      q:
        description: Search phrase to look for products
        type: string
        required: false
    responses:
      200:
        body:
          application/json:
            #example: !include schema/product-list.json

RAML spec example

To prevent the situation where documentation, client libraries, and source code get out of sync

[diagram: CLIENT #1 ↔ SERVER ↔ CLIENT #2]

‣ Predefined input parameters + validation

‣ Predefined response schema (model)

‣ Query Language

Aggregation

GraphQL/Graphene

import graphene
import pprint

data = [1, 2, 3, 4]

class Query(graphene.ObjectType):
    hello = graphene.String()
    data = graphene.String()

    def resolve_data(self, args, info):
        return ",".join(map(str, data))

    def resolve_hello(self, args, info):
        return 'World'

schema = graphene.Schema(query=Query)
result = schema.execute('{ hello, data }')
pprint.pprint(result.data)

# OrderedDict([('hello', u'World'), ('data', u'1,2,3,4')])

GraphQL's power comes from a simple idea: instead of defining the structure of responses on the server, the flexibility is given to the client.

GraphQL vs REST
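With the schema from the previous slide, the shape of the response is chosen by the client: the same server code serves whichever fields each query asks for (a sketch reusing the schema object defined above):

result = schema.execute('{ hello }')
pprint.pprint(result.data)
# OrderedDict([('hello', u'World')])

result = schema.execute('{ data }')
pprint.pprint(result.data)
# OrderedDict([('data', u'1,2,3,4')])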

GraphQL/graphene lets us use our beloved language, Python, to declare the Model/API schema

GraphQL vs Swagger
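For instance, a model and its API schema declared in plain Python with graphene might look like this (a sketch in the same graphene 0.x style as the earlier slide; the Product fields are illustrative):

import graphene

class Product(graphene.ObjectType):
    id = graphene.String()
    name = graphene.String()
    price = graphene.Float()

class Query(graphene.ObjectType):
    product = graphene.Field(Product)

    def resolve_product(self, args, info):
        # resolvers return plain Python objects
        return Product(id='1', name='Product One', price=2.5)

schema = graphene.Schema(query=Query)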

Batching

Tools: django-batch-requests

[ { "method": "get", "url": "/sleep/?seconds=3" }, { "method": "get", "url": "/sleep/?seconds=3" } ]

[ { "headers": { "Content-Type": "text/html; charset=utf-8", "batch_requests.duration": 3 }, "status_code": 200, "body": "Success!", "reason_phrase": "OK" }, { "headers": { "Content-Type": "text/html; charset=utf-8", "batch_requests.duration": 3 }, "status_code": 200, "body": "Success!", "reason_phrase": "OK" } ]

Our experience

‣ Ended up with a batched API interface

‣ Declarative input validation with trafaret

‣ Free-form schema (a disadvantage)

‣ Very simple SQL-JOIN-like aggregation

Params, validation, transformation

@validate_args(
    _('Invalid request'),
    store_id=tr.String() >> pipe(unicode, unicode.strip),
    slugs=tr.List(tr.String() >> pipe(unicode, unicode.strip)),
    ean=tr.String | tr.Null,
    extended=tr.Bool | tr.Null,
    query=tr.String | tr.Null,
    facets=tr.List(
        tr.List(tr.String, min_length=2, max_length=2)) | tr.Null,
    sort=tr.String(allow_blank=True) | tr.Null,
    _optional=('extended', 'query', 'facets', 'sort', 'ean'))
def resource_products(store, user, session, limit=None, offset=1,
                      lang='en', args=None, **kwargs):
    pass
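For a self-contained taste of trafaret itself (the validate_args decorator and pipe helper above are project-specific; the keys below are illustrative):

import trafaret as t

product_args = t.Dict({
    t.Key('store_id'): t.String,
    t.Key('slugs'): t.List(t.String),
    t.Key('ean', optional=True): t.String | t.Null,
})

print(product_args.check({'store_id': 's1', 'slugs': ['milk']}))
# {'store_id': 's1', 'slugs': ['milk']}

try:
    product_args.check({'store_id': 's1', 'slugs': 'milk'})
except t.DataError as error:
    print(error.as_dict())  # reports which key failed and why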

[ "store.products", { store_id: Storage.first(“store").id, slugs: [options.slug], facets: options.facets || [], sort: options.sort || “" }, { offset: options.offset || 1, id: "catalog", join: [{ apply_as: "facets_base", on: ["slug", "slug"], request: { type: "store.facets", args: { store_id: "$request.[-2].args.store_id", slug: "$request.[-2].args.slugs|first" } } }, { apply_as: "category_tree", on: ["slug", "requested_slug"], request: { type: "store.department_tree", args: { store_id: "$request.[-2].args.store_id", slug: "$request.[-2].args.slugs|first" } } }] } ]

Thanks.

@maxmaxmaxmax
