PiterPy 2016: Parallelisation, Aggregation and Validation of APIs in Python
TRANSCRIPT
Parallelisation, aggregation and validation of APIs with Python
Max Klymyshyn, CTO at CartFresh
@maxmaxmaxmax
‣ 12+ years of experience, 7 years with Python, 6 with JS
‣ Was part of oDesk, Helios, 42cc.
‣ Co-organizer of PyCon Ukraine, KyivJS, Papers We Love
‣ CTO at CartFresh
‣ Challenging myself with an English talk. It’s not my first language, so bear with me
About
‣ Grocery Delivery startup
‣ Operating as CartFresh (Boston, US) and ZAKAZ.UA
(Kiev, Dnepropetrovsk, Kharkiv, Ukraine)
‣ Apache CouchDB, Apache Solr, Redis
‣ Heavy Python on the back-end
CartFresh
‣ Quick overview
‣ Some abstract info about context
‣ Tools for Python
Table of contents
Why API again?
The world is changing very quickly:
‣ Mobile apps
‣ Internet of Things
‣ Microservices
‣ Isomorphic apps
Why API again?
A good API is hard when all your stuff should work well together
‣ Validation
‣ Reusability
‣ Consistency
‣ Maintainability
‣ Scalability
It’s challenging
A good API makes it easier to develop a service
Divide and conquer (D&C)
‣ An API expresses a software component in terms of its operations, inputs, outputs, and underlying types
‣ An API helps create reusable building blocks and lets system components communicate
‣ It opens up opportunities to develop new systems on top of your product
Overview
Moving parts
INPUT → VALIDATION → BL → OUTPUT
‣ input – needs to be validated for type correctness
‣ validation – input should be constrained by domain-specific business rules
‣ business logic – obviously the most useful part of the system
‣ output – the data model, serialised into a specific format
Moving parts
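These four moving parts can be sketched as one tiny Python pipeline. Everything below is illustrative (a hypothetical endpoint with a single "limit" parameter), not code from the talk:

```python
import json

def parse_input(raw):
    # input: check type correctness
    payload = json.loads(raw)
    if not isinstance(payload.get("limit"), int):
        raise TypeError("limit must be an integer")
    return payload

def validate(payload):
    # validation: constrain input by domain-specific business rules
    if not 1 <= payload["limit"] <= 100:
        raise ValueError("limit must be between 1 and 100")
    return payload

def business_logic(payload):
    # business logic: the actually useful part of the system
    return list(range(payload["limit"]))

def render_output(items):
    # output: serialise the data model into a specific format
    return json.dumps({"items": items})

response = render_output(business_logic(validate(parse_input('{"limit": 3}'))))
print(response)  # {"items": [0, 1, 2]}
```

Each stage only depends on the previous one's output, which is what makes the parts independently testable and replaceable.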
API creation becomes trivial with a good understanding and the right tools
The hard parts are: how to make it simple, how to make it maintainable, and how to keep API users up to date
Trends during the past few years
‣ RESTification
‣ Data Query Languages
‣ Microservices architecture
Trends
REST
‣ A unified interface to the communication protocol between client and API
‣ Built on top of HTTP
‣ Simple
Data Query Languages
‣ GraphQL
‣ Falcor
‣ Datalog
‣ Datomic
etc.
Data Query Languages
The main point of a DQL is to compose queries to simple data structures declaratively and
represent the result as a single data structure
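That idea can be sketched in a few lines of Python: the client declares, as one nested data structure, exactly which fields it wants, and the server walks the query and returns a matching structure. The data and function names here are made up for illustration:

```python
# Toy data source standing in for a real back-end.
DATA = {
    "product": {"id": 1, "name": "Product One", "price": 9.99},
    "store": {"id": 7, "city": "Boston"},
}

def execute(query, source=DATA):
    # query is a nested dict of {field: subquery-or-None};
    # the response mirrors the shape of the query.
    result = {}
    for field, sub in query.items():
        value = source[field]
        result[field] = execute(sub, value) if isinstance(sub, dict) else value
    return result

# One declarative query in, one data structure out:
print(execute({"product": {"name": None, "price": None}}))
# {'product': {'name': 'Product One', 'price': 9.99}}
```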
Common case
Monolithic service
Monolith
More realistic case
Microservices
Monolith → Microservices
Microservices
Monolith → Microservices
Difference
‣ New layer of complexity in terms of input validation
‣ New unreliable layer (network)
‣ Additional protocol overhead
‣ Communication latency
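Because the network is unreliable and every call adds latency, inter-service calls need timeouts and retries. A minimal retry-with-backoff sketch (illustrative only; a real client would also set socket timeouts, e.g. via requests or aiohttp):

```python
import time

def call_with_retries(fn, retries=3, delay=0.1):
    # Retry a flaky call a bounded number of times with linear backoff.
    for attempt in range(1, retries + 1):
        try:
            return fn()
        except ConnectionError:
            if attempt == retries:
                raise  # give up and surface the failure
            time.sleep(delay * attempt)

# A made-up service call that fails twice, then succeeds:
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("service unavailable")
    return "ok"

print(call_with_retries(flaky))  # succeeds on the third attempt
```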
Seriously
‣ You’ll get a chance to improve each piece of code separately without breaking other parts of the system (D&C!)
‣ You can split development of microservices between different dev teams
‣ You’ll have a lot of fun!
But let’s be optimistic
Tools
‣ SWAGGER – a simple representation of your RESTful API (OpenAPI Initiative); FLEX for Python
‣ RESTful API Modelling Language – RAML
‣ APIDOC – documentation generated from API annotations in your source code
‣ api-blueprint, RESTUnite, apiary etc.
API Frameworks
paths:
  /products:
    get:
      summary: Product Types
      description: |
        The Products endpoint returns information about the *Uber* products
        offered at a given location. The response includes the display name
        and other details about each product, and lists the products in the
        proper display order.
      parameters:
        - name: latitude
          in: query
          description: Latitude component of location.
          required: true
          type: number
          format: double
        - name: longitude
          in: query
          description: Longitude component of location.
          required: true
          type: number
          format: double
      tags:
        - Products
      responses:
        200:
          description: An array of products
          schema:
            type: array
            items:
              $ref: '#/definitions/Product'
Swagger spec example
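The parameter definitions in such a spec can drive validation directly. A toy checker for the latitude/longitude parameters above (an illustrative sketch, not the actual API of the flex library):

```python
# Parameter spec, as it would be extracted from the Swagger document.
PARAMETERS = [
    {"name": "latitude", "required": True, "type": "number"},
    {"name": "longitude", "required": True, "type": "number"},
]

def check_query(params, spec=PARAMETERS):
    # Return a list of validation errors for a dict of query params.
    errors = []
    for p in spec:
        if p["name"] not in params:
            if p["required"]:
                errors.append("%s is required" % p["name"])
            continue
        if p["type"] == "number":
            try:
                float(params[p["name"]])
            except ValueError:
                errors.append("%s must be a number" % p["name"])
    return errors

print(check_query({"latitude": "59.93"}))  # ['longitude is required']
```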
/products:
  uriParameters:
  displayName: Products
  description: A collection of products
  post:
    description: Create a product
    # Post body media type support
    #text/xml: !!null # media type text, xml support
    #application/json: !!null # media type json support
    body:
      application/json:
        schema: |
          {
            "$schema": "http://json-schema.org/draft-03/schema",
            "product": {
              "name": { "required": true, "type": "string" },
              "description": { "required": true, "type": "string" }
RAML spec example
    example: |
      {
        "product": {
          "id": "1",
          "name": "Product One",
          ...
        }
      }
  get:
    description: Get a list of products
    queryParameters:
      q:
        description: Search phrase to look for products
        type: string
        required: false
    responses:
      200:
        body:
          application/json:
            #example: !include schema/product-list.json
RAML spec example
To prevent the situation where documentation, client libraries, and source code get out of sync
CLIENT #1 ↔ SERVER ↔ CLIENT #2
‣ Predefined input parameters + validation
‣ Predefined response schema (model)
‣ Query Language
Aggregation
GraphQL/Graphene

import graphene
import pprint

data = [1, 2, 3, 4]


class Query(graphene.ObjectType):
    hello = graphene.String()
    data = graphene.String()

    def resolve_data(self, args, info):
        return ",".join(map(str, data))

    def resolve_hello(self, args, info):
        return 'World'


schema = graphene.Schema(query=Query)
result = schema.execute('{ hello, data }')
pprint.pprint(result.data)
# OrderedDict([('hello', u'World'), ('data', u'1,2,3,4')])
GraphQL’s power comes from a simple idea: instead of defining the structure of responses
on the server, that flexibility is given to the client.
GraphQL vs REST
GraphQL/graphene lets us use our beloved language, Python,
to declare the model/API schema
GraphQL vs Swagger
Batching
Tools: django-batch-requests
[
    { "method": "get", "url": "/sleep/?seconds=3" },
    { "method": "get", "url": "/sleep/?seconds=3" }
]
[
    {
        "headers": {
            "Content-Type": "text/html; charset=utf-8",
            "batch_requests.duration": 3
        },
        "status_code": 200,
        "body": "Success!",
        "reason_phrase": "OK"
    },
    {
        "headers": {
            "Content-Type": "text/html; charset=utf-8",
            "batch_requests.duration": 3
        },
        "status_code": 200,
        "body": "Success!",
        "reason_phrase": "OK"
    }
]
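Conceptually, a batch endpoint like django-batch-requests unpacks the list of sub-requests, dispatches each one to its own handler, and returns the responses as a list. A toy dispatcher sketch; the route table and handlers are made up for illustration:

```python
# Minimal route table: path -> handler(query_string) -> response dict.
ROUTES = {
    "/sleep/": lambda query: {"status_code": 200, "body": "Success!"},
}

def handle_batch(batch):
    # Run each sub-request independently; one failure doesn't stop the rest.
    responses = []
    for req in batch:
        path, _, query = req["url"].partition("?")
        handler = ROUTES.get(path)
        if handler is None:
            responses.append({"status_code": 404, "body": "Not Found"})
        else:
            responses.append(handler(query))
    return responses

print(handle_batch([
    {"method": "get", "url": "/sleep/?seconds=3"},
    {"method": "get", "url": "/sleep/?seconds=3"},
]))
```

The batched shape is what saves round-trips: two handlers run for the price of one HTTP request.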
Our experience
‣ We ended up with a batched API interface
‣ Declarative input validation with trafaret
‣ Free-form schema (a disadvantage)
‣ Very simple SQL-JOIN-like aggregation
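The "SQL-JOIN-like" aggregation can be sketched as attaching each row of a joined sub-response onto the base rows by a shared key, in the spirit of the apply_as/on declaration in the batched request example. The data and function below are made up for illustration:

```python
def join(base_rows, other_rows, on, apply_as):
    # on = (key in base rows, key in joined rows);
    # attach the matching joined row under the apply_as field.
    left_key, right_key = on
    index = {row[right_key]: row for row in other_rows}
    for row in base_rows:
        row[apply_as] = index.get(row[left_key])
    return base_rows

products = [{"slug": "milk", "name": "Milk"}]
facets = [{"slug": "milk", "brands": ["A", "B"]}]
joined = join(products, facets, on=("slug", "slug"), apply_as="facets_base")
print(joined[0]["facets_base"]["brands"])  # ['A', 'B']
```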
Params, validation, transformation
@validate_args(
    _('Invalid request'),
    store_id=tr.String() >> pipe(unicode, unicode.strip),
    slugs=tr.List(tr.String() >> pipe(unicode, unicode.strip)),
    ean=tr.String | tr.Null,
    extended=tr.Bool | tr.Null,
    query=tr.String | tr.Null,
    facets=tr.List(
        tr.List(tr.String, min_length=2, max_length=2)) | tr.Null,
    sort=tr.String(allow_blank=True) | tr.Null,
    _optional=('extended', 'query', 'facets', 'sort', 'ean'))
def resource_products(store, user, session, limit=None, offset=1,
                      lang='en', args=None, **kwargs):
    pass
[
    "store.products",
    {
        store_id: Storage.first("store").id,
        slugs: [options.slug],
        facets: options.facets || [],
        sort: options.sort || ""
    },
    {
        offset: options.offset || 1,
        id: "catalog",
        join: [{
            apply_as: "facets_base",
            on: ["slug", "slug"],
            request: {
                type: "store.facets",
                args: {
                    store_id: "$request.[-2].args.store_id",
                    slug: "$request.[-2].args.slugs|first"
                }
            }
        }, {
            apply_as: "category_tree",
            on: ["slug", "requested_slug"],
            request: {
                type: "store.department_tree",
                args: {
                    store_id: "$request.[-2].args.store_id",
                    slug: "$request.[-2].args.slugs|first"
                }
            }
        }]
    }
]
Thanks.
@maxmaxmaxmax