Maxim Zaks: Deep dive into data serialisation

Post on 22-Jan-2018

MAXIM ZAKS, Freelance Software Developer

Deep dive into data serialisation

Persisting State / User generated Content

Machine to Machine Communication

Representation of Configuration / Read only data

Why do we need data serialisation?

Custom binary representation

Language-provided binary serialisation

Text-based representation (CSV, XML, JSON, YAML)

Embedded SQL or NoSQL DB

Binary cross-platform serialisation library (FlatBuffers, FlexBuffers, Protocol Buffers, Cap’n Proto, SBE, Apache Thrift, etc.)

How can we persist data on mobile?

Size “on disk”

Speed of read & write / partial read / memory consumption

Human readable and writable

Support of OO language type system

Data versioning / evolution / migration

Important criteria for persisting data

JSON vs.

Protocol Buffers vs.

FlatBuffers

Text-based

Self-describing

Weakly typed: Object, Array, String, Number, Bool, Null

JSON

Binary

IDL (interface definition language) based schema

Strongly typed: Message, fixed-length numbers, variable-length numbers, String, Bool, Enum, Bytes, Repeated values, Maps, Oneof, Any

Evolution Strategy

Protocol Buffers

Binary

IDL (interface definition language) based schema

Strongly typed: Table, Struct, fixed-length numbers, String, Bool, Enum, Bytes, typed Vector, Union

Value / Reference type semantics

Random Value access

Evolution Strategy

FlatBuffers

👎 JSON: inefficient for numbers (they are stored as text), lots of repetition (keys recur in every object), needs to be at least minified

👍 Protocol Buffers: stores only values and a bit of metadata; thanks to VLQ (variable-length quantity) encoding, very efficient for storing numbers

🤔 FlatBuffers: some overhead compared to Protocol Buffers because of reference semantics and random value access

Important criteria for persisting data: Size
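The size difference for numbers can be illustrated with a minimal sketch of base-128 varint encoding, the VLQ scheme described in the Protocol Buffers encoding documentation. The helper names `encode_varint` and `decode_varint`, the key `"id"` and the field number 1 are assumptions made for this illustration; this is not the library's generated code.

```python
import json

def encode_varint(value):
    """Encode a non-negative integer as a base-128 varint (7 payload bits per byte)."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # continuation bit set: more bytes follow
        else:
            out.append(low7)         # last byte: continuation bit clear
            return bytes(out)

def decode_varint(data, pos=0):
    """Decode one varint starting at pos; return (value, next_position)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7

# Minified JSON repeats the key as text and spells the number in decimal digits.
json_bytes = json.dumps({"id": 300}, separators=(",", ":")).encode("utf-8")

# Protocol Buffers style: one key byte (hypothetical field 1, wire type 0) plus the varint.
proto_bytes = bytes([(1 << 3) | 0]) + encode_varint(300)

print(len(json_bytes), len(proto_bytes))  # prints "10 3"
```

Note the trade-off mentioned on the speed slide: the value is only available after the shift-and-mask loop in `decode_varint` has run.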

👎 JSON: text needs to be parsed and translated into an intermediate data representation

🤔 Protocol Buffers: no partial read; VLQ means you need to do some operations before a value is available

👍 FlatBuffers: supports partial reading thanks to reference semantics and the random value access mechanism

Important criteria for persisting data: Speed of read & write / partial read / memory consumption
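Why offsets enable partial reads can be sketched with a deliberately simplified layout: a table of fixed-width offsets followed by length-prefixed strings. This is not the actual FlatBuffers wire format; the layout and the names `build_buffer` and `read_string` are assumptions chosen only to show the random-access idea.

```python
import struct

def build_buffer(strings):
    """Pack strings behind a table of fixed-width offsets (hypothetical layout)."""
    count = len(strings)
    header_size = 4 + 4 * count  # uint32 count + one uint32 offset per entry
    offsets = []
    body = bytearray()
    for s in strings:
        raw = s.encode("utf-8")
        offsets.append(header_size + len(body))   # absolute position of this entry
        body += struct.pack("<I", len(raw)) + raw  # uint32 length prefix + payload
    return struct.pack("<I", count) + struct.pack(f"<{count}I", *offsets) + bytes(body)

def read_string(buf, index):
    """Read one entry by jumping to its offset; the other entries are never decoded."""
    (count,) = struct.unpack_from("<I", buf, 0)
    if not 0 <= index < count:
        raise IndexError(index)
    (offset,) = struct.unpack_from("<I", buf, 4 + 4 * index)
    (length,) = struct.unpack_from("<I", buf, offset)
    start = offset + 4
    return bytes(buf[start:start + length]).decode("utf-8")

buf = build_buffer(["alpha", "beta", "gamma"])
print(read_string(buf, 2))  # prints "gamma"
```

Because `read_string` works on any bytes-like object, the buffer could also be a `memoryview` over a memory-mapped file, so only the touched regions need to be loaded. The price is the extra offset table, which is the size overhead noted on the previous slide.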

👍 JSON: text-based and can be nicely formatted

🤔 Protocol Buffers: provides tools for binary-to-JSON conversion and vice versa

🤔 FlatBuffers: provides tools for binary-to-JSON conversion and vice versa

Important criteria for persisting data: Human readable and writable
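As a small illustration of the JSON point, the same data can be written either formatted for people or minified for storage. The sample record is made up for this sketch.

```python
import json

record = {"name": "Ada", "scores": [1, 2, 3]}  # hypothetical sample data

pretty = json.dumps(record, indent=2)                 # readable, one item per line
minified = json.dumps(record, separators=(",", ":"))  # no optional whitespace

print(pretty)
print(minified)  # prints {"name":"Ada","scores":[1,2,3]}
```

Both strings parse back to the same object; only the whitespace differs.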

🤔 JSON: generally weakly typed; there are ways to transform/validate against OO types

👍 Protocol Buffers: the code generator creates accessor classes which can be comfortably used for encoding and decoding

👍 FlatBuffers: the code generator creates accessor classes which can be comfortably used for decoding; performant encoding can be a bit painful

Important criteria for persisting data: Support of OO language type system
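One way to transform and validate JSON against an OO type is a hand-written check before constructing the object. The `Person` class and the `person_from_json` helper below are hypothetical names used for illustration; with an IDL-based format this kind of code would typically come from the code generator instead.

```python
import json
from dataclasses import dataclass

@dataclass
class Person:
    name: str
    age: int

def person_from_json(text):
    """Parse JSON text and validate it against the Person type."""
    raw = json.loads(text)
    if not isinstance(raw, dict):
        raise TypeError("expected a JSON object")
    name = raw.get("name")
    age = raw.get("age")
    if not isinstance(name, str):
        raise TypeError("name must be a string")
    # bool is a subclass of int in Python, so reject it explicitly
    if not isinstance(age, int) or isinstance(age, bool):
        raise TypeError("age must be an integer")
    return Person(name=name, age=age)

person = person_from_json('{"name": "Ada", "age": 36}')
print(person)  # prints Person(name='Ada', age=36)
```

Nothing in the JSON text itself prevents `"age": "36"` from being stored, so the check has to happen at read time.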

🤔 JSON: evolution is implicit because of its self-describing nature

👍 Protocol Buffers: provides a set of rules for how a schema can be evolved

👍 FlatBuffers: provides a set of rules for how a schema can be evolved

Important criteria for persisting data: Data versioning / evolution / migration
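A core idea behind the Protocol Buffers evolution rules is that every value is stored behind a numbered key, so an older reader can skip field numbers it does not know. The sketch below assumes only varint fields (wire type 0) and uses hypothetical helper names; real messages also contain other wire types, which this sketch rejects.

```python
def encode_varint(value):
    """Base-128 varint for non-negative integers."""
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)
        else:
            out.append(low7)
            return bytes(out)

def decode_varint(data, pos):
    """Return (value, next_position)."""
    result, shift = 0, 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7

def encode_fields(fields):
    """Encode {field_number: int_value} as key/value pairs with wire type 0."""
    out = bytearray()
    for number, value in fields.items():
        out += encode_varint((number << 3) | 0)  # key = field number + wire type
        out += encode_varint(value)
    return bytes(out)

def decode_known(data, known_numbers):
    """Decode varint fields, keeping only the field numbers this reader knows."""
    result, pos = {}, 0
    while pos < len(data):
        key, pos = decode_varint(data, pos)
        number, wire_type = key >> 3, key & 0x07
        if wire_type != 0:
            raise ValueError("this sketch only handles varint fields")
        value, pos = decode_varint(data, pos)  # always consumed, even if unknown
        if number in known_numbers:
            result[number] = value
    return result

# A newer writer added field 2; an older reader that only knows field 1 still works.
new_message = encode_fields({1: 150, 2: 7})
print(decode_known(new_message, {1}))  # prints {1: 150}
```

The reverse direction works too: a newer reader given an old message simply finds field 2 absent and can fall back to a default value.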

                   Size   Efficiency   Human Readable   Types   Evolution
JSON               👎     👎           👍               🤔      🤔
Protocol Buffers   👍     🤔           🤔               👍      👍
FlatBuffers        🤔     👍           🤔               👍      👍

Protocol Buffers Encoding: https://developers.google.com/protocol-buffers/docs/encoding

FlatBuffers: https://google.github.io/flatbuffers/

FlatBuffersSwift: https://github.com/mzaks/FlatBuffersSwift

FlexBuffersSwift: https://github.com/mzaks/FlexBuffersSwift

Data Serialisation formats: https://en.wikipedia.org/wiki/Comparison_of_data_serialization_formats

JVM Serialisers: https://github.com/eishay/jvm-serializers

More Links

WWW.MDEVTALK.CZ

mdevtalk