maxim zaks: deep dive into data serialisation

MAXIM ZAKS, Freelance Software Developer

Deep dive into data serialisation

Persisting State / User generated Content

Machine to Machine Communication

Representation of Configuration / Read only data

Why do we need data serialisation?

Custom binary representation

Language provided binary serialisation

Text based representation (CSV, XML, JSON, YAML)

Embedded SQL or NoSQL DB

Binary cross-platform serialisation library (FlatBuffers, FlexBuffers, Protocol Buffers, Cap’n Proto, SBE, Apache Thrift, etc.)

How can we persist data on mobile?

Size “on disk”

Speed of read & write / partial read / memory consumption

Human readable and writable

Support of OO language type system

Data versioning / evolution / migration

Important criteria for persisting data

JSON vs.

Protocol Buffers vs.

FlatBuffers

Text based

Self describing

Weakly typed: Object, Array, String, Number, Bool, Null

Binary

IDL (interface definition language) based Schema

Strongly typed: Message, fixed-length Numbers, variable-length Numbers, String, Bool, Enum, Bytes, Repeated values, Maps, Oneof, Any

Evolution Strategy

Protocol Buffers

Binary

IDL (interface definition language) based Schema

Strongly typed: Table, Struct, fixed-length Numbers, String, Bool, Enum, Bytes, typed Vector, Union

Value / Reference type semantics

Random Value access

Evolution Strategy

Flat Buffers

Size “on disk”

Speed of read & write / partial read / memory consumption

Human readable and writable

Support of OO language type system

Data versioning / evolution / migration

Important criteria for persisting data

👎 JSON, bad for representing numbers, lots of repetition, needs to be at least minified

👍 Protocol Buffers, stores only values and a bit of metadata, thanks to VLQ (variable-length quantity) very efficient for storing numbers

🤔 Flat Buffers, has some overhead compared to Protocol Buffers because of Ref semantics and random value access

Important criteria for persisting data: Size
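
To make the size argument concrete, here is a minimal sketch of a base-128 variable-length integer encoding, the general idea behind VLQ. The function names `encode_varint` and `decode_varint` are illustrative and not part of any library, and the sketch ignores field tags and JSON key names, so it only compares the cost of the number itself.

```python
import json
from typing import Tuple


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer using 7 payload bits per byte."""
    if value < 0:
        raise ValueError("this sketch handles non-negative integers only")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)  # high bit clear: last byte
            return bytes(out)


def decode_varint(data: bytes, pos: int = 0) -> Tuple[int, int]:
    """Decode one varint starting at pos; return (value, next_position)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


if __name__ == "__main__":
    for n in (1, 300, 1_000_000):
        # bytes as varint vs. characters as JSON text
        print(n, len(encode_varint(n)), len(json.dumps(n)))
```

Small values take a single byte, and larger values grow one byte per 7 bits, whereas the JSON text grows one character per decimal digit.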

👎 JSON, text needs to be parsed and translated into an intermediate data representation

🤔 Protocol Buffers, no partial read, VLQ means you need to do some operations before a value is available

👍 Flat Buffers, supports partial reading thanks to Ref semantics and random value access mechanism

Important criteria for persisting data: Speed of read & write / partial read / memory consumption
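
The partial-read idea can be sketched with a small offset table at the start of a buffer. The layout below is hypothetical and deliberately simplified; it is not the FlatBuffers wire format, and `build_record`, `read_score` and `read_name` are made-up names. It only illustrates how one field can be read by jumping to its offset, without decoding the other fields.

```python
import struct

# Hypothetical layout (NOT the FlatBuffers wire format):
#   header: three little-endian uint32 offsets (id, score, name)
#   body:   id as uint32, score as float64,
#           name as uint16 length followed by UTF-8 bytes


def build_record(record_id: int, score: float, name: str) -> bytes:
    name_bytes = name.encode("utf-8")
    header_size = 3 * 4
    id_off = header_size
    score_off = id_off + 4
    name_off = score_off + 8
    header = struct.pack("<3I", id_off, score_off, name_off)
    body = struct.pack("<Id", record_id, score)  # "<" means no padding
    body += struct.pack("<H", len(name_bytes)) + name_bytes
    return header + body


def read_score(buf: bytes) -> float:
    """Read only the score: one offset lookup, one value read."""
    (score_off,) = struct.unpack_from("<I", buf, 4)  # second header slot
    (score,) = struct.unpack_from("<d", buf, score_off)
    return score


def read_name(buf: bytes) -> str:
    """Read only the name, skipping id and score entirely."""
    (name_off,) = struct.unpack_from("<I", buf, 8)  # third header slot
    (length,) = struct.unpack_from("<H", buf, name_off)
    start = name_off + 2
    return buf[start:start + length].decode("utf-8")


if __name__ == "__main__":
    record = build_record(7, 12.5, "Maxim")
    print(read_score(record), read_name(record))
```

With JSON, by contrast, the whole text usually has to be parsed before any single value can be read.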

👍 JSON, text based and can be nicely formatted

🤔 Protocol Buffers, provides tools for binary to JSON conversion and vice versa

🤔 Flat Buffers, provides tools for binary to JSON conversion and vice versa

Important criteria for persisting data: Human readable and writable

🤔 JSON, is generally weakly typed, there are ways to transform/validate against OO types

👍 Protocol Buffers, code generator creates Accessor classes which can be comfortably used for encoding and decoding

👍 Flat Buffers, code generator creates Accessor classes which can be comfortably used for decoding, performant encoding can be a bit painful

Important criteria for persisting data: Support of OO language type system
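
As one possible illustration of validating JSON against an OO type, the sketch below decodes a JSON object into a dataclass and checks each field by hand. The `Player` type and the `player_from_json` function are hypothetical examples; real projects often rely on a schema or validation library instead.

```python
import json
from dataclasses import dataclass


@dataclass
class Player:  # hypothetical example type
    name: str
    level: int


def player_from_json(text: str) -> Player:
    """Parse JSON text and check it against the Player type."""
    raw = json.loads(text)
    if not isinstance(raw, dict):
        raise TypeError("expected a JSON object")
    name = raw.get("name")
    level = raw.get("level")
    if not isinstance(name, str):
        raise TypeError("'name' must be a string")
    # bool is a subclass of int in Python, so exclude it explicitly
    if not isinstance(level, int) or isinstance(level, bool):
        raise TypeError("'level' must be an integer")
    return Player(name=name, level=level)


if __name__ == "__main__":
    print(player_from_json('{"name": "Ann", "level": 3}'))
```

The schema-based formats move this checking into generated code, which is why they score better on this criterion.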

🤔 JSON, evolution is implicit because of its self-describing nature

👍 Protocol Buffers, provides a set of rules for how a schema can be evolved

👍 Flat Buffers, provides a set of rules for how a schema can be evolved

Important criteria for persisting data: Data versioning / evolution / migration
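
A common idea behind such evolution rules is that fields are identified by stable numbers, so an older reader can skip fields it does not know. The sketch below assumes a hypothetical tag-length-value layout (one byte tag, one byte length, then the payload); it is not the real wire format of Protocol Buffers or FlatBuffers, and `encode_fields`, `decode_v1` and `KNOWN_V1` are made-up names.

```python
from typing import Dict


def encode_fields(fields: Dict[int, bytes]) -> bytes:
    """Write each field as: tag byte, length byte, payload."""
    out = bytearray()
    for tag, payload in fields.items():
        if not 0 < tag < 256 or len(payload) > 255:
            raise ValueError("tag or payload too large for this sketch")
        out += bytes([tag, len(payload)]) + payload
    return bytes(out)


# Version 1 of the hypothetical schema knows only tags 1 and 2.
KNOWN_V1 = {1: "name", 2: "city"}


def decode_v1(data: bytes) -> Dict[str, str]:
    """Decode with the v1 schema, skipping any unknown tags."""
    result = {}
    pos = 0
    while pos < len(data):
        tag, length = data[pos], data[pos + 1]
        payload = data[pos + 2:pos + 2 + length]
        pos += 2 + length
        field = KNOWN_V1.get(tag)
        if field is not None:
            result[field] = payload.decode("utf-8")
        # unknown tag: the length tells us how far to skip
    return result


if __name__ == "__main__":
    # A newer writer adds tag 3; the old reader still works.
    newer = encode_fields({1: b"Ann", 2: b"Berlin", 3: b"extra"})
    print(decode_v1(newer))
```

The old reader also copes with missing fields, which is why such schemes typically treat fields as optional and never reuse a field number for a different meaning.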

| | Size | Efficiency | Human Readable | Types | Evolution |
|---|---|---|---|---|---|
| JSON | 👎 | 👎 | 👍 | 🤔 | 🤔 |
| Proto Buffers | 👍 | 🤔 | 🤔 | 👍 | 👍 |
| Flat Buffers | 🤔 | 👍 | 🤔 | 👍 | 👍 |

Mobile Warsaw 2016: https://www.youtube.com/watch?v=OUNpNo-oyQY

MobileTechCon 2016: https://medium.com/@icex33/beyond-json-introduction-to-flatbuffers-fba1dfd0dcfe

AppBuilders 2017: https://www.youtube.com/watch?v=Ve9POqZJymw

BerlinBuzzWords 2017: https://www.youtube.com/watch?v=qF44UetsLsQ

Previous Talks

Protocol Buffers Encoding: https://developers.google.com/protocol-buffers/docs/encoding

FlatBuffers: https://google.github.io/flatbuffers/

FlatBuffersSwift: https://github.com/mzaks/FlatBuffersSwift

FlexBuffersSwift: https://github.com/mzaks/FlexBuffersSwift

Data Serialisation formats: https://en.wikipedia.org/wiki/Comparison_of_data_serialization_formats

JVM Serialisers: https://github.com/eishay/jvm-serializers
