Post on 20-Dec-2015


Apache Avro

Zafar Gilani

Muhammad Adnan Khan

Hui Shang

Outline

• Overview

• Comparison

• Specification

• SASL profile and usage

• References

Overview

• A data serialization system.

• An RPC framework.

• For: storage and communication.

• Purpose:

– Provide rich data structures.

– A compact and fast binary data format.

– Simple integration with dynamic languages.

Overview

• Avro uses JSON as its interface description language (IDL).

– To specify data types.

– To specify protocols.

• Review: JavaScript Object Notation (JSON) is a lightweight, text-based standard for data interchange.
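
As a minimal sketch of what "specifying a data type in JSON" looks like, the snippet below reads a record schema with Python's standard json module. The Greeting record is borrowed from the example protocol later in these slides; the variable names are illustrative only, and no Avro library is involved.

```python
import json

# A record schema written as JSON text (the Greeting record from the
# example protocol in these slides).
schema_text = """
{
  "name": "Greeting",
  "type": "record",
  "fields": [
    {"name": "message", "type": "string"}
  ]
}
"""

# Because the schema is plain JSON, any JSON parser can read it.
schema = json.loads(schema_text)

# Map each field name to its declared type.
field_types = {field["name"]: field["type"] for field in schema["fields"]}

print(schema["type"], schema["name"], field_types)
```

This is why integration with dynamic languages is simple: the schema is ordinary data that a program can load and inspect at run time.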

Why the need for Avro?

• Primary use is in Hadoop, where it provides a standard:

1. Serialization format for persistent data.

2. Wire format for communication:

– among Hadoop nodes.

– from client programs to Hadoop services.

Overview

• Avro relies on schemas.

– Schema stored with data.

– Each datum written with no per-value overheads.

• Thus serialization is fast and the serialized data is small.

• Avro in RPC:

– Schema exchange during client-server handshake.

– Fields in the client's and server's schemas are matched by name, so differences between the two can be easily resolved.
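
The field matching mentioned above can be sketched as follows. This is a simplified illustration, not Avro's implementation: it works on an already-decoded record (a plain dict), handles only flat records, and the function name resolve_record, the extra "language" and "sender" fields, and the sample values are all hypothetical. It follows the resolution rules for records: fields are matched by name, fields the reader does not know are ignored, and fields the writer did not send take the reader's default.

```python
def resolve_record(writer_record, reader_schema):
    """Project a record written with one schema onto the reader's schema."""
    resolved = {}
    for field in reader_schema["fields"]:
        name = field["name"]
        if name in writer_record:
            # Field present on both sides: matched by name.
            resolved[name] = writer_record[name]
        elif "default" in field:
            # Field only the reader knows about: fall back to its default.
            resolved[name] = field["default"]
        else:
            raise ValueError(f"no value or default for field {name!r}")
    # Fields only the writer knows about are simply not copied.
    return resolved


# Hypothetical reader schema: Greeting with an added optional field.
reader_schema = {
    "name": "Greeting",
    "type": "record",
    "fields": [
        {"name": "message", "type": "string"},
        {"name": "language", "type": "string", "default": "en"},
    ],
}

# Hypothetical record as the writer produced it, with a field the reader lacks.
written = {"message": "hello", "sender": "client-1"}

print(resolve_record(written, reader_schema))
# {'message': 'hello', 'language': 'en'}
```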

APIs

• APIs are available for:

– Java

– C

– C++

– C#

– Python

– Ruby

Comparison with other systems

• Avro vs. Protobuf and Thrift.

• A quick note about Thrift:

– Initially developed at Facebook by a Google intern.

– Closer to Google’s protobuf.

Comparison with other systems

                  Avro       Google protobuf    Thrift
Implementation    Hmm..      Cleaner            Hmm..
Error handling    Complex    Simple             OK
Extensibility     Hmm..      Richer             OK

Compatibility:

– Avro: Java, C, C++, C#, Python and Ruby.

– Google protobuf: those and many more, such as Adobe ActionScript, Microsoft Silverlight, etc.

– Thrift: about the same as protobuf.

Specification

• A schema is represented as one of:

– A JSON string, naming a defined type.

– A JSON object of the form: {"type": "typeName" ...attributes...}

– A JSON array, representing a union of the listed types.

• Primitive types: null, boolean, int, long, float, double, bytes, string

– Example: {"type": "string"}

• Complex types: records, enums, arrays, maps, unions, fixed
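
To make the "no per-value overheads" point concrete, here is a minimal sketch of how three of the types above are written in Avro's binary encoding: int and long values use zig-zag, variable-length coding; a string is its byte length followed by UTF-8 bytes; a record is just its field values in schema order, with no names or tags. The function names are hypothetical and only these types are handled.

```python
def zigzag(n: int) -> int:
    """Map signed 64-bit values to unsigned: 0, -1, 1, -2 become 0, 1, 2, 3."""
    return (n << 1) ^ (n >> 63)


def encode_long(n: int) -> bytes:
    """Zig-zag the value, then emit 7 bits per byte, low-order group first."""
    z = zigzag(n)
    out = bytearray()
    while z > 0x7F:
        out.append((z & 0x7F) | 0x80)  # high bit set: more bytes follow
        z >>= 7
    out.append(z)
    return bytes(out)


def encode_string(s: str) -> bytes:
    """Length as a long, followed by the UTF-8 bytes."""
    data = s.encode("utf-8")
    return encode_long(len(data)) + data


ENCODERS = {"int": encode_long, "long": encode_long, "string": encode_string}


def encode_record(record: dict, schema: dict) -> bytes:
    """Concatenate field encodings in schema order; nothing else is written."""
    return b"".join(
        ENCODERS[field["type"]](record[field["name"]])
        for field in schema["fields"]
    )


greeting_schema = {
    "name": "Greeting",
    "type": "record",
    "fields": [{"name": "message", "type": "string"}],
}

print(encode_long(64).hex())                                    # 8001
print(encode_record({"message": "foo"}, greeting_schema).hex())  # 06666f6f
```

The four bytes for the Greeting record are one length byte plus the three characters of "foo"; the field name never appears, because the reader gets it from the schema.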

Specification, example protocol

{
  "namespace": "com.acme",
  "protocol": "HelloWorld",
  "doc": "Protocol Greetings",
  "types": [
    {"name": "Greeting", "type": "record", "fields": [
      {"name": "message", "type": "string"}]},
    {"name": "Curse", "type": "error", "fields": [
      {"name": "message", "type": "string"}]}
  ],
  "messages": {
    "hello": {
      "doc": "Say hello.",
      "request": [{"name": "greeting", "type": "Greeting"}],
      "response": "Greeting",
      "errors": ["Curse"]
    }
  }
}
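
As a rough sketch of how a program might inspect the protocol above, the snippet below loads it with Python's json module and checks that every type a message refers to is declared under "types". The helper referenced_types is hypothetical, and the check is deliberately simplified: it assumes every request, response and error type is a named type, which holds for this protocol but not for ones that use primitives such as "string" directly.

```python
import json

protocol_text = """
{
  "namespace": "com.acme",
  "protocol": "HelloWorld",
  "doc": "Protocol Greetings",
  "types": [
    {"name": "Greeting", "type": "record", "fields": [
      {"name": "message", "type": "string"}]},
    {"name": "Curse", "type": "error", "fields": [
      {"name": "message", "type": "string"}]}
  ],
  "messages": {
    "hello": {
      "doc": "Say hello.",
      "request": [{"name": "greeting", "type": "Greeting"}],
      "response": "Greeting",
      "errors": ["Curse"]
    }
  }
}
"""

protocol = json.loads(protocol_text)

# Names of the types declared by the protocol.
declared = {t["name"] for t in protocol["types"]}


def referenced_types(message):
    """List the type names used by a message's request, response and errors."""
    names = [param["type"] for param in message["request"]]
    names.append(message["response"])
    names.extend(message.get("errors", []))
    return names


for name, message in protocol["messages"].items():
    missing = [t for t in referenced_types(message) if t not in declared]
    print(name, "returns", message["response"], "| undeclared types:", missing)
```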

SASL profile

• Simple Authentication and Security Layer.

• Provides a framework for:

– Authentication.

– Security of network protocols.

SASL usage

• SASL negotiation precedes the use of connection-oriented Avro RPC. Each negotiation message carries one of four status codes:

– 0: START. Used in a client's initial message.

– 1: CONTINUE. Used while negotiation is ongoing.

– 2: FAIL. Terminates negotiation unsuccessfully.

– 3: COMPLETE. Terminates negotiation successfully.
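
The status codes above map naturally onto an enumeration. The sketch below is illustrative only: SaslStatus and negotiation_outcome are hypothetical names, not part of any Avro library, and the messages are reduced to their status code, whereas real negotiation messages carry more than that.

```python
from enum import IntEnum


class SaslStatus(IntEnum):
    START = 0     # used in a client's initial message
    CONTINUE = 1  # used while negotiation is ongoing
    FAIL = 2      # terminates negotiation unsuccessfully
    COMPLETE = 3  # terminates negotiation successfully


def negotiation_outcome(statuses):
    """Return True or False once negotiation terminates, None if still ongoing."""
    for raw in statuses:
        status = SaslStatus(raw)  # raises ValueError on an unknown code
        if status is SaslStatus.COMPLETE:
            return True
        if status is SaslStatus.FAIL:
            return False
    return None


print(negotiation_outcome([0, 1, 1, 3]))  # True
print(negotiation_outcome([0, 2]))        # False
print(negotiation_outcome([0, 1]))        # None
```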

References

1. Apache Avro, http://avro.apache.org/docs/current/

2. Google protocol buffers vs Apache Avro, http://www.sammur.com/?p=36

3. Avro vs Thrift, http://tech.puredanger.com/2011/05/27/serialization-comparison/

4. SASL, http://avro.apache.org/docs/current/sasl.html
