An Introduction to Apache Gora
Apache Gora
What is it?
Gora Nutch
Supports
Data Access
APIs
Apache Gora What is it?
Provides for Big Data
In memory data model
Persistence
Data store abstraction
Supports persisting to
Column stores
Key/value stores
Document stores
RDBMSs
Supports Hadoop MapReduce
Apache Gora What is it?
Released under the Apache License 2.0
Written in Java
Offers a persistence framework
Designed for big data applications
Used by Nutch 2.x for web crawl data storage
Used for
Persistence
Indexing
Analytics
Apache Gora Nutch
Nutch 2.x now uses Gora
Abstracted storage
Data store independence
Handles object to persistent mappings
Can use various NoSQL data stores
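The idea of data store independence can be sketched in plain Java. This is only an illustration under stated assumptions: `PageStore`, `PageStoreFactory`, the two in-memory backends and the `store.backend` property key are all hypothetical names invented here, not part of Gora or Nutch. The point is that the crawling code talks to one interface while configuration picks the backend.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;
import java.util.function.Supplier;

// Hypothetical storage abstraction for this sketch; not Gora's DataStore.
interface PageStore {
    void put(String key, String content);
    String get(String key);
    String name();
}

// Backend 1: unordered in-memory map.
class HashPageStore implements PageStore {
    private final Map<String, String> rows = new HashMap<>();
    public void put(String key, String content) { rows.put(key, content); }
    public String get(String key) { return rows.get(key); }
    public String name() { return "hash"; }
}

// Backend 2: in-memory map that keeps keys sorted.
class SortedPageStore implements PageStore {
    private final Map<String, String> rows = new TreeMap<>();
    public void put(String key, String content) { rows.put(key, content); }
    public String get(String key) { return rows.get(key); }
    public String name() { return "sorted"; }
}

class PageStoreFactory {
    private static final Map<String, Supplier<PageStore>> BACKENDS = new HashMap<>();
    static {
        BACKENDS.put("hash", HashPageStore::new);
        BACKENDS.put("sorted", SortedPageStore::new);
    }

    // "store.backend" is a made-up property key used only in this sketch.
    static PageStore create(Properties config) {
        String backend = config.getProperty("store.backend", "hash");
        Supplier<PageStore> supplier = BACKENDS.get(backend);
        if (supplier == null) {
            throw new IllegalArgumentException("Unknown backend: " + backend);
        }
        return supplier.get();
    }
}

class CrawlerDemo {
    // Application code depends only on the PageStore interface.
    static String crawlAndRead(PageStore store) {
        store.put("http://example.org/", "<html>hello</html>");
        return store.get("http://example.org/");
    }

    public static void main(String[] args) {
        Properties config = new Properties();
        config.setProperty("store.backend", "sorted");
        PageStore store = PageStoreFactory.create(config);
        System.out.println(store.name() + ": " + crawlAndRead(store));
    }
}
```

Changing the property value swaps the backend without touching `crawlAndRead`, which is the property the slide calls data store independence.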
Apache Gora Supports
Gora supports the following data stores and processing frameworks
Apache Accumulo
Apache Cassandra
Apache HBase
Amazon DynamoDB
Pig
Hive
Cascading
MapReduce
Apache Gora Data Access
Java API for data access
Independent of location
Core Gora APIs
Store
Persistency
Query
MapReduce
Apache Gora Store API
Java API org.apache.gora.store.*
DataStore handles object persistence
DataStore methods process objects
Persist
Fetch
Query
Delete
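The four operations listed above can be illustrated with a small in-memory stand-in. This is a sketch, not Gora's `DataStore`: the class `SimpleStore` and its method signatures are assumptions made for the example, and a sorted map plays the role of the backing store so that a key-range query is possible.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical in-memory store; it only mirrors the four operations on the slide.
class SimpleStore<K extends Comparable<K>, T> {
    private final TreeMap<K, T> rows = new TreeMap<>();

    // Persist: insert or overwrite the object stored under the key.
    void put(K key, T value) { rows.put(key, value); }

    // Fetch: return the object for the key, or null if there is none.
    T get(K key) { return rows.get(key); }

    // Delete: remove the object; returns true only if something was removed.
    boolean delete(K key) { return rows.remove(key) != null; }

    // Query: objects whose key lies in [startKey, endKey], in key order.
    // Throws IllegalArgumentException if startKey is greater than endKey.
    List<T> query(K startKey, K endKey) {
        return new ArrayList<>(rows.subMap(startKey, true, endKey, true).values());
    }
}

class StoreApiDemo {
    public static void main(String[] args) {
        SimpleStore<String, String> store = new SimpleStore<>();
        store.put("a", "alpha");
        store.put("b", "beta");
        store.put("c", "gamma");
        System.out.println(store.get("b"));
        System.out.println(store.query("a", "b"));
        System.out.println(store.delete("b"));
        System.out.println(store.query("a", "c"));
    }
}
```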
Apache Gora Persistency API
Java API org.apache.gora.persistency.*
Core classes
BeanFactory
Construct keys
Persistent
Persist objects
State
State managed through StateManager
NEW, CLEAN (UNMODIFIED)
DIRTY (MODIFIED), DELETED
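One way to picture the four states is as a small state machine attached to each object. The sketch below is an assumption-laden illustration: `RecordState`, `TrackedRecord` and the transition rules are invented for this example and are not taken from Gora's `StateManager`.

```java
// Hypothetical state names copied from the slide; the transitions are assumed.
enum RecordState { NEW, CLEAN, DIRTY, DELETED }

class TrackedRecord {
    private String title;
    private RecordState state = RecordState.NEW;

    String getTitle() { return title; }
    RecordState getState() { return state; }

    void setTitle(String title) {
        if (state == RecordState.DELETED) {
            throw new IllegalStateException("record was deleted");
        }
        this.title = title;
        // A record that was never written stays NEW; a saved one becomes DIRTY.
        if (state == RecordState.CLEAN) {
            state = RecordState.DIRTY;
        }
    }

    void markDeleted() { state = RecordState.DELETED; }

    // Called by a store after a successful write.
    void markSaved() {
        if (state != RecordState.DELETED) {
            state = RecordState.CLEAN;
        }
    }

    // A store would only need to write NEW or DIRTY records.
    boolean needsWrite() {
        return state == RecordState.NEW || state == RecordState.DIRTY;
    }
}

class StateDemo {
    public static void main(String[] args) {
        TrackedRecord record = new TrackedRecord();
        record.setTitle("first");
        record.markSaved();
        record.setTitle("second");
        System.out.println(record.getState() + " needsWrite=" + record.needsWrite());
    }
}
```

Tracking state per object lets a store skip unmodified records and write only what has changed.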
Apache Gora Query API
Java API org.apache.gora.query.*
Core classes
Query
Constructed via DataStore
PartitionQuery
Divide results of Query into partitions.
Run queries on data nodes.
Generate Hadoop InputSplits
Result
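Dividing a query into partitions can be sketched as splitting a sorted key space into contiguous ranges, each of which could then be handed to a separate worker. The names `KeyRange` and `KeyRangePartitioner` are hypothetical, and the even-split rule is an assumption for illustration; a real store would normally partition along its own region or node boundaries.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical description of one partition: an inclusive key range.
class KeyRange {
    final String startKey;
    final String endKey;

    KeyRange(String startKey, String endKey) {
        this.startKey = startKey;
        this.endKey = endKey;
    }

    @Override
    public String toString() { return "[" + startKey + ", " + endKey + "]"; }
}

class KeyRangePartitioner {
    // Splits sorted keys into at most `parts` contiguous ranges of near-equal size.
    static List<KeyRange> partition(List<String> sortedKeys, int parts) {
        if (parts < 1) {
            throw new IllegalArgumentException("parts must be >= 1");
        }
        List<KeyRange> ranges = new ArrayList<>();
        int n = sortedKeys.size();
        int count = Math.min(parts, n); // never create empty partitions
        int from = 0;
        for (int i = 0; i < count; i++) {
            // The first (n % count) partitions take one extra key each.
            int size = n / count + (i < n % count ? 1 : 0);
            int to = from + size; // exclusive end index
            ranges.add(new KeyRange(sortedKeys.get(from), sortedKeys.get(to - 1)));
            from = to;
        }
        return ranges;
    }
}

class PartitionDemo {
    public static void main(String[] args) {
        List<String> keys = Arrays.asList("a", "b", "c", "d", "e", "f", "g");
        System.out.println(KeyRangePartitioner.partition(keys, 3));
    }
}
```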
Apache Gora MapReduce API
Java API org.apache.gora.mapreduce.*
GoraMapper
GoraReducer
GoraRecordCounter
GoraRecordReader
GoraRecordWriter
Hadoop / Avro
Serialise
De-serialise
Persistent
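Serialising and de-serialising a persistent object amounts to a round trip through bytes. The sketch below uses only `java.io` as a stand-in for that idea; it does not use Avro or Hadoop, and `PageRecord` with its two fields is a hypothetical class invented for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical record type; plain java.io stands in for Avro serialisation.
class PageRecord {
    final String url;
    final int fetchCount;

    PageRecord(String url, int fetchCount) {
        this.url = url;
        this.fetchCount = fetchCount;
    }

    // Serialise: write the fields in a fixed order.
    byte[] serialise() {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(url);
            out.writeInt(fetchCount);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // De-serialise: read the fields back in the same order they were written.
    static PageRecord deserialise(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            String url = in.readUTF();
            int fetchCount = in.readInt();
            return new PageRecord(url, fetchCount);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

class SerialisationDemo {
    public static void main(String[] args) {
        PageRecord original = new PageRecord("http://example.org/", 3);
        PageRecord copy = PageRecord.deserialise(original.serialise());
        System.out.println(copy.url + " fetched " + copy.fetchCount + " times");
    }
}
```

A schema-based format such as Avro replaces the hand-written field order here with a shared schema, which is what lets mappers and reducers exchange objects safely.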
Contact Us
Feel free to contact us at
www.semtech-solutions.co.nz
We offer IT project consultancy
We are happy to hear about your problems
You pay only for the hours you need to solve them