Data Acquisition
Axel Ngonga
Lead Data Acquisition
BIG Data PPF
http://big-project.eu
Motivation
● Increasing amount of data
  ○ 4K new pictures on Instagram
  ○ 100K tweets
  ○ 800K new pieces of content on Facebook
  ○ …
Motivation
● Big data technologies for
  ○ Improved business intelligence
  ○ Secure decisions
  ○ Customized services
  ○ …
● Use Cases
  ○ Mission planning
  ○ Trade market
  ○ Customized services
  ○ Criminality prediction
  ○ ...
Definition
● Data acquisition stands for
  ○ Selecting data sources
  ○ Collecting information from these sources
  ○ Filtering and cleaning the data
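The three steps in the definition can be sketched as a small pipeline. This is a minimal illustration in Python and not code from the talk; the `SOURCES` dictionary, the source names and the record layout are all hypothetical.

```python
# Hypothetical in-memory "sources"; in practice these would be feeds, logs or APIs.
SOURCES = {
    "tweets": [" Big data is big ", "", "big data is BIG"],
    "logs": ["GET /index.html 200", "   "],
    "spam": ["buy now!!!"],
}

def select_sources(sources, wanted):
    """Step 1: keep only the sources we decided to acquire from."""
    return {name: items for name, items in sources.items() if name in wanted}

def collect(sources):
    """Step 2: pull raw records from every selected source, tagged with their origin."""
    for name, items in sources.items():
        for raw in items:
            yield {"source": name, "text": raw}

def filter_and_clean(records):
    """Step 3: normalise whitespace and drop records that end up empty."""
    for record in records:
        text = " ".join(record["text"].split())
        if text:
            yield {"source": record["source"], "text": text}

selected = select_sources(SOURCES, wanted={"tweets", "logs"})
cleaned = list(filter_and_clean(collect(selected)))
for record in cleaned:
    print(record)
```

Using generators for the collect and clean steps means records flow through one at a time, so the sketch does not need to hold a whole source in memory between steps.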
Overview
[Diagram: four data sources (DS) feeding into Processing (cleaning, classification) and Storage]
More than 3 Vs
● The 9(?) Vs of Big Data Acquisition
  ○ Volume
  ○ Velocity
  ○ Variety
  ○ Vocabulary
  ○ Variability (security models, ownership)
  ○ Veracity (trustworthiness of data)
  ○ Visibility (integrated view of data)
  ○ Value (worth of data for data consumer)
  ○ Visualization
Requirements
● Extensibility of protocols
● High scalability of approaches
● Low memory consumption
● Parallelism
● Elasticity
● Fast ROI
● High throughput (real-time)
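One common way to approach the low-memory and high-throughput requirements is to read a stream lazily and hand it on in bounded batches. The sketch below is only an illustration under that assumption; `read_events` is a hypothetical source and the batch size is arbitrary.

```python
from itertools import islice

def read_events(n):
    """Hypothetical source: yields events lazily instead of building a full list."""
    for i in range(n):
        yield {"id": i, "payload": f"event-{i}"}

def batches(iterable, size):
    """Group a stream into lists of at most `size` items, one batch in memory at a time."""
    iterator = iter(iterable)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch

batch_sizes = [len(batch) for batch in batches(read_events(10), size=4)]
print(batch_sizes)
```

Each batch could then be handed to a separate worker, which is where the parallelism requirement would come in; that part is left out here to keep the sketch short.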
Technology Overview
● Gathering
  ○ Advanced Message Queuing Protocol
    ■ Wire-level protocol
    ■ OASIS Standard since Oct. 2012
    ■ Large number of implementations incl. RabbitMQ, SwiftMQ, Apache ActiveMQ, Windows Azure Service Bus
  ○ JMS 2.0
  ○ Kestrel (Memcached)
  ○ Apache Kafka
  ○ Apache Flume (log data)
  ○ FB Scribe (log data)
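The gathering tools listed above share one basic idea: sources publish messages to a queue, and the acquisition side consumes them at its own pace. The sketch below illustrates only that pattern, with Python's standard `queue.Queue` as a stand-in for a broker. It does not use AMQP, JMS, Kafka or any of the listed products, and the `SENTINEL` end-of-stream marker is an assumption made for the example.

```python
import queue
import threading

broker = queue.Queue(maxsize=100)  # stand-in for a message queue, NOT an AMQP client
SENTINEL = None                    # hypothetical end-of-stream marker
received = []

def producer(messages):
    """A data source publishes messages without knowing who consumes them."""
    for message in messages:
        broker.put(message)   # blocks if the queue is full (back-pressure)
    broker.put(SENTINEL)

def consumer():
    """The acquisition side takes messages off the queue at its own pace."""
    while True:
        message = broker.get()
        if message is SENTINEL:
            break
        received.append(message)

worker = threading.Thread(target=consumer)
worker.start()
producer(["log line 1", "log line 2", "log line 3"])
worker.join()
print(received)
```

The bounded `maxsize` is a deliberate choice: a fast producer is slowed down rather than allowed to exhaust memory.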
● Processing
  ○ Facebook Scribe (Aggregation)
  ○ Twitter Storm (Stream Data Processing, Analysis)
  ○ MOA (Massive Online Analysis, esp. classification)
  ○ Hadoop (Distributed Processing)
  ○ InfoSphere Streams (Analysis)
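As a rough illustration of what stream data processing means in this list, the sketch below counts keys per fixed-width (tumbling) time window while reading events one by one. It is plain Python and not the API of Storm, MOA or any other tool named above; the `EVENTS` data and the assumption that events arrive ordered by timestamp are hypothetical.

```python
from collections import Counter

# Hypothetical stream of (timestamp in seconds, hashtag) pairs, ordered by time.
EVENTS = [(0, "#bigdata"), (2, "#kafka"), (4, "#bigdata"),
          (11, "#storm"), (12, "#bigdata"), (25, "#storm")]

def tumbling_window_counts(events, width):
    """Yield (window_start, Counter) for each non-empty, fixed-width window."""
    current_start, counts = None, Counter()
    for timestamp, key in events:
        start = (timestamp // width) * width
        if current_start is not None and start != current_start:
            # The stream has moved on to a new window: emit the finished one.
            yield current_start, counts
            counts = Counter()
        current_start = start
        counts[key] += 1
    if current_start is not None:
        yield current_start, counts  # emit the last, still open window

windows = list(tumbling_window_counts(EVENTS, width=10))
for start, counts in windows:
    print(start, dict(counts))
```

Only the counts for the current window are kept in memory, which is the property that lets this style of processing keep up with an unbounded stream.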
● Storage
  ○ MongoDB (BSON)
  ○ Apache CouchDB (JSON)
  ○ Neo4j (Graph DB)
  ○ Oracle NoSQL
  ○ IBM DB2 NoSQL
● Holistic Frameworks
  ○ Oracle's Big Data Suite
  ○ IBM's Big Data Suite
  ○ Karmasphere
Tool Matrix
Simple Recipe
1. Which of the 9Vs are important for me?
2. What are my sources?
  ○ Protocols
  ○ Velocity
  ○ Type of data (logs, XML, …)
  ○ ...
3. What’s my current storage architecture?
  ○ NoSQL?
  ○ Distributed?
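The answers to the three questions above can be written down as a small structured profile before choosing tools. The sketch below is one possible way to do that in Python; the class names, fields and example values are hypothetical and only mirror the questions in the recipe.

```python
from dataclasses import dataclass, field

@dataclass
class Source:
    name: str
    protocol: str           # e.g. "AMQP", "HTTP"
    events_per_second: int  # rough velocity estimate
    data_type: str          # e.g. "logs", "XML"

@dataclass
class AcquisitionProfile:
    important_vs: set                              # question 1
    sources: list = field(default_factory=list)    # question 2
    storage_is_nosql: bool = False                 # question 3
    storage_is_distributed: bool = False           # question 3

    def peak_velocity(self):
        """Highest per-source velocity, or 0 if no source is listed yet."""
        return max((s.events_per_second for s in self.sources), default=0)

profile = AcquisitionProfile(
    important_vs={"Volume", "Velocity", "Veracity"},
    sources=[Source("web-logs", "HTTP", 5000, "logs"),
             Source("orders", "AMQP", 200, "XML")],
    storage_is_nosql=True,
)
print(profile.peak_velocity())
```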
Thank You!
Questions?
Axel Ngonga
University of Leipzig
AKSW Research [email protected]
http://aksw.org/AxelNgonga
http://big-project.eu
Questionnaire