data acquisition - big-project.eubig-project.eu/sites/default/files/data acquisition webinar...

Data Acquisition Axel Ngonga Lead Data Acquisition BIG Data PPF http://big-project.eu

Post on 03-Jun-2020

Page 1: Data Acquisition - big-project.eubig-project.eu/sites/default/files/Data Acquisition Webinar Slides.pdf · The 9(?) Vs of Big Data Acquisition Volume Velocity Variety Vocabulary Variability

Data Acquisition

Axel Ngonga
Lead Data Acquisition
BIG Data PPF
http://big-project.eu

Motivation

● Increasing amount of data
  ○ 4K new pictures on Instagram
  ○ 100K tweets
  ○ 800K new pieces of content on Facebook
  ○ …

● Big data technologies for
  ○ Improved business intelligence
  ○ Secure decisions
  ○ Customized services
  ○ …

● Use Cases
  ○ Mission planning
  ○ Trade market
  ○ Customized services
  ○ Criminality prediction
  ○ ...

Definition

● Data acquisition stands for
  ○ Selection of data sources
  ○ Collection of information from these sources
  ○ Filtering and cleaning of data
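The three steps in this definition can be sketched as a small pipeline. This is only an illustration under assumptions: the source names, the sample records, and the cleaning rules (whitespace and case normalisation, dropping empties and duplicates) are hypothetical and do not come from the slides.

```python
from typing import Callable, Iterable, Iterator

# Hypothetical in-memory sources; real ones would be feeds, logs, or APIs.
SOURCES: dict[str, Callable[[], Iterable[str]]] = {
    "tweets": lambda: ["  Hello world ", "", "hello world", "Big data!"],
    "logs": lambda: ["GET /index 200", "   ", "GET /index 200"],
    "ads": lambda: ["buy now"],
}


def select_sources(wanted: set[str]) -> list[str]:
    """Step 1: keep only the sources we decided to acquire from."""
    return [name for name in SOURCES if name in wanted]


def collect(names: Iterable[str]) -> Iterator[tuple[str, str]]:
    """Step 2: pull raw records lazily from each selected source."""
    for name in names:
        for record in SOURCES[name]():
            yield name, record


def filter_and_clean(
    records: Iterable[tuple[str, str]],
) -> Iterator[tuple[str, str]]:
    """Step 3: normalise whitespace and case, drop empties and duplicates."""
    seen: set[tuple[str, str]] = set()
    for name, raw in records:
        cleaned = " ".join(raw.split()).lower()
        if not cleaned or (name, cleaned) in seen:
            continue
        seen.add((name, cleaned))
        yield name, cleaned


def acquire(wanted: set[str]) -> list[tuple[str, str]]:
    """Run selection, collection, and cleaning end to end."""
    return list(filter_and_clean(collect(select_sources(wanted))))


if __name__ == "__main__":
    for source, record in acquire({"tweets", "logs"}):
        print(source, record)
```

Chaining generators keeps only one record in flight at a time, apart from the set used for duplicate detection.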

Overview

[Figure: four data sources (DS) feeding Processing (cleaning, classification) and Storage]

More than 3 Vs

● The 9(?) Vs of Big Data Acquisition
  ○ Volume
  ○ Velocity
  ○ Variety
  ○ Vocabulary
  ○ Variability (security models, ownership)
  ○ Veracity (trustworthiness of data)
  ○ Visibility (integrated view of data)
  ○ Value (worth of data for data consumer)
  ○ Visualization

Requirements

● Extensibility of protocols
● High scalability of approaches
● Low memory consumption
● Parallelism
● Elasticity
● Fast ROI
● High throughput (real-time)

Technology Overview

● Gathering
  ○ Advanced Message Queuing Protocol (AMQP)
    ■ Wire-level protocol
    ■ OASIS Standard since Oct. 2012
    ■ Large number of implementations incl. RabbitMQ, SwiftMQ, Apache ActiveMQ, Windows Azure Service Bus
  ○ JMS 2.0
  ○ Kestrel (Memcached)
  ○ Apache Kafka
  ○ Apache Flume (log data)
  ○ Facebook Scribe (log data)
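The gathering tools listed above share a message-queuing pattern: producers publish records to a queue, and a consumer drains it independently. The sketch below imitates that pattern inside one process with Python's standard `queue` and `threading` modules. It is an assumption-laden stand-in, not the API of AMQP, Kafka, or any other listed tool; the function names and sample messages are hypothetical.

```python
import queue
import threading


def producer(name: str, messages: list[str], broker: queue.Queue) -> None:
    """Publish every message of one source to the shared queue."""
    for body in messages:
        broker.put((name, body))  # blocks while the bounded queue is full


def consumer(broker: queue.Queue, sink: list) -> None:
    """Drain the queue until the None sentinel arrives."""
    while True:
        item = broker.get()
        if item is None:  # sentinel: all producers have finished
            break
        sink.append(item)


def run(sources: dict[str, list[str]]) -> list[tuple[str, str]]:
    """Start one producer per source and a single consumer."""
    broker: queue.Queue = queue.Queue(maxsize=100)  # bounded: back-pressure
    sink: list[tuple[str, str]] = []
    worker = threading.Thread(target=consumer, args=(broker, sink))
    worker.start()
    producers = [
        threading.Thread(target=producer, args=(name, msgs, broker))
        for name, msgs in sources.items()
    ]
    for thread in producers:
        thread.start()
    for thread in producers:
        thread.join()
    broker.put(None)  # tell the consumer to stop
    worker.join()
    return sink


if __name__ == "__main__":
    print(run({"sensor": ["a", "b"], "log": ["x"]}))
```

Messages from one producer arrive in the order they were sent, while the interleaving across producers is not deterministic. A real broker adds persistence, routing, and network transport on top of this idea.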

● Processing
  ○ Facebook Scribe (Aggregation)
  ○ Twitter Storm (Stream Data Processing, Analysis)
  ○ MOA (Massive Online Analysis, esp. classification)
  ○ Hadoop (Distributed Processing)
  ○ InfoSphere Streams (Analysis)
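To give a feel for what stream data processing means in this list, here is a minimal sketch of a tumbling-window aggregation: events are counted per key in fixed time windows, and only the current window is kept in memory. It does not use Storm, MOA, or any other listed tool; the event format `(timestamp_seconds, key)` and the assumption that events arrive in timestamp order are hypothetical.

```python
from collections import Counter
from typing import Iterable, Iterator


def tumbling_counts(
    events: Iterable[tuple[int, str]], window: int
) -> Iterator[tuple[int, dict[str, int]]]:
    """Yield (window_start, counts per key) for each non-empty window.

    Assumes events arrive ordered by timestamp (in seconds).
    """
    current_start = None
    counts: Counter = Counter()
    for timestamp, key in events:
        start = timestamp - timestamp % window  # window this event falls in
        if current_start is None:
            current_start = start
        elif start != current_start:
            # The previous window is complete: emit it and start a new one.
            yield current_start, dict(counts)
            current_start, counts = start, Counter()
        counts[key] += 1
    if current_start is not None:
        yield current_start, dict(counts)  # flush the last open window


if __name__ == "__main__":
    sample = [(0, "a"), (3, "b"), (9, "a"), (10, "a"), (25, "b")]
    for window_start, window_counts in tumbling_counts(sample, window=10):
        print(window_start, window_counts)
```

Because results are emitted as soon as a window closes, memory use depends on the number of distinct keys per window and not on the length of the stream.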

● Storage
  ○ MongoDB (BSON)
  ○ Apache CouchDB (JSON)
  ○ Neo4j (Graph DB)
  ○ Oracle NoSQL
  ○ IBM DB2 NoSQL

● Holistic Frameworks
  ○ Oracle's Big Data Suite
  ○ IBM's Big Data Suite
  ○ Karmasphere

Tool Matrix

Simple Recipe

1. Which of the 9 Vs are important for me?
2. What are my sources?
  ○ Protocols
  ○ Velocity
  ○ Type of data (logs, XML, …)
  ○ ...
3. What's my current storage architecture?
  ○ NoSQL?
  ○ Distributed?

Thank You! Questions?

Axel Ngonga
University of Leipzig

AKSW Research [email protected]

http://aksw.org/AxelNgonga
http://big-project.eu

Questionnaire