Sam Babad @ Midlink | 052-595-8885
Big Data is not just a buzzword. Big Data is something more and more organizations have
Commonly defined as a combination of three Vs:
Volume, Velocity, Variety
Or simply: too much data to process on reasonable hardware
Today, most consider more than 15 TB to be Big Data
For real-time processing, more than 5 TB is considered Big Data
Internet companies manage petabytes (PB) of data
Most organizations have outgrown their expectations
For Fast Data, 50 TB is reasonable
For Big Data, 2-5 PB is reasonable
All of this runs on commodity hardware
The approach is to collect as much data as you can and figure out what questions you have later!
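The size rules of thumb above (more than 15 TB in general, more than 5 TB for real-time processing) can be written as a small check. This is a minimal sketch: the function name `is_big_data`, the constant names, and the strict "greater than" comparison are assumptions for illustration, not a standard definition.

```python
# Rules of thumb from the slide above (hypothetical constants, not a standard):
# more than 15 TB is Big Data; for real-time processing, more than 5 TB already is.
BIG_DATA_THRESHOLD_TB = 15.0
REAL_TIME_THRESHOLD_TB = 5.0


def is_big_data(size_tb: float, real_time: bool = False) -> bool:
    """Return True if size_tb (terabytes) exceeds the applicable threshold."""
    threshold = REAL_TIME_THRESHOLD_TB if real_time else BIG_DATA_THRESHOLD_TB
    return size_tb > threshold


if __name__ == "__main__":
    print(is_big_data(10))                  # False: below 15 TB
    print(is_big_data(10, real_time=True))  # True: above 5 TB
```

The same 10 TB dataset falls on different sides of the line depending on whether it must be processed in real time.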
Visualization (Exploring, Analyzing, Reporting)
Logical (DWH, ETL, Business Rules, Actions)
Data Collection (File & OS)
Physical (Network, Servers, Storage or Cloud)
From Data to Value:
Ingest (Kafka/VoltDB)
Distributed storage (Hadoop + SQL/NoSQL)
Batch processing
Real-time processing
• In-memory
• Stream processing
(Stages: Format, Process, Insight)
Visualization (Sisense/Tableau/Looker/Zoomdata) & Monitoring
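The pipeline above can be sketched end to end in plain Python. This is a toy under stated assumptions: a `deque` stands in for the ingest layer (Kafka/VoltDB), a dict of partitions stands in for distributed storage (Hadoop + SQL/NoSQL), a text bar chart stands in for visualization, and all function names (`ingest`, `store`, `batch_totals`, `stream_totals`, `report`) are hypothetical. It only illustrates the difference between the two processing paths: batch scans everything already stored, while the real-time path updates in-memory state as each event arrives.

```python
from __future__ import annotations

import zlib
from collections import defaultdict, deque
from typing import Iterable, Iterator

# Toy stand-ins, NOT real Kafka/VoltDB/Hadoop APIs:
# a deque plays the ingest queue, a dict of partitions plays distributed storage.


def ingest(events: Iterable[dict]) -> deque:
    """Ingest: buffer incoming events in arrival order."""
    return deque(events)


def store(queue: deque, num_partitions: int = 3) -> dict[int, list[dict]]:
    """Storage: drain the queue and spread events over partitions by user id."""
    partitions: dict[int, list[dict]] = defaultdict(list)
    while queue:
        event = queue.popleft()
        # crc32 is a stable hash, so the same user always lands in the same partition
        key = zlib.crc32(event["user"].encode("utf-8")) % num_partitions
        partitions[key].append(event)
    return dict(partitions)


def batch_totals(partitions: dict[int, list[dict]]) -> dict[str, int]:
    """Batch processing: scan every stored partition and total amounts per user."""
    totals: dict[str, int] = defaultdict(int)
    for events in partitions.values():
        for event in events:
            totals[event["user"]] += event["amount"]
    return dict(totals)


def stream_totals(events: Iterable[dict]) -> Iterator[tuple[str, int]]:
    """Real-time processing: keep running totals in memory, emit one update per event."""
    running: dict[str, int] = defaultdict(int)
    for event in events:
        running[event["user"]] += event["amount"]
        yield event["user"], running[event["user"]]


def report(totals: dict[str, int]) -> str:
    """Visualization stand-in: a text bar chart, one line per user."""
    return "\n".join(f"{user}: {'#' * amount}" for user, amount in sorted(totals.items()))


if __name__ == "__main__":
    events = [
        {"user": "a", "amount": 5},
        {"user": "b", "amount": 3},
        {"user": "a", "amount": 2},
    ]
    print(report(batch_totals(store(ingest(events)))))
    for update in stream_totals(events):
        print("live update:", update)
```

Both paths reach the same final totals; the stream path simply makes intermediate values available immediately instead of after a full scan.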
Inexpensive to start
Ability to grow or shrink according to need
Requires less expensive infrastructure skills
Elastic usage for tasks such as upgrades
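The ability to grow or shrink according to need is usually expressed as an autoscaling rule. The sketch below is a minimal, hypothetical policy: the function `desired_nodes` and its parameters are assumptions, not any cloud provider's API. It sizes the cluster to the current backlog, within fixed bounds.

```python
import math


def desired_nodes(pending_tasks: int, tasks_per_node: int,
                  min_nodes: int = 1, max_nodes: int = 100) -> int:
    """Return how many nodes to run for the current backlog (hypothetical policy).

    Grows when the backlog rises and shrinks when it falls, clamped to
    [min_nodes, max_nodes] so the cluster never disappears or grows without limit.
    """
    if tasks_per_node <= 0:
        raise ValueError("tasks_per_node must be positive")
    needed = math.ceil(pending_tasks / tasks_per_node)
    return max(min_nodes, min(max_nodes, needed))


if __name__ == "__main__":
    for backlog in (0, 45, 250, 5000):
        print(backlog, "->", desired_nodes(backlog, tasks_per_node=10))
```

The lower bound keeps a minimal footprint running when idle; the upper bound caps cost. Actual cloud autoscaling services expose their own, different settings.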
Amazon
Cisco
Azure (Microsoft)
Fog – smaller local clouds
BIG DATA: THE BIG PICTURE (PARTIAL)
NoSQL
Columnar
Hadoop
In-Memory
Appliance
OLTP
Cloud
Visualization
Sharding
IBM
EMC
HP
Oracle
Yahoo
Amazon
SAP
Microsoft
Actian
DataStax
EnterpriseDB
Apache
Tableau
Datameer
Looker
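Sharding, listed above, splits one dataset across several nodes by key. A minimal sketch of hash-based sharding, assuming string keys and a fixed shard count; `shard_for` is a hypothetical helper, not a product API:

```python
import zlib
from collections import Counter


def shard_for(key: str, num_shards: int) -> int:
    """Pick a shard for a key using a stable hash (crc32) modulo the shard count."""
    if num_shards <= 0:
        raise ValueError("num_shards must be positive")
    return zlib.crc32(key.encode("utf-8")) % num_shards


if __name__ == "__main__":
    keys = [f"customer-{i}" for i in range(1000)]
    # Count how many keys land on each of 4 shards to check the spread.
    print(Counter(shard_for(k, 4) for k in keys))
```

A stable hash (rather than Python's per-process randomized `hash()` for strings) lets every node compute the same placement. Plain modulo is simple, but changing `num_shards` moves most keys to a different shard; consistent hashing is a common alternative when shards are added or removed often.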
No!
Hadoop (open source) is becoming the de facto standard
PostgreSQL (open source) and its derivatives are becoming the home of SQL
Visualization runs on top of these, with both new and old tools
All available in the cloud