big data, why care
Post on 27-Jan-2015
BigData, Why Care?
Saturday 20 October 12
Datacrunchers Consultancy Services
Speaker
Daan Gerits
- BigData Architect
- DataCrunchers.eu
  - Semantic Analysis, Data Harvesting, ...
  - Hadoop, Azure, BigInsights, ...
  - Storm
BigData.be co-organizer
BigData
A lot of technical fuss
- Hadoop, Storm, Pig, ...
Seems to be only for the big players
- Google, Facebook, LinkedIn, Twitter, ...
So why should ‘we’ care?
- we = Startups, Smaller and Medium Enterprises (SSME)
What BigData Promises
Ability to store and process large amounts of data
- Scalable in hardware and software
- Scalable in budget
Which means your budget can grow with your data
- Start small with a small cluster
- The more data you want to manage, the more systems you add
Lower-cost systems
- Several low- to medium-end systems
- Instead of one big, expensive one
But what can you do with it?
Analyze your data with higher precision
Analyze historical facts
Prevent data loss
- Infrastructure failure
- Human errors
Eliminate data silos
High Precision Analysis
Traditional technologies
- Problem:
  - Unable to store all data
- Solutions:
  - Sharding
  - Aggregating data
- Problems with these solutions:
  - Sharding has a high maintenance cost
  - Sharding is complex for users and apps
  - Manual sharding adds a high risk
  - Data aggregation causes a loss in data precision
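The loss of precision caused by aggregation can be illustrated with a minimal sketch. The measurements, the hourly averaging and the 500 ms threshold are all made up for the example; they are not taken from the talk.

```python
from statistics import mean

# Hypothetical raw measurements: (hour, response time in ms).
raw_events = [(9, 120), (9, 80), (9, 950), (10, 100), (10, 110)]

def aggregate_hourly(events):
    """Keep only the average per hour, as a space-constrained store might."""
    by_hour = {}
    for hour, value in events:
        by_hour.setdefault(hour, []).append(value)
    return {hour: mean(values) for hour, values in by_hour.items()}

hourly_avg = aggregate_hourly(raw_events)

# Question asked later: how many requests took longer than 500 ms?
slow_from_raw = sum(1 for _, value in raw_events if value > 500)
# The averages alone cannot answer it: no hourly average exceeds 500 ms.
slow_from_avg = sum(1 for value in hourly_avg.values() if value > 500)
```

With the raw events the slow request is still visible; once only the averages are kept, it has disappeared into the hourly figure.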
High Precision Analysis
BigData allows us to
- Store and process large amounts of data
  - So no need to aggregate
- ‘Forget’ about sharding
  - BigData technologies do this for you
  - Makes it predictable
  - And transparent
But
- You have to configure it correctly
- You don’t have ad-hoc querying (yet)
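To give a feel for what "doing the sharding for you" can mean, here is a rough sketch of hash-based placement. It is an illustration only and not how any particular product distributes data; the function names, the key format and the choice of MD5 are assumptions made for the example.

```python
import hashlib

def shard_for(key: str, shard_count: int) -> int:
    """Map a record key to a shard with a stable hash (hypothetical scheme)."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % shard_count

def distribute(keys, shard_count):
    """Place every key on exactly one shard without manual bookkeeping."""
    shards = {index: [] for index in range(shard_count)}
    for key in keys:
        shards[shard_for(key, shard_count)].append(key)
    return shards

customer_keys = [f"customer-{n}" for n in range(1000)]
shards = distribute(customer_keys, shard_count=4)
```

Because the placement is computed from the key, the same key always lands on the same shard, which is what makes the behaviour predictable for users and applications.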
Analyze Historical Facts
Data warehouse
- Built on top of parameters
What if we forget to add a parameter?
- Add the parameter
- Start gathering information for that parameter
Problem:
- We will only have information from the moment we add the parameter!
Analyze Historical Facts
Let’s store everything
Determine the parameters later
- by humans
- by machine learning algorithms
Analysis will process all data
What if we forget to add a parameter?
- Add the parameter
- Regenerate your reports
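The "add the parameter, regenerate your reports" step can be sketched as follows. The event fields and the `report` helper are hypothetical; the point is only that a report over raw events can be recomputed for a parameter that nobody planned for.

```python
from collections import Counter

# Hypothetical raw page-view events, stored in full from day one.
raw_events = [
    {"day": "2012-10-01", "page": "/home", "browser": "Firefox"},
    {"day": "2012-10-01", "page": "/pricing", "browser": "Chrome"},
    {"day": "2012-10-02", "page": "/home", "browser": "Chrome"},
]

def report(events, parameter):
    """Count events per value of the chosen parameter over all stored data."""
    return Counter(event[parameter] for event in events)

# Original report: views per page.
views_per_page = report(raw_events, "page")
# A parameter added later still covers the full history.
views_per_browser = report(raw_events, "browser")
```

Since nothing was thrown away at ingestion time, the new report includes the events recorded before anyone thought of asking about browsers.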
Analyze Historical Data
Conclusion
- Traditionally: ask first, store later
- BigData: store first, ask later
Prevent Data Loss
Traditional technologies
- Machine failure
  - I hope you have a backup from yesterday?
- Human error
  - Whoops, I deleted those records
  - I hope you have a backup from yesterday?
- So in the worst case, you lose one day of data
Prevent Data Loss
BigData allows us to
- Survive machine failure without data loss
- Survive human error without data loss
But
- You need a data model which supports this
  - Incremental model
- You need to restrict operations
  - Only append data; no updates or deletes
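An append-only, incremental model can be sketched in a few lines. The `AppendOnlyStore` class and the use of `None` as a delete marker are assumptions made for this illustration, not a description of a specific product.

```python
class AppendOnlyStore:
    """Hypothetical store that only ever appends facts; nothing is overwritten."""

    def __init__(self):
        self._log = []  # list of (key, value) facts in arrival order

    def append(self, key, value):
        self._log.append((key, value))
        return len(self._log)  # version number after this write

    def state(self, version=None):
        """Rebuild the current view, or the view as of an earlier version."""
        facts = self._log if version is None else self._log[:version]
        view = {}
        for key, value in facts:
            if value is None:  # None marks a logical delete
                view.pop(key, None)
            else:
                view[key] = value
        return view

store = AppendOnlyStore()
v1 = store.append("customer-1", "Alice")
v2 = store.append("customer-1", None)  # "Whoops, I deleted those records"
```

The accidental delete is just another appended fact, so `store.state(version=v1)` still returns the record as it was before the mistake.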
Prevent Data Loss
Conclusion
- Traditional technologies
  - require very advanced setups to handle machine failure
  - allow you to go back to yesterday’s state
- BigData
  - requires knowledge of how the failover algorithms work
  - expects failure most of the time
  - allows you to go back to the previous state
Eliminate Data Silos
Departments having their own data sources
- start to modify that data
- start to treat it as their master data
- not coupled to the master dataset
Causes a lot of overhead
- Silos miss master data updates
- Business decisions are based on silo data, not the more accurate master data
No obvious way out
Eliminate Data Silos
Consolidate the silos
- Identify the silos
- Import the data from the silos into one store
- Reconstruct master data based on silo rules and priorities
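Reconstructing master data from silo rules and priorities could look roughly like the sketch below. The silo names come from the slides, but the records, the field names and the per-field priority lists are invented for the example.

```python
# Hypothetical silo exports: each department keeps its own copy of a customer.
silos = {
    "sales": {"customer-1": {"email": "old@example.com", "phone": "555-0100"}},
    "support": {"customer-1": {"email": "new@example.com"}},
    "marketing": {"customer-1": {"segment": "SME"}},
}

# Assumed rule: per field, the first silo in the list that has a value wins.
field_priority = {
    "email": ["support", "sales", "marketing"],
    "phone": ["sales", "support", "marketing"],
    "segment": ["marketing", "sales", "support"],
}

def reconstruct_master(silos, field_priority):
    """Merge all silo records into one master record per customer."""
    master = {}
    customer_ids = {cid for records in silos.values() for cid in records}
    for cid in customer_ids:
        merged = {}
        for field, priority in field_priority.items():
            for silo_name in priority:
                value = silos.get(silo_name, {}).get(cid, {}).get(field)
                if value is not None:
                    merged[field] = value
                    break
        master[cid] = merged
    return master

master_data = reconstruct_master(silos, field_priority)
```

Keeping the priorities in a plain data structure means the business can change which department is authoritative for a field without touching the merge logic.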
[Diagram: Master Data together with the Support, Marketing and Sales silos]
Eliminate Data Silos
Generate read-only data models per application
Data changes are sent to the master data
- using a specific API
- using database triggers
[Diagram: Master Data, read-only models M1, M2 and M3, Data Warehouse, Public API, ERP/CRM DB]
Eliminate Data Silos
Conclusion
- You will have to consolidate
- But you need a structural solution
- Which can be provided by BigData
- In a flexible and future-proof way
Conclusion
There is a lot to think about
But BigData can do a lot of things
- A lot more than I explained today
For a reasonable price
And you are not alone
- bigdata.be
- datacrunchers.eu
Questions?