Upload: regunath-balasubramanian

Post on 16-Apr-2017

Is the Elephant in the room?

Regunath B

Twitter : @RegunathB

Quick read 1.8 million words?

The story is about a battle between great kings and sons, with the principal characters being Arjuna, Pandu, Bhishma, Bharata, Karna, Duryodhana, Yudhishthira etc.

Source : The Gramener blog for visualizations
Analysis of the entire text contained in the Mahabharatha
(http://blog.gramener.com/category/visualisations)

Insights from Social Media

Source : ttwick Billionaires page (Bill Gates' Twitter Social Media profile)
(http://ttwick.com/blog/bill-gates-twitter-social-media/)

Insights from Social Media

Source : Impact page of Satyamevjayate
(http://www.satyamevjayate.in/impact/impact.php/)

What is Big Data?

Big Data challenges and opportunities arise when information in an enterprise demonstrates the following characteristics:

Volume: Transaction data from enterprise systems. For example: financial transactions, orders

Variety: Structured and unstructured data. For example: customer contact, social media, biometrics

Velocity: High information arrival rates. For example: application events, tagging, rating of content

Big Data opportunities arise when the enterprise is able to derive Value from the data characteristics defined above

Food for thought.... on theorems and laws

Do hardware and technology trends affect your technology selection?

CPU, RAM and disk size double every 18-24 months [Moore's law]

Disk seek time improves only slowly, at around 5% speed-up per year

Data Seek vs. Data transfer

Software that leverages one of the above (or) a combination

B+ tree index, LSM tree index, Fractal tree

CAP theorem: a shared-data system can achieve only 2 of 3 properties: data Consistency, system Availability and tolerance to network Partitions

Bandwidth is the most scarce commodity in a Data Center
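The seek versus transfer trade-off above can be made concrete with some back-of-the-envelope arithmetic. The sketch below compares updating records in place (one seek per record, as a B+ tree style update tends to do) with rewriting the whole data set sequentially (the access pattern that LSM-style structures lean on). Every number in it (seek time, transfer rate, record size, data set size) is an illustrative assumption, not a measurement from this talk.

```python
# All figures below are illustrative assumptions, not measurements.
SEEK_TIME_S = 0.010          # assumed average disk seek: 10 ms
TRANSFER_RATE_BPS = 100e6    # assumed sequential transfer rate: 100 MB/s
RECORD_SIZE_B = 100          # assumed record size in bytes
DATASET_SIZE_B = 1e12        # assumed data set size: 1 TB


def seek_update_time(fraction: float) -> float:
    """Seconds to update `fraction` of all records, paying one seek per record."""
    records = DATASET_SIZE_B / RECORD_SIZE_B
    return records * fraction * SEEK_TIME_S


def streaming_rewrite_time() -> float:
    """Seconds to read the whole data set and write it back sequentially."""
    return 2 * DATASET_SIZE_B / TRANSFER_RATE_BPS


def break_even_fraction() -> float:
    """Fraction of records above which the sequential rewrite is faster."""
    records = DATASET_SIZE_B / RECORD_SIZE_B
    return streaming_rewrite_time() / (records * SEEK_TIME_S)


if __name__ == "__main__":
    print(f"update 1% by seeking : {seek_update_time(0.01):,.0f} s")
    print(f"rewrite everything   : {streaming_rewrite_time():,.0f} s")
    print(f"break-even fraction  : {break_even_fraction():.4%}")
```

Under these assumed figures, seeking wins only when a very small fraction of the records changes. Since seek time improves far more slowly than capacity, that break-even point keeps moving in favour of sequential transfer.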

Aadhaar Patterns & Technologies

Principles

POJO based application implementation

Light-weight, custom application container

Http gateway for APIs

Compute Patterns

Data Locality

Distribute compute (within an OS process and across)

Compute Architectures

SEDA (Staged Event Driven Architecture)

Master-Worker(s) Compute Grid

Data Access types

High throughput streaming : bio-dedupe, analytics

High volume, moderate latency : workflow, UID records

High volume, low latency : auth, demo-dedupe, search eAadhaar, KYC
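To illustrate the SEDA idea named above, here is a minimal sketch in which each stage owns a bounded input queue and its own small pool of worker threads, and hands its result to the next stage. It is not the Aadhaar implementation: the `Stage` class, the `run_pipeline` helper and the three example handlers (parse, enrich, store) are hypothetical names chosen for illustration only.

```python
import queue
import threading

_STOP = object()  # sentinel that tells a worker thread to shut down


class Stage:
    """One SEDA stage: a bounded input queue drained by its own worker threads."""

    def __init__(self, name, handler, workers=2, capacity=100):
        self.name = name
        self.handler = handler
        # A bounded queue applies back-pressure to the upstream stage when full.
        self.inbox = queue.Queue(maxsize=capacity)
        self.next_stage = None
        self.threads = [
            threading.Thread(target=self._run, daemon=True) for _ in range(workers)
        ]

    def start(self):
        for thread in self.threads:
            thread.start()

    def submit(self, event):
        self.inbox.put(event)

    def _run(self):
        while True:
            event = self.inbox.get()
            if event is _STOP:
                break
            result = self.handler(event)
            if self.next_stage is not None:
                self.next_stage.submit(result)

    def stop(self):
        # One sentinel per worker, queued behind all pending events.
        for _ in self.threads:
            self.inbox.put(_STOP)
        for thread in self.threads:
            thread.join()


def run_pipeline(raw_events):
    """Push events through three hypothetical stages and return what was stored."""
    stored = []
    lock = threading.Lock()

    def store(event):
        with lock:
            stored.append(event)
        return event

    parse = Stage("parse", lambda raw: raw.strip().lower())
    enrich = Stage("enrich", lambda text: {"text": text, "length": len(text)})
    sink = Stage("store", store)
    parse.next_stage = enrich
    enrich.next_stage = sink

    stages = [parse, enrich, sink]
    for stage in stages:
        stage.start()
    for raw in raw_events:
        parse.submit(raw)
    # Stop in pipeline order so each stage drains before the next shuts down.
    for stage in stages:
        stage.stop()
    return stored
```

Because each stage has its own queue and thread pool, a slow stage can be given more workers without touching the others. Output order is not guaranteed when a stage runs more than one worker.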

Aadhaar Architecture

Work distribution using SEDA & Messaging

Ability to scale within JVM and across

Recovery through check-pointing

Sync Http based Auth gateway

Protocol Buffers & XML payloads

Sharded clusters

Near Real-time data delivery to warehouse

Nightly data-sets used to build dashboards, data marts and reports

Real-time monitoring using Events
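Recovery through check-pointing, listed above, can be sketched as follows: a worker records how far it has got after each unit of work, and a restarted worker resumes from that point instead of starting over. This is a sketch under stated assumptions, not the Aadhaar code: the checkpoint is a small JSON file, and the function names and the `fail_at` parameter (used only to simulate a crash) are hypothetical.

```python
import json
import os
import tempfile


def load_checkpoint(path):
    """Return the index of the next record to process (0 if no checkpoint exists)."""
    if not os.path.exists(path):
        return 0
    with open(path) as f:
        return json.load(f)["next_index"]


def save_checkpoint(path, next_index):
    """Write the checkpoint atomically: write a temp file, then rename over the old one."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump({"next_index": next_index}, f)
    os.replace(tmp_path, path)


def process_all(records, checkpoint_path, handle, fail_at=None):
    """Process records in order, resuming from the last saved checkpoint."""
    start = load_checkpoint(checkpoint_path)
    for i in range(start, len(records)):
        if fail_at is not None and i == fail_at:
            raise RuntimeError("simulated crash")  # hypothetical failure injection
        handle(records[i])
        save_checkpoint(checkpoint_path, i + 1)
```

A crash between `handle` and `save_checkpoint` means that record is processed again after restart, so this sketch gives at-least-once behaviour and the handler should be idempotent.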

Putting data to work at Aadhaar

Deployment Monitoring

Big Data at Flipkart

Website traffic

Millions of page hits per day: product catalogs, item availability, promotions, search

Millions of active sessions and shopping carts

Latencies measured in low digit milliseconds

Growing list of categories (Books, Mobiles, Toys, Personal, Home, Baby, Digital music...)

Electronic inventory: MP3, eBooks, movies

New business models, newer channels

Understanding users, user profiles, social media, experience

Terabytes of logs containing browsing behavior, data from multiple engagement channels

Recommendations based on millions of possible item matches and relevance algorithms

Is the Elephant in the room?

From Wikipedia:

"Elephant in the room" is an English metaphorical idiom for an obvious truth that is being ignored or goes unaddressed.

Big Data opportunities and challenges are real and present -
It is the Elephant in the room.

Some takeaways from experience

Make everything API based

Everything fails (hardware, software, network, storage)

System must recover, retry transactions, and sort of self-heal

Security and privacy should not be an afterthought

Scalability does not come from one product

Watch out for solution and technology stereotyping

Open scale out is the only way to go

Heterogeneous, multi-vendor, commodity compute, growing in linear fashion. Nothing else can adapt!
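The "everything fails" takeaway above implies that callers should expect transient failures and retry. One common way to do that, shown here as a minimal sketch and not as anything taken from the systems described in this talk, is retry with capped exponential backoff and jitter. `TransientError`, `call_with_retry` and all the default values are hypothetical.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical marker for failures that are worth retrying."""


def call_with_retry(operation, max_attempts=5, base_delay=0.1, max_delay=2.0,
                    sleep=time.sleep):
    """Call `operation`, retrying on TransientError with capped exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts:
                raise  # retries exhausted: surface the failure to the caller
            # Delay ceiling doubles per attempt and is capped at max_delay.
            ceiling = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Random jitter keeps many clients from retrying in lock-step.
            sleep(random.uniform(0, ceiling))
```

The `sleep` parameter exists so the delay can be replaced in tests. Retrying is only safe when the operation is idempotent or the receiving side de-duplicates requests.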

14/08/12