Big Data Processing: Past, Present and Future · 2017-04-05

Big Data Processing: Past, Present and Future

Orion Gebremedhin, National Solutions Director – BI & Big Data, Neudesic LLC. VTSP – Microsoft Corp.

@OrionGM

© Copyright 2015, Neudesic. All rights reserved.

Topics Covered

•  History and Fundamentals of Big Data Processing
•  SQL Server for Big Data, Past, Present and Future
•  Summary

Characteristics of Big Data

Big data is data that exceeds the processing capacity of conventional database systems. The data is too big, moves too fast, or doesn’t fit the strictures of your database architectures. To gain value from this data, you must choose an alternative way to process it.

Characteristics of Big Data

The Vs of Big Data

•  Volume
    •  40 zettabytes (43 trillion gigabytes) of data will be created by 2020, a 300-fold increase from 2005
    •  Most companies in the U.S. have at least 100 TB of data
•  Velocity
    •  NYSE captures 1 TB of trade information every day
    •  The average modern car has over 100 sensors
•  Variety
    •  Nearly 420 million wearable health monitors
    •  Over 4 billion hours of video watched on YouTube every day
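The volume figure pairs "40 zettabytes" with "43 trillion gigabytes". The two numbers only line up under binary prefixes, which is an assumption here and is not stated on the slide. A minimal Python sketch of the conversion (the function and constant names are illustrative):

```python
# Assumption: the slide's figure uses binary prefixes
# (1 ZB = 2**70 bytes, 1 GB = 2**30 bytes). With decimal prefixes
# (10**21 and 10**9 bytes) the same 40 ZB is exactly 40 trillion GB.
ZB_BINARY, GB_BINARY = 2**70, 2**30
ZB_DECIMAL, GB_DECIMAL = 10**21, 10**9


def zettabytes_to_gigabytes(zb, binary=True):
    """Convert zettabytes to gigabytes using binary or decimal prefixes."""
    if binary:
        return zb * ZB_BINARY // GB_BINARY
    return zb * ZB_DECIMAL // GB_DECIMAL


binary_gb = zettabytes_to_gigabytes(40)
decimal_gb = zettabytes_to_gigabytes(40, binary=False)

print(f"binary prefixes:  {binary_gb / 1e12:.2f} trillion GB")
print(f"decimal prefixes: {decimal_gb / 1e12:.2f} trillion GB")
```

Under the binary reading the result is about 43.98 trillion GB, which the slide appears to truncate to 43.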

History of Big Data

A big data cluster is a highly interconnected platform built from a collection of commodity parts.

*Disruptive Possibilities by Jeffrey Needham Copyright © 2013

Scale Up vs. Scale Out

Scale up (SMP)

•  Upgrade components or buy a bigger server each time.
•  A multiprocessor system in which the processors share resources (operating system, memory, I/O devices) and are connected by a common bus.

Scale out (MPP)

•  Add nodes to the cluster (+n).
•  Multiple processors, each using its own OS and memory, communicating with one another through some form of messaging interface.
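The scale-out idea can be sketched in a few lines of Python. This is a single-process simulation under stated assumptions, not how any particular MPP product is implemented: the "nodes" are plain lists, the "messages" are function return values, and every name (`scatter`, `local_aggregate`, `gather`, the sample `sales` rows) is hypothetical.

```python
import zlib
from collections import Counter


def node_for(key, node_count):
    """Pick the owning node for a key with a stable hash."""
    return zlib.crc32(key.encode("utf-8")) % node_count


def scatter(rows, node_count):
    """Distribute rows so each node holds a disjoint slice (shared nothing)."""
    nodes = [[] for _ in range(node_count)]
    for key, amount in rows:
        nodes[node_for(key, node_count)].append((key, amount))
    return nodes


def local_aggregate(node_rows):
    """Work done independently on one node, using only its own rows."""
    totals = Counter()
    for key, amount in node_rows:
        totals[key] += amount
    return totals


def gather(partials):
    """Coordinator step: merge the partial results sent back by the nodes."""
    result = Counter()
    for partial in partials:
        result.update(partial)
    return dict(result)


def scale_out_sum(rows, node_count):
    return gather(local_aggregate(node) for node in scatter(rows, node_count))


sales = [("east", 10), ("west", 5), ("east", 7), ("north", 3), ("west", 1)]
totals_2 = scale_out_sum(sales, node_count=2)
totals_4 = scale_out_sum(sales, node_count=4)
print(totals_2 == totals_4)
```

The answer does not depend on the node count, which is the property that lets a cluster grow by adding nodes rather than replacing the server.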

Notable milestones in commodity hardware

CDC 6600 by Control Data Corporation. "The 6600 CPU had multiple functional units which could operate simultaneously (i.e., in parallel), allowing the CPU to overlap instructions' execution times." http://en.wikipedia.org/wiki/CDC_6600

A Beowulf cluster (1990s) is a computer cluster of what are normally identical, commodity-grade computers networked into a small local area network with libraries and programs installed which allow processing to be shared among them. http://en.wikipedia.org/wiki/Beowulf_cluster

Some Applications of Big Data

Big Data supercomputers are pattern explorers.

•  Shopping patterns
•  Sensor and intelligent device data analytics
•  Social network associations and suggestions
•  Predictive analytics
•  Crime investigation

SQL Server for Big Data

SQL Server Optimizations

About Analytics Platform System

[Diagram: Microsoft Analytics Platform System, showing SQL Server Parallel Data Warehouse, Microsoft HDInsight, and PolyBase]

APS Growth Topology

[Diagram labels: Base Unit, Scale Unit, Extension Base Unit]

Introducing the Microsoft Analytics Platform System

•  Relational and non-relational data in a single appliance

•  Enterprise-ready Hadoop

•  Integrated querying across Hadoop and PDW using T-SQL

•  Direct integration with Microsoft BI tools such as Microsoft Excel

•  Near real-time performance with In-Memory Columnstore

•  Ability to scale out to accommodate growing data

•  Removal of data warehouse bottlenecks with MPP SQL Server

•  Concurrency that fuels rapid adoption

•  Industry’s lowest data warehouse appliance price per terabyte

•  Value through a single appliance solution

•  Value with flexible hardware options using commodity hardware

Deployment options and hybrid solutions

Connecting islands of data with PolyBase

•  Provides a single T-SQL query model for PDW and Hadoop with the rich features of T-SQL, including joins without ETL

•  Uses the power of MPP to enhance query execution performance

•  Supports Microsoft Azure HDInsight to enable new hybrid cloud scenarios

•  Provides the ability to query non-Microsoft Hadoop distributions, such as Hortonworks and Cloudera

[Diagram: a T-SQL query ("Select…") goes through PolyBase and returns a single result set drawn from SQL Server Parallel Data Warehouse, Microsoft Azure HDInsight, Microsoft HDInsight, Hortonworks for Windows and Linux, and Cloudera]

Microsoft’s modern data warehouse

[Diagram labels: Data Platform, Analytics Platform System, SQL Server 2014, Microsoft Azure HDInsight]

Summary

•  Understand your data growth to determine when to “Scale-Out”.

•  Determine the right tool for the workload you have.
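One way to make the first point concrete is a back-of-the-envelope projection of when current capacity runs out. The sketch below assumes steady compound monthly growth, which real workloads rarely follow exactly. The function name and the sample figures (20 TB today, 100 TB capacity, 5% monthly growth) are hypothetical and do not come from the slides.

```python
import math


def months_until_full(current_tb, capacity_tb, monthly_growth_rate):
    """Return the first month in which compound growth reaches capacity_tb.

    Assumes data grows by a fixed fraction (monthly_growth_rate) each month.
    """
    if current_tb <= 0 or capacity_tb <= 0:
        raise ValueError("sizes must be positive")
    if current_tb >= capacity_tb:
        return 0
    if monthly_growth_rate <= 0:
        raise ValueError("growth rate must be positive")
    growth_needed = capacity_tb / current_tb
    return math.ceil(math.log(growth_needed) / math.log(1 + monthly_growth_rate))


# Hypothetical inputs: 20 TB today, a 100 TB single-server ceiling, 5% growth per month.
runway_months = months_until_full(20, 100, 0.05)
print(f"Capacity is reached in about {runway_months} months")
```

A short runway suggests planning the scale-out now. A runway of many years suggests a scale-up system may still be the right tool for the workload.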

Questions and Discussion

Questions?