How to use Big Data
TRANSCRIPT
![Page 1: How to use Big Data](https://reader033.vdocuments.us/reader033/viewer/2022051708/58866b721a28ab7d408b5ab7/html5/thumbnails/1.jpg)
Digicomp 1
The Microsoft BI Platform in the Cloud
Course instructor: Matthias Gessenay, 20 January 2016 / [email protected]
![Page 2: How to use Big Data](https://reader033.vdocuments.us/reader033/viewer/2022051708/58866b721a28ab7d408b5ab7/html5/thumbnails/2.jpg)
Copyrights
Some slides are taken from Microsoft's Azure Readiness slide deck (https://github.com/Azure-Readiness/CloudDataCamp/blob/master/Presentation/HDInsight/Hadoop%20in%20Azure.pptx)
Some slides are taken from the MS Ignite session PowerBI Overview (http://www.google.ch/url?sa=t&rct=j&q=&esrc=s&source=web&cd=8&cad=rja&uact=8&ved=0ahUKEwiH3pygp7XKAhVBVRoKHQ9KCJwQFghcMAc&url=http%3A%2F%2Fvideo.ch9.ms%2Fsessions%2Fignite%2F2015%2Fdecks%2FBRK2556_Doyle.pptx&usg=AFQjCNHOr7Kb8pJEFnLKHvAMUho0AOBhjA)
![Page 3: How to use Big Data](https://reader033.vdocuments.us/reader033/viewer/2022051708/58866b721a28ab7d408b5ab7/html5/thumbnails/3.jpg)
Introduction to Apache Hadoop
![Page 4: How to use Big Data](https://reader033.vdocuments.us/reader033/viewer/2022051708/58866b721a28ab7d408b5ab7/html5/thumbnails/4.jpg)
Apache Hadoop
![Page 5: How to use Big Data](https://reader033.vdocuments.us/reader033/viewer/2022051708/58866b721a28ab7d408b5ab7/html5/thumbnails/5.jpg)
Data volume
Hadoop stores files in a distributed file system
Distributed across many servers
Files can be spread across many nodes
Hadoop can store very large volumes of data
Scales from a few nodes to many thousands of nodes
Files can be larger than the capacity of a single node
![Page 6: How to use Big Data](https://reader033.vdocuments.us/reader033/viewer/2022051708/58866b721a28ab7d408b5ab7/html5/thumbnails/6.jpg)
Data variety
Hadoop stores files in a non-relational format
![Page 7: How to use Big Data](https://reader033.vdocuments.us/reader033/viewer/2022051708/58866b721a28ab7d408b5ab7/html5/thumbnails/7.jpg)
Hadoop vs. SQL
Scale (storage & processing)

| | Relational Database | Hadoop Platform |
|---|---|---|
| Schema | Required on write | Required on read |
| Speed | Reads are fast | Writes are fast |
| Governance | Standards and structured | Loosely structured |
| Processing | Limited, no data processing | Processing coupled with data |
| Data types | Structured | Multi and unstructured |
| Best fit use | Interactive OLAP analytics; complex ACID transactions; operational data store | Data discovery; processing unstructured data; massive storage/processing |
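
The "schema required on read" idea from the comparison can be illustrated with a minimal Python sketch. The sample records, the field names (`user`, `amount`) and the helper `read_with_schema` are assumptions made up for this illustration; they do not come from the slides. Raw lines are stored untouched, and types are only enforced when the data is read.

```python
import json

# Raw, loosely structured events stored as-is: nothing is validated on write.
RAW_LINES = [
    '{"user": "anna", "amount": "12.50"}',
    '{"user": "ben", "amount": "7", "coupon": "X1"}',
    'not valid json',
]


def read_with_schema(lines):
    """Apply a schema while reading: parse, select fields, cast types."""
    for line in lines:
        try:
            record = json.loads(line)
            yield {"user": str(record["user"]), "amount": float(record["amount"])}
        except (ValueError, KeyError, TypeError):
            # Malformed rows only surface at read time; this sketch skips them.
            continue


rows = list(read_with_schema(RAW_LINES))
total = sum(row["amount"] for row in rows)
```

Writing stays cheap because nothing is checked up front, while each reader pays the parsing cost and decides how to treat rows that do not fit its schema.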
![Page 8: How to use Big Data](https://reader033.vdocuments.us/reader033/viewer/2022051708/58866b721a28ab7d408b5ab7/html5/thumbnails/8.jpg)
YARN: Next Generation Hadoop (Azure Data Lake is built on YARN)
1st Gen of Hadoop: Single Use System (Batch Apps)
HDFS (redundant, reliable storage)
MapReduce (cluster resource management & data processing)

2nd Gen of Hadoop: Multi Use Data Platform (Batch, Interactive, Online, Streaming, …)
Redundant, Reliable Storage (HDFS)
Efficient Cluster Resource Management & Shared Services (YARN)
Flexible Data Processing (Hive, Pig, others…)
Batch: MapReduce
Batch & Interactive: Tez
Online Data Processing: HBase, Accumulo
Stream Processing: Storm
others…
Classic Hadoop Apps
![Page 9: How to use Big Data](https://reader033.vdocuments.us/reader033/viewer/2022051708/58866b721a28ab7d408b5ab7/html5/thumbnails/9.jpg)
Hadoop 2.0: YARN
http://hortonworks.com/blog/introducing-apache-hadoop-yarn/
![Page 10: How to use Big Data](https://reader033.vdocuments.us/reader033/viewer/2022051708/58866b721a28ab7d408b5ab7/html5/thumbnails/10.jpg)
HDFS: Hadoop Storage
Data nodes
Distributed
Local storage
Fault tolerant (3 copies per block)
Splits files into blocks
Name node
Stores no data
But knows where each block is located
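
The division of labour between data nodes and the name node can be sketched as a toy model in Python. This is an illustration under stated assumptions, not how HDFS is implemented: the block size, the node names and the round-robin placement in `place_blocks` are invented for the example, and real HDFS uses a rack-aware placement policy.

```python
def split_into_blocks(data, block_size):
    """Cut a file's bytes into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_blocks(num_blocks, data_nodes, replication=3):
    """Build name-node style metadata: block id -> data nodes holding a copy.

    Hypothetical round-robin placement, chosen only to keep the sketch short.
    """
    if replication > len(data_nodes):
        raise ValueError("need at least as many data nodes as replicas")
    return {
        block_id: [data_nodes[(block_id + r) % len(data_nodes)]
                   for r in range(replication)]
        for block_id in range(num_blocks)
    }


data = b"x" * 1000                      # stand-in for a large file
blocks = split_into_blocks(data, 256)   # toy block size, far smaller than real HDFS blocks
nodes = ["dn1", "dn2", "dn3", "dn4"]    # hypothetical data node names
block_map = place_blocks(len(blocks), nodes)
```

Here `block_map` plays the role of the name node: it holds no file content, only the location of every block, and each block lives on three different data nodes so that losing one node loses no data.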
![Page 11: How to use Big Data](https://reader033.vdocuments.us/reader033/viewer/2022051708/58866b721a28ab7d408b5ab7/html5/thumbnails/11.jpg)
Hadoop MapReduce
[Diagram: parallel tasks, each running Do work()]
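
The MapReduce idea can be shown with a single-process word count in Python. This is a sketch of the programming model only: the function names and sample documents are assumptions for the illustration, and a real Hadoop job would run the map and reduce steps in parallel across the cluster.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input split."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle and sort: group all values by key, ordered by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(key, values):
    """Reduce: combine all values of one key into a single result."""
    return key, sum(values)


documents = ["big data big cluster", "data node"]   # each string stands for one split
mapped = [pair for doc in documents for pair in map_phase(doc)]
grouped = shuffle(mapped)
word_counts = dict(reduce_phase(key, values) for key, values in grouped.items())
```

Each call to `map_phase` is independent of the others, which is what allows the work to be spread over many nodes.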
![Page 12: How to use Big Data](https://reader033.vdocuments.us/reader033/viewer/2022051708/58866b721a28ab7d408b5ab7/html5/thumbnails/12.jpg)
Apache Hadoop in Azure
![Page 13: How to use Big Data](https://reader033.vdocuments.us/reader033/viewer/2022051708/58866b721a28ab7d408b5ab7/html5/thumbnails/13.jpg)
HDInsight: What’s Different?
Not that much …
HDP on Windows
HDP on Linux
Compute and storage are distributed separately
Azure Blob Storage
![Page 14: How to use Big Data](https://reader033.vdocuments.us/reader033/viewer/2022051708/58866b721a28ab7d408b5ab7/html5/thumbnails/14.jpg)
HDInsight Storage Infrastructure
HDInsight Compute Nodes (Large VMs)
Azure Blob Storage
Azure Flat Network Storage
Stream data to compute
Push data back to storage
map, sort, shuffle, reduce
http://dennyglee.com/2013/03/18/why-use-blob-storage-with-hdinsight-on-azure/
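
HDInsight clusters usually address files in Azure Blob Storage through `wasb://` or `wasbs://` URIs of the form `wasb[s]://<container>@<account>.blob.core.windows.net/<path>`. The slide does not show this scheme, so the helper below is a small sketch added for illustration; the function name, container, account and path are hypothetical.

```python
def wasb_uri(container, account, path, secure=True):
    """Build a Blob Storage URI in the form HDInsight uses for cluster storage."""
    scheme = "wasbs" if secure else "wasb"   # wasbs = the TLS variant
    return f"{scheme}://{container}@{account}.blob.core.windows.net/{path.lstrip('/')}"


# Hypothetical container, storage account and file path.
example_uri = wasb_uri("data", "myaccount", "/logs/2016/01.csv")
```

Because the data lives in the storage account rather than on the compute nodes, a URI like this should stay valid after the cluster is deleted and recreated.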
![Page 15: How to use Big Data](https://reader033.vdocuments.us/reader033/viewer/2022051708/58866b721a28ab7d408b5ab7/html5/thumbnails/15.jpg)
HDInsight Demo
![Page 16: How to use Big Data](https://reader033.vdocuments.us/reader033/viewer/2022051708/58866b721a28ab7d408b5ab7/html5/thumbnails/16.jpg)
Microsoft Self-Service BI
![Page 17: How to use Big Data](https://reader033.vdocuments.us/reader033/viewer/2022051708/58866b721a28ab7d408b5ab7/html5/thumbnails/17.jpg)
Powerful self-service BI with Excel 2013
![Page 18: How to use Big Data](https://reader033.vdocuments.us/reader033/viewer/2022051708/58866b721a28ab7d408b5ab7/html5/thumbnails/18.jpg)
Power Query
Suited for self-service data that fits in Excel
Data-driven shaping: design while you drive
Ideal for sampling data
Partition data in Hadoop/Hive based on user workloads
No governors to prevent users from pulling "too much data"
Does not read compressed or binary files (yet)
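
Since there is no governor that stops a user from pulling too much data, partitioning along the users' typical filters limits what a query has to read. The Python sketch below mimics Hive's `key=value` directory layout in memory; the table name, records and helper functions are assumptions for this illustration and are not part of Power Query or Hive.

```python
from collections import defaultdict


def partition_path(table, record, keys):
    """Build a Hive-style partition path such as sales/year=2016/region=EU."""
    parts = "/".join(f"{key}={record[key]}" for key in keys)
    return f"{table}/{parts}"


def write_partitioned(table, records, keys):
    """Group records by partition path (an in-memory stand-in for directories)."""
    storage = defaultdict(list)
    for record in records:
        storage[partition_path(table, record, keys)].append(record)
    return dict(storage)


def read_partition(storage, table, **filters):
    """Read only the partitions whose path matches every filter."""
    wanted = [f"{key}={value}" for key, value in filters.items()]
    rows = []
    for path, part_rows in storage.items():
        segments = path.split("/")
        if segments[0] == table and all(w in segments for w in wanted):
            rows.extend(part_rows)
    return rows


sales = [
    {"year": 2015, "region": "EU", "amount": 10},
    {"year": 2016, "region": "EU", "amount": 20},
    {"year": 2016, "region": "US", "amount": 30},
]
storage = write_partitioned("sales", sales, ["year", "region"])
eu_2016 = read_partition(storage, "sales", year=2016, region="EU")
```

A query filtered on `year` and `region` only touches one partition here, so the amount of data pulled into Excel depends on the filter rather than on the size of the whole table.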
![Page 19: How to use Big Data](https://reader033.vdocuments.us/reader033/viewer/2022051708/58866b721a28ab7d408b5ab7/html5/thumbnails/19.jpg)
Demo - HDInsight
![Page 20: How to use Big Data](https://reader033.vdocuments.us/reader033/viewer/2022051708/58866b721a28ab7d408b5ab7/html5/thumbnails/20.jpg)
Azure Data Lake
Based on Apache YARN
Virtually unlimited data volume / compute power
Pay per use
Currently still by invitation only
New language: U-SQL
![Page 21: How to use Big Data](https://reader033.vdocuments.us/reader033/viewer/2022051708/58866b721a28ab7d408b5ab7/html5/thumbnails/21.jpg)
Demo
![Page 22: How to use Big Data](https://reader033.vdocuments.us/reader033/viewer/2022051708/58866b721a28ab7d408b5ab7/html5/thumbnails/22.jpg)
PowerBI
Cloud dashboards
On-premises technology available (Datazen)
Connecting data via PowerBI is very easy
Hybrid setups are possible
![Page 23: How to use Big Data](https://reader033.vdocuments.us/reader033/viewer/2022051708/58866b721a28ab7d408b5ab7/html5/thumbnails/23.jpg)
Demo
![Page 24: How to use Big Data](https://reader033.vdocuments.us/reader033/viewer/2022051708/58866b721a28ab7d408b5ab7/html5/thumbnails/24.jpg)
Questions?