Microsoft Cloud Big Data Strategy
James Serra, Big Data Evangelist, Microsoft, [email protected]

Posted 07-Feb-2017 by james-serra


Big Data FY17 Pitch Deck


Think of big data as all data, no matter what the volume, velocity, or variety. The simple truth is that a traditional on-prem data warehouse will not handle big data. So what is Microsoft's strategy for building a big data solution? And why is it best to have this solution in the cloud? That is what this presentation will cover. Be prepared to discover all the various Microsoft technologies and products for collecting data, transforming it, storing it, and visualizing it. My goal is to help you not only understand each product but also understand how they all fit together, so you can be the hero who builds your company's big data solution.

About Me
Microsoft, Big Data Evangelist
In IT for 30 years, worked on many BI and DW projects
Worked as desktop/web/database developer, DBA, BI and DW architect and developer, MDM architect, PDW/APS developer
Been perm employee, contractor, consultant, business owner
Presenter at PASS Business Analytics Conference, PASS Summit, Enterprise Data World conference
Certifications: MCSE: Data Platform, Business Intelligence; MS: Architecting Microsoft Azure Solutions, Design and Implement Big Data Analytics Solutions, Design and Implement Cloud Data Platform Solutions
Blog at JamesSerra.com
Former SQL Server MVP
Author of book "Reporting with Microsoft SQL Server 2012"

Fluff, but the point is that I bring real work experience to the session.


Agenda
Big data defined
Microsoft big data solution
Azure data lake


Big data defined


Big Data is changing traditional data warehousing
"Data warehousing has reached the most significant tipping point since its inception. The biggest, possibly most elaborate data management system in IT is changing."

Gartner, The State of Data Warehousing*
* Donald Feinberg, Mark Beyer, Merv Adrian, Roxane Edjlali (Gartner), The State of Data Warehousing in 2012 (Stamford, CT: Gartner, 2012)

Data sources (OLTP, ERP, CRM, LOB) → ETL → Data warehouse → BI and analytics (dashboards, reporting)

Key goal of slide: To convey what every IT person knows: the data warehouse and what it's for. Then we set up the Gartner quote to say that there is a tipping point. End the slide with a question: Why is it at a tipping point?

Slide talk track: What is the traditional data warehouse? IT professionals know this well. A data warehouse, or enterprise data warehouse, is a database designed specifically for data analysis. It is the single source of truth, the central repository for all data in the company. This means disparate data in the company, coming from your transactional systems and your ERP, CRM, or line-of-business applications, would all be extracted, transformed, cleansed, and put into the warehouse. It was built so that the people accessing the warehouse with BI tools will be accessing data that has been provisioned by IT and represents accurate data sanctioned by the company.

However, this traditional data warehouse is reaching an inflection point. Gartner, in its analysis of the state of data warehousing, noted that it is reaching the most significant tipping point since its inception. The question is why? What is going on?
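The extract, transform, cleanse, and load flow described in the talk track can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the CRM and ERP rows, their column names, and the `customer_summary` table are all hypothetical, and an in-memory SQLite database stands in for the warehouse.

```python
import sqlite3

# Hypothetical extracts standing in for CRM and ERP source systems.
crm_rows = [
    {"customer_id": "C001", "name": "  Contoso Ltd ", "region": "west"},
    {"customer_id": "C002", "name": "Fabrikam", "region": None},
]
erp_rows = [
    {"customer_id": "C001", "order_total": "1200.50"},
    {"customer_id": "C002", "order_total": "80"},
    {"customer_id": "C001", "order_total": "300"},
]

def transform(crm, erp):
    """Cleanse CRM attributes and aggregate ERP order totals per customer."""
    totals = {}
    for row in erp:
        key = row["customer_id"]
        totals[key] = totals.get(key, 0.0) + float(row["order_total"])
    for row in crm:
        yield (
            row["customer_id"],
            row["name"].strip(),                  # trim stray whitespace
            (row["region"] or "unknown").upper(), # default missing regions
            round(totals.get(row["customer_id"], 0.0), 2),
        )

def load(conn, rows):
    """Load cleansed rows into the (hypothetical) warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customer_summary "
        "(customer_id TEXT PRIMARY KEY, name TEXT, region TEXT, lifetime_sales REAL)"
    )
    conn.executemany("INSERT OR REPLACE INTO customer_summary VALUES (?, ?, ?, ?)", rows)
    conn.commit()

warehouse = sqlite3.connect(":memory:")
load(warehouse, transform(crm_rows, erp_rows))
print(warehouse.execute("SELECT * FROM customer_summary ORDER BY customer_id").fetchall())
# [('C001', 'Contoso Ltd', 'WEST', 1500.5), ('C002', 'Fabrikam', 'UNKNOWN', 80.0)]
```

BI tools then query only the cleansed `customer_summary` table, never the raw extracts, which is what makes the warehouse the sanctioned single source of truth.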

Big Data has new data characteristics
Data complexity: variety and velocity, with volumes growing from megabytes to gigabytes, terabytes, and petabytes
Relational (highly modeled schema): Payables, Payroll, Inventory, Contacts, Deal Tracking, Sales Pipeline
Web Logs, Digital Marketing, Search Marketing, Recommendations, Advertising, Mobile, Collaboration, eCommerce
Big Data (schema agility): Log files, Spatial & GPS coordinates, Data market feeds, eGov feeds, Weather, Text/image, Click stream, Wikis/blogs, Sensors/RFID/devices, Social sentiment, Audio/video

Big Data is driving transformative changes (Traditional vs. Big Data)
Data characteristics: relational data with highly modeled schema vs. all data with schema agility
Costs: specialized HW vs. commodity HW
Culture: operational reporting, with a focus on rear-view analysis, vs. experimentation leading to intelligent action, with machine learning, graph, and A/B testing
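Schema agility is often described as schema-on-read: raw records are stored as they arrive, and each reader applies the schema it needs at query time, instead of every record being forced into a highly modeled schema on write. The Python sketch below only illustrates that idea; the JSON fields and the Fahrenheit-to-Celsius rule are assumptions, not part of any Microsoft product.

```python
import json

# Raw events land as-is; a schema-on-write store would reject or reshape them first.
raw_lines = [
    '{"device": "sensor-1", "temp_c": 21.5, "ts": "2017-01-30T10:00:00"}',
    '{"device": "sensor-2", "temp_f": 70.7, "ts": "2017-01-30T10:00:05", "battery": 0.8}',
    '{"user": "u42", "page": "/pricing", "ts": "2017-01-30T10:00:07"}',
]

def read_temperatures(lines):
    """Apply a schema at read time: project sensor records to (device, temp_c)."""
    for line in lines:
        record = json.loads(line)
        if "device" not in record:
            continue  # not a sensor reading; this reader simply skips it
        if "temp_c" in record:
            temp_c = record["temp_c"]
        elif "temp_f" in record:
            temp_c = round((record["temp_f"] - 32) * 5 / 9, 1)
        else:
            continue
        yield record["device"], temp_c

print(list(read_temperatures(raw_lines)))  # [('sensor-1', 21.5), ('sensor-2', 21.5)]
```

A different reader (for example, one interested in click streams) could interpret the very same raw lines with its own projection, without any change to how the data was stored.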


Big Data introduces a new culture of experimentation (traditional → big data)
Sales and marketing: Historical campaign effectiveness → Understand customer patterns to uncover cross-sell opportunities
Finance and risk: Generate year-end financial reports → Financial monitoring with real-time recommendations to increase revenue
Customer and channel: Generate year-end financial reports → Real-time product offers and promotions based on behavior
Operations: Collect historical data on equipment performance → Real-time monitoring to identify proactive maintenance
Engineering: Shipping features without understanding success → Building successful features by correlating user action with product experience

From data to decisions and actions
Static reports (What happened?) → Interactive dashboards (Why did it happen?) → Predictions (What might happen?) → Recommendations (What should I do?)
Axes: from data to value; from manual process to decision support to decision automation

Data is now the key strategic business asset. Every device, every customer, every activity, everything that's happening in the world around us, is producing incredibly rich data that can help us create new experiences, new efficiencies, new business models, and even new inventions. Leveraging this data can be the differentiator for your business. For example, IDC estimates that companies that are leaders in using data assets to their advantage will capture $1.6 trillion more in business value than those that lag behind. While data is pervasive, actionable intelligence from data is elusive. Our customers want to transform data into intelligent action and reinvent their business processes. To do this, they need to more easily analyze massive amounts of data so they can move from seeing what happened and understanding why it happened to predicting what will happen and, ultimately, knowing what they should do. Only then can they create the intelligent enterprise. (Build 2015)

© 2014 Microsoft Corporation. All rights reserved. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION. 1/30/2017 10:23 AM

However, there are challenges to Big Data

Obtaining skills and capabilities
Determining how to get value
Integrating with existing IT investments

*Gartner: Survey Analysis Hadoop Adoption Drivers and Challenges (Stamford, CT.: Gartner, 2015)


But Microsoft has done it before
We needed to better leverage data and analytics to do more experimentation. So we:
Designed a data lake for everyone to put their data into
Built tools approachable by any developer
Created machine learning tools for collaborating across large experiment models
Result: Across Microsoft, ten thousand developers doing experimentation, leading to better insights and to growth in our Microsoft businesses:
Office productivity revenue (45% YoY)*
Intelligent Cloud (100% YoY)*
Bing search share doubles

Windows, SMSG, Live, Bing, CRM/Dynamics, Xbox Live, Office 365, Malware Protection, Microsoft Stores, Commerce Risk, Skype, LCA, Exchange, Yammer (from petabytes to exabytes)

* Microsoft. FY16 Q4 Results, URL: http://www.microsoft.com/en-us/Investor/earnings/FY-2016-Q4/press-release-webcast

Result: Used across Microsoft in Office, Xbox Live, Azure, Windows, Bing, and Skype; supports ten thousand developers running experiments; manages exabytes of data


Microsoft is now taking everything we've learned on this journey and bringing it to our customers: Technology. Cost. Culture.

Everything: technology, cost, culture

Microsoft big data solution


Big Data as a cornerstone of Cortana Intelligence
Action: People, Automated Systems, Apps (Web, Mobile, Bots)
Intelligence: Dashboards & Visualizations (Power BI), Cortana, Bot Framework, Cognitive Services
Machine Learning and Analytics: HDInsight (Hadoop and Spark), Stream Analytics, Data Lake Analytics, Machine Learning
Information Management: Event Hubs, Data Catalog, Data Factory
Big Data Stores: Data Lake Store, SQL Data Warehouse
Data Sources: Apps, Sensors and devices, Data


Bringing Big Data to everybody
Accelerate the pace of innovation through a state-of-the-art cloud platform
Big data analytics, ordered from most control to most ease of use (and user adoption):
IaaS Hadoop: Azure Marketplace (HDP | CDH | MapR), any Hadoop technology
Managed Hadoop: Azure HDInsight, workload-optimized, managed clusters
Big Data as-a-service: Azure Data Lake Analytics, specific apps in a multi-tenant form factor
Big data storage: Azure Storage, Azure Data Lake Store

Microsoft Big Data Portfolio
Relational, on-premises: SQL Server 2016, SQL Server 2016 Fast Track, Analytics Platform System
Relational, cloud: Azure SQL Database, Azure SQL DW
Non-relational, cloud: Azure Data Lake, DocumentDB, HDInsight
Non-relational, on-premises: Hadoop
Spanning on-premises and cloud: SQL Server Stretch
Scale: sequential, scale up, scale out + across
Insights: business intelligence, machine learning analytics

Microsoft has solutions covering and connecting all four quadrants; that's why SQL Server is one of the most utilized databases in the world.

Our portfolio of products provides customers with the power to deploy the solution that suits their business needs.

Your choice of platform, whether on-premises, hybrid, or private or public cloud, doesn't limit you now or in the future. Migrating or expanding becomes an easy process and doesn't require excessive downtime or introduce potential threats to your business success.

With Microsoft, you can seamlessly scale up to larger processing and storage capabilities, or scale out by adding additional servers in parallel arrangement.

T: SQL Server is a trusted market leader, and it's the cornerstone of our data warehouse offering.


Azure HDInsight
A cloud Spark and Hadoop service for the enterprise
Reliable with an industry-leading SLA
Enterprise-grade security and monitoring
Productive platform for developers and scientists
Cost-effective cloud scale
Integration with leading ISV applications
Easy for administrators to manage
63% lower TCO than deploying your own Hadoop on-premises*
* IDC study: The Business Value and TCO Advantage of Apache Hadoop in the Cloud with Microsoft Azure HDInsight

Reliable open source analytics with an industry-leading SLA: HDInsight allows you to easily spin up enterprise-grade open source cluster types, guaranteed with the industry's best 99.9% SLA and 24/7 support. We guarantee this SLA for the entire big data solution, not just the VM instances. HDInsight is architected for full redundancy and high availability, including head node replication, data geo-replication, and a built-in standby NameNode, making HDInsight resilient to critical failures not addressed in standard Hadoop implementations. Azure also offers cluster monitoring and 24x7 enterprise support backed by Microsoft and Hortonworks, with 37 combined committers for Hadoop core (more than all other managed cloud providers combined) to support your deployment and the ability to fix and commit code back to Hadoop.
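To make the 99.9% figure concrete, the arithmetic below converts an availability percentage into the downtime it still permits. It is a back-of-the-envelope sketch assuming a 30-day month and a 365-day year; the actual measurement period, exclusions, and service credits are defined in the Azure SLA terms, not here.

```python
def allowed_downtime_minutes(sla_percent, days=30):
    """Downtime per period that still meets the given availability percentage."""
    total_minutes = days * 24 * 60
    return total_minutes * (1 - sla_percent / 100)

print(round(allowed_downtime_minutes(99.9), 1))            # 30-day month: 43.2
print(round(allowed_downtime_minutes(99.9, days=365), 1))  # 365-day year: 525.6
```

In other words, a 99.9% guarantee still allows roughly 43 minutes of unavailability in a 30-day month, which is why the redundancy features listed above matter for workloads that need more.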

Enterprise-grade security & monitoring: HDInsight protects your data assets and easily extends your on-premises security and governance controls to the cloud. We feature single sign-on (SSO), multi-factor authentication, and seamless management of millions of identities through Azure Active Directory. You can authorize users and groups with fine-grained access control policies over all your enterprise data with Apache Ranger. HDInsight meets HIPAA, PCI, and SOC compliance requirements, ensuring your enterprise data assets are always protected with the highest security and regulatory compliance. To ensure the highest level of business continuity, HDInsight extends capabilities for alerting, monitoring, defining pre-emptive actions, and enhanced workload protection through native integration with Azure Operations Management Suite (OMS).

Most productive platform for developers and scientists: HDInsight offers developers tailored experiences through rich productivity suites for Hadoop & Spark, with integrated development environments in Visual Studio, Eclipse, and IntelliJ supporting Scala, Python, R, Java, and .NET. HDInsight gives data scientists the ability to create narratives that combine code, statistical equations, and visualizations that tell a story about the data, through integration with the two most popular notebooks: Jupyter and Zeppelin. HDInsight is also the only managed cloud Hadoop solution with integration to Microsoft R Server. Multi-threaded math libraries and transparent parallelization in R Server mean handling up to 1000x more data and up to 50x faster speeds than open source R, helping you train more accurate models for better predictions than previously possible.

Cost-effective cloud scale: HDInsight has decoupled compute and storage, enabling you to cost-effectively scale workloads up or down, independent of storage. Local storage can still be used for caching and fast I/O. Spark and interactive Hive users can choose SSD memory for interactive performance, while Kafka users can retain all streaming data in premium managed disks. You only pay for the compute and storage you use, and you can choose any Azure VM type that enables the best utilization of resources. A recent study showed HDInsight delivering 63% lower TCO than deploying Hadoop on-premises over 5 years.*
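Decoupling compute from storage matters for cost because a cluster can be deleted when it is idle while its data stays in storage. The sketch below compares an always-on cluster with one that runs 8 hours a day. Every price and size in it is a hypothetical placeholder, not Azure pricing, and it ignores real-world factors such as cluster start-up time. By the same kind of arithmetic, a 63% lower TCO means the cloud figure is 37% of the on-premises one.

```python
def monthly_cost(nodes, hourly_rate_per_node, hours_running, storage_tb, rate_per_tb_month):
    """Compute is billed only while the cluster exists; storage is billed all month."""
    compute = nodes * hourly_rate_per_node * hours_running
    storage = storage_tb * rate_per_tb_month
    return compute + storage

# Hypothetical figures: 10 nodes at $1.00/hour, 100 TB at $20/TB-month.
always_on = monthly_cost(10, 1.00, 30 * 24, 100, 20.0)  # cluster kept up 24x7
on_demand = monthly_cost(10, 1.00, 30 * 8, 100, 20.0)   # cluster up 8 hours a day

print(always_on, on_demand)                        # 9200.0 4400.0
print(f"savings: {1 - on_demand / always_on:.0%}")  # savings: 52%
```

The storage term is identical in both scenarios; only the compute term shrinks, which is exactly the lever that decoupled storage provides.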

Integration with leading productivity applications: In the broader ecosystem for Hadoop, there is a thriving market of independent software vendors (ISVs) who provide value-added solutions. Through a unique design where every cluster is extended with edge nodes and script actions, HDInsight lets customers spin up Hadoop and Spark clusters pre-integrated and pre-tuned with any ISV application out of the box. Datameer, Cask, AtScale, and StreamSets are a few such applications, which are very popular on the HDInsight platform today.

Easy for administrators to manage: With HDInsight, administrators can deploy Hadoop in the cloud without buying new hardware or incurring other up-front costs. There's also no time-consuming installation or setup, and no need to patch the operating system or upgrade the Hadoop versions: Azure does it for you. Launch your first cluster in minutes.


Hortonworks Data Platform (HDP) 2.5 (under the covers of HDInsight)

Simply put, Hortonworks ties all the open source products together (22).


Azure Data Lake Store
A no-limits data lake that powers big data analytics
Petabyte-size files and trillions of objects
Scalable throughput for massively parallel analytics
HDFS for the cloud
Always encrypted, role-based security & auditing
Enterprise-grade support

Petabyte-size files and trillions of objects: With Azure Data Lake Store, your organization can analyze all of its data in a single place with no artificial constraints. Your Data Lake Store can store trillions of files, where a single file can be greater than a petabyte in size, which is 200x larger than other cloud stores. This makes Data Lake Store ideal for storing any type of data, including massive datasets like high-resolution video, genomic and seismic datasets, medical data, and data from a wide variety of industries.

Scalable throughput for massively parallel analytics: Without redesigning your application or repartitioning your data at higher scale, Data Lake Store scales throughput to support any size of analytic workload. It provides massive throughput to run analytic jobs with 1,000+ concurrent executors that read and write hundreds of terabytes of data efficiently.

HDFS for the cloud: Microsoft Azure Data Lake Store supports any application that uses the open Apache Hadoop Distributed File System (HDFS) standard. By supporting HDFS, you can easily migrate your existing Hadoop and Spark data to the cloud without recreating your HDFS directory structure.

Always encrypted, role-based security & auditing: Data Lake Store protects your data assets and easily extends your on-premises security and governance controls to the cloud. Data is always encrypted: in motion using SSL, and at rest using service- or user-managed HSM-backed keys in Azure Key Vault. Capabilities such as single sign-on (SSO), multi-factor authentication, and seamless management of millions of identities are built in through Azure Active Directory. You can authorize users and groups with fine-grained POSIX-based ACLs for all data in the store, enabling role-based access controls. Finally, you can meet security and regulatory compliance needs by auditing every access or configuration change to the system.

Enterprise-grade support: We guarantee a 99.9% enterprise-grade SLA and 24/7 support for your big data solution.


Azure Data Lake Analytics
A no-limits analytics job service to power intelligent action
Start in seconds, scale instantly, pay per job
Develop massively parallel programs with simplicity
Debug and optimize your big data programs with ease
Virtualize your analytics
Enterprise-grade security, auditing and support

Start in seconds, scale instantly, pay per job: Our on-demand service will have you processing big data jobs within 30 seconds. There is no infrastructure to worry about, because there are no servers, VMs, or clusters to wait for, manage, or tune. You can instantly scale the analytic units (AUs, the unit of processing power) from one to thousands for each job. You only pay for the processing used per job.

Develop massively parallel programs with simplicity: U-SQL is a simple, expressive, and extensible language that allows you to write code once and have it automatically parallelized for the scale you need. You can process petabytes of data for diverse workload categories such as ETL, machine learning, cognitive science, machine translation, image processing, and sentiment analysis by using U-SQL and leveraging existing libraries written in .NET languages, R, or Python.

Debug and optimize your big data programs with ease: Debugging failures in cloud distributed programs is now as easy as debugging a program in your personal environment. Our execution environment actively analyzes your programs as they run and offers recommendations to improve performance and reduce cost. For example, if you requested 1000 AUs for your program and only 50 AUs were needed, the system would recommend that you use only 50 AUs, resulting in a 20x cost savings.

Virtualize your analytics: Act on all your data with optimized data virtualization of your relational sources, such as SQL Server on Azure VMs, Azure SQL Database, and Azure SQL Data Warehouse. Queries are automatically optimized by moving processing close to the source data, without data movement, thereby maximizing performance and minimizing latency.

Enterprise-grade security, auditing and support: Extend your on-premises security and governance controls to the cloud to meet your security and regulatory compliance needs. Capabilities such as single sign-on (SSO), multi-factor authentication, and seamless management of millions of identities are built in through Azure Active Directory. Role-based access control and the ability to audit all processing and management operations are on by default. We guarantee a 99.9% enterprise-grade SLA and 24/7 support for your big data solution.
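The 20x figure follows directly if a job is billed for the AUs allocated multiplied by how long it runs, and if the 950 surplus AUs were sitting idle, so right-sizing does not change the job's duration. The sketch below checks that arithmetic under those assumptions; the per-AU-hour price and the 30-minute runtime are hypothetical placeholders, not published rates.

```python
def job_cost(analytics_units, hours, price_per_au_hour):
    """Billing model assumed here: AUs allocated x job duration x unit price."""
    return analytics_units * hours * price_per_au_hour

PRICE = 2.00  # hypothetical price per AU-hour
HOURS = 0.5   # hypothetical runtime, assumed identical at 1000 or 50 AUs

over_provisioned = job_cost(1000, HOURS, PRICE)
right_sized = job_cost(50, HOURS, PRICE)
print(over_provisioned, right_sized, over_provisioned / right_sized)  # 1000.0 50.0 20.0
```

If the extra AUs had actually shortened the job, the saving would be smaller than 20x; the recommendation pays off precisely when parallelism beyond 50 AUs goes unused.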


Azure Data Lake
Analytics: Azure Data Lake Analytics (U-SQL) and HDInsight (Hive, R Server), built on YARN
Store: Azure Data Lake Store (HDFS)
Store and analyze data of any kind and size
Develop faster, debug and optimize smarter
Interactively explore patterns in your data
No learning curve
Managed and supported
Dynamically scales to match your business priorities
Enterprise-grade security
Built on YARN, designed for the cloud

Azure SQL Data Warehouse
A relational data warehouse-as-a-service, fully managed by Microsoft
Industry's first elastic cloud data warehouse with enterprise-grade capabilities
Integrated with on-premises and cloud assets
Market-leading price/performance: simple compute & storage billing; pay for what you need
High performance without rewriting applications
Low cost for latent data
Infrastructure, management and support provided

Elastic scale & performance: scales to petabytes of data with MPP processing; resize compute nodes in < 1 minute; faster time to insight than SMP offerings; designed for on-demand workloads

End-to-end platform built for the cloud: integrated with the Azure platform and other Microsoft services; enables hybrid solutions; built on SQL Server experience & technology
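MPP processing is what lets the service scale out: a hash-distributed table spreads its rows across 60 distributions based on a chosen column, and compute nodes work on their own distributions in parallel. The Python sketch below imitates that pattern with a thread pool. The CRC-32 hash, the `customer_id` distribution column, and the sample rows are assumptions for illustration, not the service's internal implementation.

```python
import zlib
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

NUM_DISTRIBUTIONS = 60  # number of distributions a SQL DW table is spread across

def distribution_for(key):
    """Map a distribution-column value to a distribution (illustrative CRC-32 hash)."""
    return zlib.crc32(key.encode("utf-8")) % NUM_DISTRIBUTIONS

def distribute(rows, column):
    """Place each row in the distribution chosen by hashing the given column."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[distribution_for(row[column])].append(row)
    return buckets

def local_sum(rows):
    """Aggregate one distribution's rows, independently of all the others."""
    partial = defaultdict(float)
    for row in rows:
        partial[row["customer_id"]] += row["amount"]
    return partial

def mpp_sum_by_customer(rows):
    buckets = distribute(rows, "customer_id")
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(local_sum, buckets.values()))
    # Rows were hashed on customer_id, so each customer lives in exactly one
    # distribution and the partial results can be merged without re-aggregating.
    totals = {}
    for partial in partials:
        totals.update(partial)
    return totals

sales = [{"customer_id": f"C{i % 7}", "amount": 10.0} for i in range(70)]
totals = mpp_sum_by_customer(sales)
print(sorted(totals.items()))  # seven customers, each with a total of 100.0
```

Because the aggregation key matches the distribution column, no rows have to move between distributions; grouping on a different column would need a shuffle step first, which is why the choice of distribution column matters for performance.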


PolyBase
Query relational and non-relational data with T-SQL
Capability: T-SQL for querying relational and non-relational data across SQL Server (APS, SQL Server 2016, SQL DW) and Hadoop and Azure blob storage (soon ADLS)
Benefits: new business insights across your data lake; leverage existing skill sets and BI tools; faster time to insights and simplified ETL process

By preview early this year PolyBase will support Teradata, Oracle, SQL Server, MongoDB, Hadoop and Azure blob storage

We are planning to release a preview of this functionality early next year as part of the SQL Server V.Next CTPs; exact release dates are still in flux. By the preview early next year, PolyBase will support Teradata, Oracle, SQL Server, MongoDB, Hadoop and Azure blob storage (not MySQL!). We will continue to add more sources until GA.

http://demo.sqlmag.com/scaling-success-sql-server-2016/integrating-big-data-and-sql-server-2016

When it comes to key BI investments, we are making it much easier to manage relational and non-relational data with PolyBase technology, which allows you to query Hadoop data and SQL Server relational data through a single T-SQL query. One of the challenges we see with Hadoop is that there are not enough people out there with Hadoop and MapReduce skills, and this technology simplifies the skill set needed to manage Hadoop data. This can also work across your on-premises environment or SQL Server running in Azure.

Windows Azure

Comparison of IoT Hub and Event Hubs: https://azure.microsoft.com/en-us/documentation/articles/iot-hub-compare-event-hubs/


Azure Stream Analytics
Process real-time data in Azure
Consumes millions of real-time events from Event Hub collected from devices, sensors, infrastructure, and applications

Performs time-sensitive analysis using SQL-like language against multiple real-time streams and reference data

Outputs to persistent stores, dashboards or back to devices

Devices feeding Stream Analytics: point of service devices, self-checkout stations, kiosks, smart phones, slates/tablets, PCs/laptops, servers, digital signs, diagnostic equipment, remote medical monitors, logic controllers, specialized devices, thin clients, handhelds, security, POS terminals, automation devices, vending machines, Kinect, ATMs

Azure Stream Analytics is a cost-effective event processing engine that helps uncover real-time insights from devices, sensors, infrastructure, applications, and data. It enables various opportunities, including Internet of Things (IoT) scenarios such as real-time fleet management or gaining insights from devices like mobile phones and connected cars. Deployed in the Azure cloud, Stream Analytics has elastic scale, where resources are efficiently allocated and paid for as requested. Developers get a rapid development experience: they describe their desired transformations in SQL, and the system abstracts the complexities of parallelization, distributed computing, and error handling away from them.
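A typical time-sensitive query of this kind counts events per device over fixed, non-overlapping windows; in the Stream Analytics query language that is written with a `GROUP BY DeviceId, TumblingWindow(second, 10)` clause. The plain-Python sketch below only mimics those semantics so the idea is visible. The device IDs and timestamps are made up, each window is labeled by its end time (as windowed output is stamped in Stream Analytics), and the treatment of events falling exactly on a window boundary is an assumption that may differ from the service.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

def tumbling_window_counts(events, window_seconds):
    """Count events per (window_end, device_id) over fixed, non-overlapping windows.

    Windows are aligned to multiples of window_seconds since the Unix epoch and
    treated as [start, end); boundary handling in the real service may differ.
    """
    counts = Counter()
    for device_id, event_time in events:
        epoch = int(event_time.timestamp())
        window_start = epoch - epoch % window_seconds
        window_end = datetime.fromtimestamp(window_start + window_seconds, tz=timezone.utc)
        counts[(window_end, device_id)] += 1
    return counts

base = datetime(2017, 1, 30, 10, 0, 0, tzinfo=timezone.utc)
events = [("kiosk-1", base + timedelta(seconds=s)) for s in (1, 4, 9, 15)]
events += [("atm-7", base + timedelta(seconds=s)) for s in (3, 12)]

counts = tumbling_window_counts(events, window_seconds=10)
for (window_end, device_id), n in sorted(counts.items()):
    print(window_end.strftime("%H:%M:%S"), device_id, n)
# 10:00:10 atm-7 1
# 10:00:10 kiosk-1 3
# 10:00:20 atm-7 1
# 10:00:20 kiosk-1 1
```

Because every event belongs to exactly one window, each output row can be emitted as soon as its window closes, which is what makes tumbling windows a natural fit for live dashboards and alerts.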

Looking forward into H2 FY15, Stream Analytics will become generally available after previewing at TechEd EMEA 2014.

Azure Machine Learning
Get started with just a browser: requires no provisioning; simply log on to your Azure subscription or try it for free at azure.com/ml
Experience the power of choice: choose from hundreds of algorithms and packages from R and Python, or drop in your own custom code; take advantage of business-tested algorithms from Xbox and Bing
Deploy solutions in minutes: with the click of a button, deploy the finished model as a web service that can connect to any data, anywhere
Connect to the world: brand and monetize solutions on our global Machine Learning Marketplace, https://datamarket.azure.com/
Beyond business intelligence: machine intelligence

Microsoft Azure Machine Learning Studio: modeling environment (shown)
Microsoft Azure Machine Learning API service: model in production as a web service
Microsoft Azure Machine Learning Marketplace: APIs and solutions for broad use

Microsoft's Big Data vision in the cloud is to enable organizations to solve large, complex problems end to end, from storing and managing TBs of data without investing in hardware and software, to seamless integration with the 1 billion users of Excel. As part of this vision, Microsoft offers Azure Machine Learning, designed to democratize the complex task of advanced analytics.

Advanced analytics means using products like Azure Machine Learning to find new and actionable insights that traditional approaches to business intelligence are unlikely to discover. An easy way to think about this is to think about a dashboard. Today, when confined to BI tools without a connection to machine learning, it is solely the job of the human looking at the spreadsheet to gain insights and react to the data. But a human can only consume so many variables. A computer, on the other hand, can consume many more variables to provide much deeper insight into the data. Humans can then react to the data to make decisions that drive competitive advantage, as well as program the computer further to recognize important patterns in the future. This is why we say: beyond business intelligence, machine intelligence.

The accessibility of our solution starts with setup. Previously, you needed to provision your workspace on-premises for machine learning, while also thinking about server space and a host of other considerations. Today you can get started with just a browser. With only an Azure subscription, you can take advantage of the full functionality of Azure Machine Learning within minutes. Taking a test drive is even easier: click Get Started at azure.com/ml, and with simply a Microsoft ID you're off to the races.

Another limitation of other machine learning solutions is siloed environments that only allow for one programming language or make changing from one algorithm to another time-consuming and complex. With Azure ML, you can experience the power of choice. That choice extends to language, with both Python and R being first-class citizens of Azure ML, and to algorithms. You can choose from hundreds of algorithms, including business-tested ones running our Microsoft businesses today, and swapping out algorithms to land on the right one for you is done with a click. Additionally, you can drop in custom R and Python code, your special sauce, and mix and match it with the other options in the tool.

Most revolutionary of all, you can deploy solutions in minutes as a web service, which is simply a URL that can connect to any data, anywhere, including on-premises or in another cloud environment. The ability to put a model into production almost immediately, as well as revise it easily, is unique to Microsoft and allows companies to stay on top of the changing business landscape more effectively than any other provider offers today.

We even take that a step further, allowing model developers to connect to the world with our Machine Learning Marketplace, where they can publish finished solutions and APIs with their own brand and business model. Developers can also discover machine learning solutions there without needing any machine learning skills: the data science is inside. Check it out at https://datamarket.azure.com/.


Azure Data Catalog
Enable enterprise-wide self-service data source registration and discovery
A metadata repository that allows users to register, enrich, understand, discover, and consume data sources

Delivers differentiated value through:
Data source discovery, rather than data discovery
Support for data from any source: structured and unstructured, on-premises and in the cloud
Publishing, discovery, and consumption through any tool
Annotation crowdsourcing: empowering any user to capture and share their knowledge

All of this while allowing IT to maintain control and oversight
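The difference between data source discovery and data discovery is that the catalog holds metadata (where a source lives, what it contains, what users have said about it) while the data itself stays where it is. A minimal in-memory Python sketch of register, annotate, and search follows. The `MetadataCatalog` class, its methods, and the sample connection strings are hypothetical and do not reflect the Azure Data Catalog API.

```python
from dataclasses import dataclass, field

@dataclass
class SourceEntry:
    """Metadata about one data source; the data itself is never copied here."""
    name: str
    location: str
    source_type: str
    tags: set = field(default_factory=set)
    descriptions: list = field(default_factory=list)  # (user, text) pairs

class MetadataCatalog:
    """Hypothetical in-memory catalog: register, enrich, and discover sources."""

    def __init__(self):
        self._entries = {}

    def register(self, name, location, source_type):
        self._entries[name] = SourceEntry(name, location, source_type)

    def annotate(self, name, user, description=None, tags=()):
        """Crowdsourced enrichment: any user can add a description or tags."""
        entry = self._entries[name]
        if description:
            entry.descriptions.append((user, description))
        entry.tags.update(tags)

    def search(self, term):
        """Return names of sources whose name, tags, or descriptions match term."""
        term = term.lower()
        return sorted(
            entry.name
            for entry in self._entries.values()
            if term in entry.name.lower()
            or term in {tag.lower() for tag in entry.tags}
            or any(term in text.lower() for _, text in entry.descriptions)
        )

catalog = MetadataCatalog()
# Hypothetical connection strings: only pointers to the data are stored.
catalog.register("SalesOrders", "sqlserver://erp01/Sales.Orders", "SQL Server table")
catalog.register("ClickStream", "lake://raw/clickstream/", "Data lake folder")
catalog.annotate("SalesOrders", "analyst1", "ERP orders, refreshed nightly", tags={"finance"})
catalog.annotate("ClickStream", "dev2", tags={"marketing", "web"})

print(catalog.search("finance"))  # ['SalesOrders']
print(catalog.search("nightly"))  # ['SalesOrders']
print(catalog.search("web"))      # ['ClickStream']
```

A user who finds a source this way then connects to it with whatever tool they prefer, which is the "consumption through any tool" point on the slide.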

Azure Data Factory
Orchestrate trusted information production in Azure
Connect to relational or non-relational data that is on-premises or in the cloud
Orchestrate data movement & data processing
Publish to Power BI users as a searchable data view
Operationalize (schedule, manage, debug) workflows
Lifecycle management, monitoring

Diagram: NoSQL, DB, Blob → Hive, Pig, MapReduce, C#, stored procedures, VM, Azure Machine Learning → trusted data → BI & analytics

Azure Data Factory is a cloud service for creating, managing, and monitoring the production of trusted information from on-premises and cloud data sources using transformative analytics at scale. Data Factory can be used in solutions to gain insights from operational and service health telemetry data, analyze customer actions to determine an optimal targeted marketing strategy, or predict customer churn from customer profile and service log data. Instead of writing hard-to-manage custom code to wire together a data warehouse with Hadoop, NoSQL, and SaaS, use Data Factory to quickly create and deploy highly available data processing pipelines, significantly cutting your time to solution and your operational costs. Get a single monitoring view of all your data processing pipelines, along with data lineage and service health. Bring together on-premises data like SQL Server and cloud data like Azure SQL Database, Blobs, and Tables with the transformative analytics of HDInsight (Hive, Pig, MapReduce, custom .NET code), and even Azure Machine Learning, to produce trusted information that is easily consumed by BI tools or applications.
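At its core, orchestrating a pipeline means running each activity only after the activities it depends on have finished, while independent activities can run side by side. The sketch below shows that scheduling idea with Python's standard `graphlib.TopologicalSorter` (Python 3.9+). The activity names and dependencies are hypothetical, and this is not how Data Factory pipelines are defined or executed.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each activity maps to the set of activities it waits on.
pipeline = {
    "copy_onprem_sql": set(),
    "copy_saas_api": set(),
    "hive_cleanse": {"copy_onprem_sql", "copy_saas_api"},
    "ml_batch_score": {"hive_cleanse"},
    "load_warehouse": {"hive_cleanse", "ml_batch_score"},
}

def run_activity(name):
    # Placeholder for real work (a copy, a Hive job, a scoring call, a load).
    print(f"running {name}")

def execute(graph):
    """Run activities in dependency order; each ready batch could run in parallel."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    order = []
    while sorter.is_active():
        batch = sorted(sorter.get_ready())  # every dependency of these has finished
        for name in batch:
            run_activity(name)
            order.append(name)
            sorter.done(name)
    return order

execution_order = execute(pipeline)
# running copy_onprem_sql, copy_saas_api, hive_cleanse, ml_batch_score, load_warehouse
```

The two copy activities form the first ready batch because neither depends on anything; in a real orchestrator they would run concurrently, and a failure in either would hold back everything downstream.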

Looking forward into H2 FY15, Data Factory will become generally available after previewing at TechEd EMEA 2014.

Discovery & exploration - integrated experience for connecting and preparing data for visual data exploration
Easy report authoring - freeform canvas for drag-and-drop report design
Custom visualizations - create your own custom interactive visualizations
R integration - extend your reports with advanced analytics through support for R

Create powerful reports with Power BI desktop

Power BI Desktop is a self-service BI tool designed to allow users to pull data together from multiple data sources, transform and clean that data, model it and add custom calculations, and then visually explore it and create interactive reports that can be easily published and shared through the Power BI service.

In addition, you can now create your own custom visualizations through our open source visualization framework. More information is available at powerbi.com/visuals.

Microsoft business intelligence & analytics
[Diagram: data from on-premises sources (SQL Server Analysis Services, databases and other data sources), Microsoft cloud data (Azure data services) and non-Microsoft cloud data; analysis/authoring with Power BI Desktop, SQL Server Mobile Report Publisher, SQL Server Report Builder/Report Designer and Excel; delivery on-premises and in the cloud through SQL Server Reporting Services and the Power BI Service to apps, mobile, Office 365, Dynamics and partner apps]

Microsoft Ignite 2015. © 2015 Microsoft Corporation. All rights reserved. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION. 1/30/2017 10:23 AM

Live dashboards provide a 360° view of your business
Track your data in real time with support for streaming data
Drill through to underlying reports to explore in more detail
Pin new visualizations and KPIs to monitor performance

Live dashboards & reports via Power BI Service

Power BI dashboards: With updates to Power BI, customers can now see all their data through a single pane of glass. Live Power BI dashboards show visualizations and KPIs from data that resides both on-premises and in the cloud, providing a consolidated view across their business regardless of where their data lives.

Customers can then explore their data further by drilling through the dashboard into the underlying reports, discovering new insights that they can pin back to the dashboard to monitor performance going forward.

Hey Cortana, show me my sales opportunity

Natural language query - ask questions of your data more naturally
Cortana integration - allows you to access your data from Windows 10
Quick insights - auto-discover patterns and insights in your data

Experience your data in new ways

Natural Language Interface - With Power BI we continue to find new ways to simplify how people analyze and gain insight from data, providing industry-leading features such as natural language query. Natural language query provides users with an easier way to interact with their data, allowing them to type questions of their data and receive answers in the form of live visualizations. Power BI integration with Cortana now allows you to ask these questions directly from Cortana and have answers from your Power BI data surfaced to you by Cortana. These data-driven answers can range from simple numeric values (revenue for the last quarter) to charts (revenue over time), maps (revenue by region) or data represented through any of the other Power BI data visualizations. Combined with the Cortana Analytics suite, this opens up amazing new opportunities to use Cortana to enable your business, and your customers' businesses, to get things done in more helpful, proactive, and natural ways.

Quick Insights - Providing new ways to help users find hidden insights in their data. The new Quick Insights feature allows users to automatically scan and detect patterns and trends in the data that they publish to Power BI. Through a partnership with Microsoft Research, the Quick Insights feature uses a growing list of algorithms to automatically discover and visualize correlations, outliers, trends, seasonality, change points in trends, and other factors in your data in seconds.
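The algorithms behind Quick Insights are not described here. As a hedged illustration of two of the pattern types named above, outliers and trends, this Python sketch does two things. It flags points more than a chosen number of standard deviations from the mean, and it fits a least-squares slope. The z threshold and the function names are assumptions for illustration only.

```python
from statistics import mean, pstdev
from typing import List, Sequence

def outliers(values: Sequence[float], z: float = 2.0) -> List[int]:
    """Indices of values more than z population standard deviations from the mean.

    Illustrative z-score rule, not the Quick Insights algorithm.
    """
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > z]

def trend_slope(values: Sequence[float]) -> float:
    """Least-squares slope of values against their position 0..n-1."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two points for a trend")
    x_bar, y_bar = (n - 1) / 2, mean(values)
    num = sum((x - x_bar) * (y - y_bar) for x, y in enumerate(values))
    den = sum((x - x_bar) ** 2 for x in range(n))
    return num / den

monthly_sales = [10, 11, 12, 13, 40, 15, 16, 17]  # made-up example data
print(outliers(monthly_sales))     # → [4]
print(trend_slope([1, 3, 5, 7]))   # → 2.0
```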

www.botframework.com

Bot Framework provides everything you need to build and connect intelligent bots that interact naturally wherever your users are talking, from text/SMS to Skype, Slack, Office 365 mail and other popular services.

Bot Framework consists of three main components: Bot Connector, Bot Builder, and Bot Directory.

Microsoft Cognitive Services: Give your apps a human side
Vision: Computer Vision | Emotion | Face | Video
Speech: Bing Speech | Custom Recognition | Speaker Recognition
Knowledge: Academic Knowledge | Entity Linking | Knowledge Exploration | Recommendations
Language: Bing Spell Check | Language Understanding | Linguistic Analysis | Text Analytics | Translator | Web Language Model
Search: Bing Autosuggest | Bing Image Search | Bing News Search | Bing Video Search | Bing Web Search
Cognitive Services API Collection

At Microsoft, we've been offering APIs across the company for a very long time. In delivering the Microsoft Cognitive Services APIs, we started with 4 last year at /build (2015), added 7 more last December, and today (May 2016) we have 22 APIs in our collection. Cognitive Services are available individually or as a part of the Cortana Intelligence Suite, formerly known as Cortana Analytics, which provides a comprehensive collection of services powered by cutting-edge research into machine learning, perception, analytics and social bots.
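As a quick cross-check of the counts quoted above, the slide's API list can be written as a Python dictionary. The five categories hold 4 + 3 + 4 + 6 + 5 = 22 APIs, matching the May 2016 total. The 4 + 7 = 11 APIs available by December 2015 leave 11 added since then. This only restates the slide as data; it is not a client for the services.

```python
# The slide's Cognitive Services API collection (May 2016), restated as data.
COGNITIVE_SERVICES = {
    "Vision": ["Computer Vision", "Emotion", "Face", "Video"],
    "Speech": ["Bing Speech", "Custom Recognition", "Speaker Recognition"],
    "Knowledge": ["Academic Knowledge", "Entity Linking",
                  "Knowledge Exploration", "Recommendations"],
    "Language": ["Bing Spell Check", "Language Understanding", "Linguistic Analysis",
                 "Text Analytics", "Translator", "Web Language Model"],
    "Search": ["Bing Autosuggest", "Bing Image Search", "Bing News Search",
               "Bing Video Search", "Bing Web Search"],
}

total = sum(len(apis) for apis in COGNITIVE_SERVICES.values())
at_build_2015 = 4
by_december_2015 = at_build_2015 + 7

print(total)                     # → 22
print(total - by_december_2015)  # → 11 APIs added between December 2015 and May 2016
```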

These APIs are powered by Microsoft Azure.

Developers and businesses can use this suite of services and tools to create apps that learn about our world and interact with people and customers in personalized, intelligent ways.

Azure Analysis Services
Azure Analysis Services is based on the proven analytics engine that has helped organizations turn complex data into a trusted, single source of truth for years.

Built for hybrid data - access and model data on-premises, in the cloud, or both
Interactive visualization - quick, highly interactive self-service data discovery with support for major data visualization tools (note: not all capabilities are available at public preview)
Proven technology - powerful, proven tabular models built from SQL Server 2016 Analysis Services
Cloud powered - easy to deploy, scale, and manage as a platform-as-a-service solution

Key points: Summarize key benefits for Azure Analysis Services

Talk track: As already mentioned, Azure Analysis Services is based on the proven analytics engine in SQL Server 2016 Analysis Services that has helped organizations turn complex data into a trusted, single source of truth for years. This means that BI professionals who are familiar with SQL Server Analysis Services tabular models can get started quickly and do not need to learn new tools or skills. And with the power of the cloud, BI professionals do not need to manage infrastructure on-premises. They can easily deploy the BI solution and benefit from the scalability of the cloud. Organizations store data in the cloud and on-premises, and Azure Analysis Services is built for hybrid data. Data can be accessed in the cloud, on-premises or a combination of both, enabling a hybrid solution, so customers do not have to move on-premises data to the cloud. Last but not least, Azure Analysis Services enables interactive data visualization over billions of rows of data. Because it supports BI industry standards such as XML/A and MDX, business users can access data using their preferred data visualization tool, whether it is Power BI, Excel or other major data visualization tools. To summarize, Azure Analysis Services is simple to use: it is easy to get started, you can use your existing skills to create BI semantic models, and you can use your favorite data visualization tools to analyze your data.

[Diagram: Microsoft R portfolio. Community: R Open. Commercial: R Server (Windows, Linux, Hadoop, Teradata) and SQL Server R Services]

Slide objective: Show broad commitment to R by preserving freely available, enhanced editions, alongside Windows and SQL Server editions and R Server editions for leading EDWs, Linux and Hadoop platforms. Differentiate the free, open editions from the commercial ones by mentioning the availability of commercial 24x7 support and enhancements to support very large-scale data analytics at speed.

Azure DocumentDB
Highly scalable NoSQL document database-as-a-service which enables query over schema-free data and multi-document transaction processing
Fully managed database service built on a native JSON data model
Application-controlled schema with massive scale-out enables iterative development and evolving data models
Automatic indexing enables robust querying over schema-free data
Integrated transactional JavaScript processing + tunable consistency enable high-performance application experiences
[Capability chart grouped as Query and Transactions, Scale, Security & Disaster Recovery, APIs and Programmability, and Performance, split into recent capabilities and future investments: rich, SQL-based query language; server-side JavaScript transactional support; automatic, write-optimized index; geo-replication; encryption at rest; backup/restore; reserved and flexible performance levels; dynamic index policy; tunable consistency levels; support for larger JSON documents; support for scaling accounts to TBs in size; additional region availability (6 currently); name and ID based resource routing; sharding framework; Azure Search Indexer and Hadoop Connector]

Microsoft Azure DocumentDB is the highly scalable NoSQL document database-as-a-service that enables query over schema-free data and multi-document transaction processing, helps deliver configurable and reliable performance, and enables rapid development.

DocumentDB is the right solution for applications that run in the cloud when predictable throughput, low latency, and flexible query are key. It is a fully managed PaaS database service backed by the power of Microsoft Azure. Unlike many other NoSQL offerings, DocumentDB was built for the cloud to perform and scale in a multi-tenant environment. Cluster administration, replication, and other management functions are handled for the customer automatically. DocumentDB is backed by a 99.95% availability SLA (at GA) to provide consistent, reliable performance.

Application-controlled schema with massive scale-out enables iterative development and evolving data models. DocumentDB supports a schema-free data model where the application defines the data model. This supports modern application development scenarios where applications are developed iteratively, with many versions supported concurrently and data models continuously evolving.

Automatic indexing enables robust querying over schema-free data. DocumentDB is the first of its kind to offer SQL over schema-free JSON data and multi-document transactional processing.

Integrated transactional JavaScript processing + tunable consistency enable high-performance application experiences. DocumentDB supports stored procedures, triggers, and user-defined functions. It also supports tunable consistency with well-defined click stops, enabling developers to tune database performance based on the application's needs.
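The toy in-memory Python sketch below illustrates what "automatic indexing over schema-free data" means. Each document is stored as-is, and every leaf path is indexed on insert without any schema or index definition. An equality lookup on a path is then answered from the index instead of a scan. ToyDocumentStore, leaf_paths and where are hypothetical names. This is not DocumentDB's implementation or API, which exposes a SQL query language and configurable indexing policies.

```python
from collections import defaultdict
from typing import Any, Dict, Iterator, List, Set, Tuple

def leaf_paths(doc: Any, prefix: str = "") -> Iterator[Tuple[str, Any]]:
    """Yield (path, value) for every scalar leaf, e.g. ("config/region", "West US")."""
    if isinstance(doc, dict):
        for key, value in doc.items():
            yield from leaf_paths(value, f"{prefix}/{key}" if prefix else key)
    elif isinstance(doc, list):
        for value in doc:
            # Array elements share the array's path, so any element can match.
            yield from leaf_paths(value, prefix)
    else:
        yield prefix, doc

class ToyDocumentStore:
    """Hypothetical schema-free store that indexes every path automatically."""

    def __init__(self) -> None:
        self.docs: Dict[str, dict] = {}
        # (path, value) -> ids of documents holding that value at that path
        self.index: Dict[Tuple[str, Any], Set[str]] = defaultdict(set)

    def upsert(self, doc: dict) -> None:
        doc_id = doc["id"]
        if doc_id in self.docs:
            self._unindex(doc_id)  # drop stale index entries of the old version
        self.docs[doc_id] = doc
        for path, value in leaf_paths(doc):
            self.index[(path, value)].add(doc_id)

    def _unindex(self, doc_id: str) -> None:
        for path, value in leaf_paths(self.docs[doc_id]):
            self.index[(path, value)].discard(doc_id)

    def where(self, path: str, value: Any) -> List[dict]:
        """Equality lookup answered from the index, without scanning documents."""
        return [self.docs[i] for i in sorted(self.index.get((path, value), set()))]

store = ToyDocumentStore()
# Documents with different shapes: no schema is declared anywhere.
store.upsert({"id": "1", "type": "device", "config": {"region": "West US", "tags": ["beta"]}})
store.upsert({"id": "2", "type": "device", "config": {"region": "East US"}, "firmware": "2.1"})
store.upsert({"id": "3", "type": "event", "severity": 3})
print([d["id"] for d in store.where("config/region", "West US")])  # → ['1']
print([d["id"] for d in store.where("type", "device")])            # → ['1', '2']
```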

The key scenarios for DocumentDB are the following:
Emitting telemetry and logging data
Storing/querying event and workflow data
Persisting device and app configuration data
User-generated content
Scalable, iterative app development

Microsoft + Open Source Momentum

SQL Server on Linux (Preview today, GA in mid-2017)

Red Hat - Microsoft Partnership (Nov 2015)

Microsoft joins Eclipse Foundation (Mar 2016).

HDInsight PaaS on Linux GA (Sep 2015)
Run Linux on Windows natively: Windows Subsystem for Linux (March 2016)

Azure Marketplace: 60% of all images in Azure Marketplace are based on Linux/OSS
In partnership with the Linux Foundation, Microsoft releases the Microsoft Certified Solutions Associate (MCSA) Linux on Azure certification.

Microsoft Open Source Hub
Microsoft employees include Ross Gardler (President, Apache Software Foundation) and Wim Coekaerts (formerly Oracle's "Mr. Linux")

1 out of 4 VMs on Azure runs Linux, and the share is getting larger every day: 28.9% of all VMs are Linux, and Linux accounts for >50% of new VMs

Azure data lake

Azure Data Lake: Big Data made easy

Analytics on any data, any size
Easier and more productive for all users
Enterprise-ready

Azure Data Lake: Analytics on any data, any size

Petabyte size files and Trillions of objects

Store data in its native format
PB-sized files, 200x larger than anyone else
Scalable throughput for massively parallel analytics
No need to redesign application or repartition data at higher scale

[Graphic: Store scaling from TBs to EBs]

Petabyte-size files and trillions of objects: With Azure Data Lake Store your organization can analyze all of its data in a single place with no artificial constraints. Your Data Lake Store can store trillions of files, where a single file can be greater than a petabyte in size, which is 200x larger than other cloud stores. This makes Data Lake Store ideal for storing any type of data, including massive datasets like high-resolution video, genomic and seismic datasets, medical data, and data from a wide variety of industries.

Scalable throughput for massively parallel analytics: Without redesigning your application or repartitioning your data at higher scale, Data Lake Store scales throughput to support any size of analytic workload. It provides massive throughput to run analytic jobs with 1,000+ concurrent executors that read and write hundreds of terabytes.

Any type of analytics
Batch, interactive, streaming, machine learning
Allows for exploratory analytics over data
Analyze with Hadoop and Microsoft solutions
Cortana Intelligence Suite

[Diagram: Analytics layer (U-SQL, HDInsight, Hive, R Server) running on YARN over the Store (HDFS)]

Start in seconds, scale instantly, pay per job with Analytics
Process big data jobs in 30 seconds
No infrastructure to worry about (no servers, no VMs, no clusters)
Instantly scale analytic units up or down (processing power)
Architected for cloud scale and performance
Frees you up to focus only on your business logic

Start in seconds, Scale instantly, Pay per job: Our on-demand service will have you processing Big Data jobs within 30 seconds. There is no infrastructure to worry about because there are no servers, VMs, or clusters to wait for, manage or tune. You can instantly scale the analytic units (processing power) from one to thousands for each job. You only pay for the processing used per job.
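The pay-per-job model can be written as simple arithmetic: each job is billed for the analytic units (AUs) it was given multiplied by how long it ran. The sketch below makes several assumptions. PRICE_PER_AU_HOUR is a made-up placeholder, not a published rate. The Job type and the per-minute granularity are hypothetical, and actual billing details may differ.

```python
from dataclasses import dataclass
from typing import List

PRICE_PER_AU_HOUR = 2.0  # hypothetical placeholder rate, not an actual Azure price

@dataclass
class Job:
    name: str
    analytic_units: int   # processing power chosen for this job
    minutes: float        # wall-clock run time of the job

def job_cost(job: Job, price_per_au_hour: float = PRICE_PER_AU_HOUR) -> float:
    """Cost = AUs x hours x price; no cluster is billed while idle."""
    return job.analytic_units * (job.minutes / 60) * price_per_au_hour

def total_cost(jobs: List[Job]) -> float:
    # In this model the bill is simply the sum over the jobs that actually ran.
    return sum(job_cost(j) for j in jobs)

# Each job scales independently: a small daily job and a large one-off backfill.
jobs = [Job("small daily ETL", 10, 30), Job("large backfill", 1000, 6)]
for j in jobs:
    print(f"{j.name}: {job_cost(j):.2f}")   # 10.00 and 200.00
print(f"total: {total_cost(jobs):.2f}")     # 210.00
```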

Azure Data Lake: Easier and more productive for all users

Easy for administrators to spin up quickly
Deploy big data projects in minutes
No hardware to install, tune, configure or deploy
No infrastructure or software to manage
Scale from tens to thousands of machines instantly

Debug and optimize your Big Data programs with ease
Deep integration with Visual Studio, Visual Studio Code, Eclipse, & IntelliJ
Easy for novices to write simple queries
Integrated with U-SQL, Hive, Storm, and Spark
Actively offers recommendations to improve performance and reduce cost
Playback visually displays job run

Debug and Optimize your Big Data programs with ease: Debugging failures in cloud distributed programs is now as easy as debugging a program in your personal environment. Our execution environment actively analyzes your programs as they run and offers recommendations to improve performance and reduce cost. For example, if you requested 1000 AUs for your program and only 50 AUs were needed, the system would recommend that you only use 50 AUs, resulting in a 20x cost savings.
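The 20x figure in the example is the ratio of allocated to needed AUs, assuming the job's run time is unchanged when it gets only the 50 AUs it actually used. Below is a minimal sketch of that kind of recommendation. It assumes the peak AU usage is known from the job's execution profile. The function names are hypothetical, not part of any Azure SDK.

```python
def recommend_aus(allocated_aus: int, peak_aus_used: int) -> int:
    """Hypothetical rule: suggest allocating only what the job actually used."""
    if peak_aus_used <= 0:
        raise ValueError("peak_aus_used must be positive")
    return min(allocated_aus, peak_aus_used)

def cost_savings_factor(allocated_aus: int, recommended_aus: int) -> float:
    # Cost is AUs x time; with run time held constant the ratio is just the AU ratio.
    return allocated_aus / recommended_aus

recommended = recommend_aus(allocated_aus=1000, peak_aus_used=50)
print(recommended)                                 # → 50
print(cost_savings_factor(1000, recommended))      # → 20.0
```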

Develop massively parallel programs with simplicity
U-SQL: a simple and powerful language that's familiar and easily extensible
Unifies the declarative nature of SQL with the expressive power of C#
Leverage existing libraries in .NET languages, R and Python
Massively parallelize code on diverse workloads (ETL, ML, image tagging, facial detection)

Develop massively parallel programs with simplicity: U-SQL is a simple, expressive, and extensible language that allows you to write code once and automatically have it parallelized for the scale you need. You can process petabytes of data for diverse workload categories such as ETL, machine learning, cognitive science, machine translation, image processing, and sentiment analysis by using U-SQL and leveraging existing libraries written in .NET languages, R, or Python.
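U-SQL itself runs only in Azure Data Lake Analytics, so here is a loose Python analogy of "write once, parallelized for you". The user writes an ordinary per-row function, playing the role a C# expression or user-defined function plays in U-SQL. A helper then splits the input into partitions, applies the function to each partition in parallel, and combines the results. The run_parallel and partition helpers, the partition count and the thread pool are illustrative assumptions. They do not describe how the U-SQL runtime actually schedules work.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Optional, TypeVar

Row = TypeVar("Row")
Out = TypeVar("Out")

def partition(rows: List[Row], n: int) -> List[List[Row]]:
    """Split rows into at most n contiguous, order-preserving partitions."""
    size = -(-len(rows) // n)  # ceiling division
    return [rows[i:i + size] for i in range(0, len(rows), size)]

def run_parallel(rows: List[Row], udf: Callable[[Row], Optional[Out]],
                 partitions: int = 4) -> List[Out]:
    """Hypothetical helper: apply udf to every row, one worker per partition.

    Rows for which udf returns None are filtered out, like a WHERE clause.
    """
    if not rows:
        return []
    def process(part: List[Row]) -> List[Out]:
        return [out for out in map(udf, part) if out is not None]
    with ThreadPoolExecutor(max_workers=partitions) as pool:
        results = list(pool.map(process, partition(rows, partitions)))
    return [out for part in results for out in part]

def parse_log(line: str):
    """The user's 'business logic': extract (user, duration_ms), drop malformed rows."""
    parts = line.split(",")
    if len(parts) != 2 or not parts[1].strip().isdigit():
        return None
    return parts[0].strip().lower(), int(parts[1])

logs = ["Alice, 120", "bob,95", "corrupt line", "Carol, 300", "dave,x"]
parsed = run_parallel(logs, parse_log, partitions=2)
print(parsed)  # → [('alice', 120), ('bob', 95), ('carol', 300)]
```

Threads keep the sketch short and portable. For CPU-bound work a process pool, or separate machines, would be the closer analogy to distributing work across a cluster.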

Query data where it lives
Easily query data in multiple Azure data stores without moving it to a single store

Benefits
Avoid moving large amounts of data across the network between stores (federated query/logical data warehouse)
Single view of data irrespective of physical location
Minimize data proliferation issues caused by maintaining multiple copies
Single query language for all data
Each data store maintains its own sovereignty
Design choices based on the need
Push SQL expressions to remote SQL sources

[Diagram: a U-SQL query in Azure Data Lake Analytics queries Azure Storage Blobs, Azure SQL in VMs, Azure SQL DB, Azure SQL Data Warehouse and Azure Data Lake Storage, pushing filters and joins down to the SQL sources and writing to Azure SQL Data Warehouse and Azure Data Lake Storage]
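The "push SQL expressions to remote SQL sources" idea can be sketched with Python's built-in sqlite3 module. Two in-memory databases stand in for two separate stores. Each filter is sent to the source that owns the data as part of its SQL, so only qualifying rows cross the "network", and the final join happens in the querying engine. The table names, data and the federated_sales_by_customer function are invented for illustration. This shows predicate pushdown in general, not U-SQL's federated query syntax.

```python
import sqlite3

# Two independent "remote" stores (stand-ins for, e.g., Azure SQL DB and SQL DW).
customers_db = sqlite3.connect(":memory:")
customers_db.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, region TEXT);
    INSERT INTO customers VALUES (1, 'Contoso', 'West'), (2, 'Fabrikam', 'East'),
                                 (3, 'Northwind', 'West');
""")
orders_db = sqlite3.connect(":memory:")
orders_db.executescript("""
    CREATE TABLE orders (customer_id INTEGER, amount REAL);
    INSERT INTO orders VALUES (1, 100.0), (1, 250.0), (2, 75.0), (3, 40.0), (3, 60.0);
""")

def federated_sales_by_customer(region: str, min_amount: float):
    """Hypothetical federated query: filters are pushed down, the join is local."""
    # Pushdown 1: the region filter runs inside the customer store.
    customers = customers_db.execute(
        "SELECT id, name FROM customers WHERE region = ?", (region,)).fetchall()
    if not customers:
        return []
    ids = [cid for cid, _ in customers]
    placeholders = ",".join("?" * len(ids))
    # Pushdown 2: amount filter, matching keys and aggregation run in the order store.
    orders = orders_db.execute(
        f"SELECT customer_id, SUM(amount) FROM orders "
        f"WHERE amount >= ? AND customer_id IN ({placeholders}) GROUP BY customer_id",
        (min_amount, *ids)).fetchall()
    # Only the reduced row sets are joined locally.
    totals = dict(orders)
    return sorted((name, totals[cid]) for cid, name in customers if cid in totals)

result = federated_sales_by_customer("West", 50.0)
print(result)  # → [('Contoso', 350.0), ('Northwind', 60.0)]
```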

Easy for data scientists with familiar R language
R Server for HDInsight
Largest portable R parallel analytics library
Terabyte-scale machine learning: 1,000x larger than in open source R
Up to 100x faster performance using Spark and optimized vector/math libraries
Enterprise-grade security and support

*Applies to HDInsight only

With Microsoft Azure HDInsight, Microsoft R Server is now available as an option when you create HDInsight clusters in Azure. This new capability provides data scientists, statisticians, and R programmers with on-demand access to scalable, distributed methods of analytics on HDInsight.

Clusters can be sized to the projects and tasks at hand and torn down when they're no longer needed. Since they're part of Azure HDInsight, these clusters come with enterprise-level 24/7 support, an SLA of 99.9% uptime, and the flexibility to integrate with other components in the Azure ecosystem.

R Server on HDInsight provides the latest capabilities for R-based analytics on datasets of virtually any size loaded to either Azure Blob or Data Lake storage. Since R Server is built on open source R, the R-based applications you build can leverage any of the 8000+ open source R packages, as well as the routines in ScaleR, Microsoft's big data analytics package that's included with R Server.

The edge node of a cluster provides a convenient place to connect to the cluster and to run your R scripts. With an edge node, you have the option of running ScaleR's parallelized distributed functions across the cores of the edge node server. You also have the option to run them across the nodes of the cluster by using ScaleR's Hadoop MapReduce or Spark compute contexts.
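Distributing a computation across cores or cluster nodes works when each chunk of data can be summarized independently and the partial summaries combined afterwards. The sketch below is a general illustration of that idea, written in Python rather than R. It is not ScaleR's code or API, and distributed_mean_var is a hypothetical name. Each chunk is reduced to a count, mean and sum of squared deviations. The partial results are merged with the standard pairwise update formula (Chan et al.), so the final mean and variance do not depend on how the data was split.

```python
from functools import reduce
from typing import Iterable, List, Tuple

Stats = Tuple[int, float, float]  # (count, mean, M2 = sum of squared deviations)

def chunk_stats(chunk: Iterable[float]) -> Stats:
    """Summarize one chunk with Welford's online algorithm."""
    n, mean, m2 = 0, 0.0, 0.0
    for x in chunk:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
    return n, mean, m2

def combine(a: Stats, b: Stats) -> Stats:
    """Merge two partial summaries using the pairwise update formula."""
    na, ma, m2a = a
    nb, mb, m2b = b
    n = na + nb
    if n == 0:
        return 0, 0.0, 0.0
    delta = mb - ma
    return n, ma + delta * nb / n, m2a + m2b + delta * delta * na * nb / n

def distributed_mean_var(chunks: List[List[float]]) -> Tuple[float, float]:
    """Hypothetical driver: each chunk_stats call could run on a different core or node."""
    n, mean, m2 = reduce(combine, map(chunk_stats, chunks), (0, 0.0, 0.0))
    if n < 2:
        raise ValueError("need at least two values for a sample variance")
    return mean, m2 / (n - 1)  # sample variance

mean, var = distributed_mean_var([[1, 2, 3], [4, 5], [6, 7, 8, 9, 10]])
print(mean, round(var, 4))  # → 5.5 9.1667
```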

The models or predictions that result from analyses can be downloaded for use on-premises. They can also be operationalized elsewhere in Azure, such as through an Azure Machine Learning Studio web service.

Azure Data Lake: Enterprise-ready

Highest availability guarantee in the industry for peace of mind
Managed, monitored and supported by Microsoft
Enterprise-leading SLA: 99.9% uptime
No IT resources needed for upgrades and patching
Microsoft monitors your deployment so you don't have to
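To put the 99.9% figure in concrete terms, the allowed downtime is 0.1% of the measurement period. The 30-day month and 365-day year below are assumptions for illustration. The SLA documents themselves define how uptime is measured and what credits apply.

```python
def allowed_downtime_minutes(sla_percent: float, period_days: float) -> float:
    """Maximum downtime (in minutes) that still meets an uptime SLA over a period."""
    total_minutes = period_days * 24 * 60
    return total_minutes * (1 - sla_percent / 100)

print(round(allowed_downtime_minutes(99.9, 30), 1))        # → 43.2 minutes per 30-day month
print(round(allowed_downtime_minutes(99.9, 365) / 60, 2))  # → 8.76 hours per year
```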

Azure Regions

38 regions worldwide, 32 generally available
100+ datacenters
Top 3 networks in the world
2.5x AWS, 7x Google DC regions
G-series: largest VM in the world, 32 cores, 448 GB RAM, SSD

https://azure.microsoft.com/en-us/regions/

Always encrypted, role-based security & auditing
Always encrypted: in motion using SSL, and at rest using keys in Azure Key Vault
Single sign-on, multi-factor authentication and seamless integration of on-premises identities with Active Directory
Fine-grained POSIX-based ACLs for role-based access controls
Auditing every access / configuration change

Always encrypted, Role-based security & Auditing: Data Lake Store protects your data assets and easily extends your on-premises security and governance controls to the cloud. Data is always encrypted: in motion using SSL, and at rest using service- or user-managed HSM-backed keys in Azure Key Vault. Capabilities such as single sign-on (SSO), multi-factor authentication and seamless management of millions of identities are built in through Azure Active Directory. You can authorize users and groups with fine-grained POSIX-based ACLs for all data in the Store, enabling role-based access controls. Finally, you can meet security and regulatory compliance needs by auditing every access or configuration change to the system.
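Data Lake Store's ACLs follow the POSIX access-control model. The simplified Python sketch below implements the standard POSIX ACL access check, which evaluates entries in a fixed order:

1. The owner entry, which is not masked.
2. Named-user entries, limited by the mask.
3. The owning group and named-group entries, limited by the mask.
4. The "other" entry.

The Acl type and check_access function are hypothetical. The sketch omits default ACLs, the sticky bit and superuser rules, so treat it as an illustration of the model rather than Data Lake Store's exact behavior.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set

R, W, X = 4, 2, 1  # permission bits, as in rwx

@dataclass
class Acl:
    """Hypothetical access ACL for one file or folder."""
    owner: str
    owning_group: str
    owner_perms: int
    group_perms: int
    other_perms: int
    named_users: Dict[str, int] = field(default_factory=dict)
    named_groups: Dict[str, int] = field(default_factory=dict)
    mask: Optional[int] = None  # None models a minimal ACL with no mask entry

def check_access(acl: Acl, user: str, groups: Set[str], requested: int) -> bool:
    mask = acl.mask if acl.mask is not None else R | W | X

    def allows(perms: int, masked: bool = True) -> bool:
        effective = perms & mask if masked else perms
        return (effective & requested) == requested

    if user == acl.owner:
        return allows(acl.owner_perms, masked=False)
    if user in acl.named_users:
        return allows(acl.named_users[user])
    matching: List[int] = []
    if acl.owning_group in groups:
        matching.append(acl.group_perms)
    matching += [p for g, p in acl.named_groups.items() if g in groups]
    if matching:
        # A group match decides the outcome; "other" is not consulted.
        return any(allows(p) for p in matching)
    return allows(acl.other_perms, masked=False)

acl = Acl(owner="etl_svc", owning_group="data_eng", owner_perms=R | W | X,
          group_perms=R | X, other_perms=0,
          named_users={"alice": R | W | X}, named_groups={"analysts": R | X},
          mask=R | X)
owner_can_write = check_access(acl, "etl_svc", set(), W)        # True: owner not masked
alice_can_write = check_access(acl, "alice", {"analysts"}, W)   # False: mask removes write
bob_can_read = check_access(acl, "bob", {"analysts"}, R)        # True: named group grants read
eve_can_read = check_access(acl, "eve", {"marketing"}, R)       # False: falls through to other
print(owner_can_write, alice_can_write, bob_can_read, eve_can_read)
```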

Lower total cost of ownership
No hardware
Hadoop support included with Azure support
Pay only for what you use
Independently scale storage and compute
No need to hire specialized operations team
63% lower total cost of ownership than on-premises*
*IDC study "The Business Value and TCO Advantage of Apache Hadoop in the Cloud with Microsoft Azure HDInsight"

Cloud Big Data Solution

1) Copy source data into the Azure Data Lake Store (Twitter data example)
2) Massage/filter the data using Hadoop (or skip using Hadoop and use stored procedures in SQL DW/DB to massage data after step #5)
3) Pass data into Azure ML to build models using Hive query (or pass in directly from Azure Data Lake Store)
4) Azure ML feeds prediction results into the data warehouse
5) Non-relational data in Azure Data Lake Store copied to data warehouse in relational format (optionally use PolyBase with external tables to avoid copying data)
6) Power BI pulls data from data warehouse to build dashboards and reports
7) Azure Data Catalog captures metadata from Azure Data Lake Store and SQL DW/DB
8) Power BI and Excel can pull data from the Azure Data Lake Store via HDInsight
9) To support high concurrency if using SQL DW, or for easier end-user data layer, create an SSAS cube
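The numbered flow above is essentially a directed graph of components. That is also how lineage is usually reasoned about: to see what a change in one component could affect, follow the edges downstream. Below is a minimal Python sketch using breadth-first search. The edge list is one reading of the steps above, with optional paths included. Routing the SSAS cube to Power BI in step 9 is an assumption, and the component labels are shorthand, not product identifiers.

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical reading of the numbered steps as a component graph.
FLOW: Dict[str, List[str]] = {
    "Source data": ["Data Lake Store"],                                         # step 1
    "Data Lake Store": ["HDInsight", "Azure ML", "SQL DW/DB", "Data Catalog"],  # steps 2, 3, 5, 7
    "HDInsight": ["Azure ML", "Power BI", "Excel"],                             # steps 3, 8
    "Azure ML": ["SQL DW/DB"],                                                  # step 4
    "SQL DW/DB": ["Power BI", "Data Catalog", "SSAS cube"],                     # steps 6, 7, 9
    "SSAS cube": ["Power BI"],                                                  # step 9 (assumed)
}

def downstream(start: str, graph: Dict[str, List[str]]) -> Set[str]:
    """All components reachable from start (breadth-first search)."""
    seen: Set[str] = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

# What could be affected if the Azure ML model changes?
print(sorted(downstream("Azure ML", FLOW)))
# → ['Data Catalog', 'Power BI', 'SQL DW/DB', 'SSAS cube']
```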

Recognized by top analysts
Forrester Wave for Big Data Hadoop Cloud
Named industry leader by Forrester with the most comprehensive, scalable, and integrated platforms*
Recognized for its cloud-first strategy that is paying off*
*The Forrester Wave™: Big Data Hadoop Cloud Solutions, Q2 2016.

Get it at //aka.ms/forresterwave

Q & A

James Serra, Big Data Evangelist
Email me at: [email protected]
Follow me at: @JamesSerra
Link to me at: www.linkedin.com/in/JamesSerra
Visit my blog at: JamesSerra.com (where this slide deck is posted under the Presentations tab)