austin cherian: big data and hpc technologies - intel

Big Data and HPC technologies 24 July, 2014 Austin Cherian Sr. Applications Engineer

Upload: vu-hung-nguyen

Post on 19-Jun-2015


DESCRIPTION

Intel Xeon Architecture Roadmap • Big Data Trends • Intel in Big Data • The Big Data HPC connection

TRANSCRIPT

Page 1: Austin Cherian: Big data and HPC technologies - intel

Intel Confidential — Do Not Forward Intel Information Technology

Big Data and HPC technologies 24 July, 2014

Austin Cherian

Sr. Applications Engineer

Page 2: Austin Cherian: Big data and HPC technologies - intel

*Other names and brands may be claimed as the property of others.

Legal Disclaimer - Notice

INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL'S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT. UNLESS OTHERWISE AGREED IN WRITING BY INTEL, THE INTEL PRODUCTS ARE NOT DESIGNED NOR INTENDED FOR ANY APPLICATION IN WHICH THE FAILURE OF THE INTEL PRODUCT COULD CREATE A SITUATION WHERE PERSONAL INJURY OR DEATH MAY OCCUR. Intel may make changes to specifications and product descriptions at any time, without notice. Designers must not rely on the absence or characteristics of any features or instructions marked "reserved" or "undefined". Intel reserves these for future definition and shall have no responsibility whatsoever for conflicts or incompatibilities arising from future changes to them. The information here is subject to change without notice. Do not finalize a design with this information. The products described in this document may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request. Contact your local Intel sales office or your distributor to obtain the latest specifications and before placing your product order. Copies of documents which have an order number and are referenced in this document, or other Intel literature, may be obtained by calling 1-800-548-4725, or go to: http://www.intel.com/design/literature.htm

Intel processor numbers are not a measure of performance. Processor numbers differentiate features within each processor family, not across different processor families: Go to: http://www.intel.com/products/processor_number

Intel® AES-NI requires a computer system with an AES-NI enabled processor, as well as non-Intel software to execute the instructions in the correct sequence. AES-NI is available on select Intel® processors. For availability, consult your reseller or system manufacturer. For more information, see http://software.intel.com/en-us/articles/intel-advanced-encryption-standard-instructions-aes-ni/

No computer system can provide absolute security under all conditions. Intel® Trusted Execution Technology (Intel® TXT) requires a computer with Intel® Virtualization Technology, an Intel TXT-enabled processor, chipset, BIOS, Authenticated Code Modules and an Intel TXT-compatible measured launched environment (MLE). Intel TXT also requires the system to contain a TPM v1.s. For more information, visit http://www.intel.com/technology/security

Intel® Virtualization Technology requires a computer system with an enabled Intel® processor, BIOS, and virtual machine monitor (VMM). Functionality, performance or other benefits will vary depending on hardware and software configurations. Software applications may not be compatible with all operating systems. Consult your PC manufacturer. For more information, visit http://www.intel.com/go/virtualization

Requires a system with Intel® Turbo Boost Technology. Intel Turbo Boost Technology and Intel Turbo Boost Technology 2.0 are only available on select Intel® processors. Consult your PC manufacturer. Performance varies depending on hardware, software, and system configuration. For more information, visit http://www.intel.com/go/turbo

Copyright © 2014 Intel Corporation. All rights reserved. Intel, Intel Xeon, the Intel Xeon logo and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries.


Page 3: Austin Cherian: Big data and HPC technologies - intel


Legal Disclaimers - Performance

Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products.

Intel does not control or audit the design or implementation of third party benchmarks or Web sites referenced in this document. Intel encourages all of its customers to visit the referenced Web sites or others where similar performance benchmarks are reported and confirm whether the referenced benchmarks are accurate and reflect performance of systems available for purchase.

Relative performance is calculated by assigning a baseline value of 1.0 to one benchmark result, and then dividing the actual benchmark result for the baseline platform into each of the specific benchmark results of each of the other platforms, and assigning them a relative performance number that correlates with the performance improvements reported.
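The normalization described in the paragraph above can be sketched in a few lines of Python. The platform names and raw scores are hypothetical placeholders, not measurements from this deck, and the sketch assumes a higher-is-better benchmark.

```python
def relative_performance(scores, baseline):
    """Normalize raw higher-is-better scores so the baseline platform is 1.0."""
    base = scores[baseline]
    return {name: value / base for name, value in scores.items()}


# Hypothetical raw benchmark scores, for illustration only.
raw = {"platform_a": 400.0, "platform_b": 520.0, "platform_c": 300.0}

rel = relative_performance(raw, baseline="platform_a")
for name, value in rel.items():
    print(f"{name}: {value:.2f}")
```

A value above 1.0 means the platform scored higher than the baseline; a value below 1.0 means it scored lower.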

SPEC, SPECint, SPECfp, SPECrate, SPECpower_ssj, SPECjAppServer, SPECjEnterprise, SPECjbb, SPECompM, SPECompL, and SPEC MPI are trademarks of the Standard Performance Evaluation Corporation. See http://www.spec.org for more information.

TPC Benchmark is a trademark of the Transaction Processing Council. See http://www.tpc.org for more information.

SAP and SAP NetWeaver are the registered trademarks of SAP AG in Germany and in several other countries. See http://www.sap.com/benchmark for more information.


Page 4: Austin Cherian: Big data and HPC technologies - intel


Optimization Notice

Intel’s compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2®, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.

Notice revision #20110804

Page 5: Austin Cherian: Big data and HPC technologies - intel


Legal Disclaimers

All products, computer systems, dates, and figures specified are preliminary based on current expectations, and are subject to change without notice.

Romley, Ivy Bridge, Sandy Bridge, Westmere, Nehalem, Harpertown, and certain other names are code names used to identify unreleased Intel products. Intel makes no warranty of trademark non-infringement, and use of these code names by third parties is at their own risk.

Intel may make changes to specifications, product descriptions, and plans at any time, without notice.

INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL® PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL'S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER, AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT. Intel products are not intended for use in medical, life saving, life sustaining, critical control or safety systems, or in nuclear facility applications.

Designers must not rely on the absence or characteristics of any features or instructions marked "reserved" or "undefined." Intel reserves these for future definition and shall have no responsibility whatsoever for conflicts or incompatibilities arising from future changes to them.

This document contains information on products in the design phase of development. The information here is subject to change without notice. Do not finalize a design with this information.

The Intel® Xeon® Processor may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request.

Intel, the Intel logo, Intel® Virtualization Technology, Intel® I/O Acceleration Technology, Intel® VTune™ Analyzer, Intel® Thread Checker™, Intel® Tools, Intel® Trace Analyzer and Collector and Intel® Xeon™ are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries.

Hyper-Threading Technology requires a computer system with a processor supporting HT Technology and an HT Technology-enabled chipset, BIOS and operating system. Performance will vary depending on the specific hardware and software you use. For more information including details on which processors support HT Technology, see here

“Intel® Turbo Boost Technology requires a PC with a processor with Intel Turbo Boost Technology capability. Intel Turbo Boost Technology performance varies depending on hardware, software and overall system configuration. Check with your PC manufacturer on whether your system delivers Intel Turbo Boost Technology. For more information, see http://www.intel.com/technology/turboboost.”

Intel® Virtualization Technology requires a computer system with an enabled Intel® processor, BIOS, virtual machine monitor (VMM) and, for some uses, certain computer system software enabled for it. Functionality, performance or other benefits will vary depending on hardware and software configurations and may require a BIOS update. Software applications may not be compatible with all operating systems. Please check with your application vendor.


Page 6: Austin Cherian: Big data and HPC technologies - intel

Agenda

• Intel Xeon Architecture Roadmap
• Big Data Trends
• Intel in Big Data
• The Big Data HPC connection


Page 8: Austin Cherian: Big data and HPC technologies - intel

Intel Tick Tock Model – Inspiring Confidence

Typically, increases in transistor density enable new capabilities, higher performance levels, and greater energy efficiency

Nehalem Microarchitecture: Nehalem (45nm, TOCK), Westmere (32nm, TICK)
Sandy Bridge Microarchitecture: Sandy Bridge (32nm, TOCK), Ivy Bridge (22nm, TICK)
Haswell Microarchitecture: Haswell (22nm, TOCK)

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries. Other names and brands may be claimed as the property of others. All products, dates, and figures are preliminary and are subject to change without any notice. Copyright © 2013, Intel Corporation.

Page 9: Austin Cherian: Big data and HPC technologies - intel

Intel® Xeon® Processor E5-2600 v3 Product Family Preliminary Performance Expectations

Intel® Xeon® E5-26xx v3 (14C, 2.7GHz, 145W) vs. Intel® Xeon® E5-2697 v2 (12C, 2.7GHz, 130W)

Up to 37% performance boost on average expected over previous Xeon® generation

Preliminary relative performance (E5-2697 v2 baseline = 1.00):
• STREAM (Triad): 1.14
• SPECfp*_rate_base2006: 1.27
• SPECjbb*2013 MultiJVM (Max jOPs): 1.28
• SPECint*_rate_base2006: 1.30
• Brokerage OLTP: 1.35
• Warehouse OLTP: 1.41
• Linpack: 1.9

Source: Intel internal estimates as of 17 Nov 2013. Results have been estimated based on internal Intel analysis and are provided for informational purposes only. Any difference in system hardware or software design or configuration may affect actual performance. Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more information go to http://www.intel.com/performance *Other names and brands may be claimed as the property of others.
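As a rough check on the "on average" claim, the seven relative-performance values on this slide can be averaged directly. The slide does not say which averaging method was used, so this sketch computes both the arithmetic and the geometric mean. Treating the chart values as exact is an assumption.

```python
from statistics import geometric_mean, mean

# Relative-performance values read from this slide (baseline = 1.00).
ratios = [1.14, 1.27, 1.28, 1.30, 1.35, 1.41, 1.90]

arith = mean(ratios)           # simple average of the ratios
geo = geometric_mean(ratios)   # common choice for averaging speedup ratios

print(f"arithmetic mean: {arith:.3f}")
print(f"geometric mean:  {geo:.3f}")
```

The two means fall on either side of the quoted 37% figure, which is about what rounded chart labels would be expected to give.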

Page 10: Austin Cherian: Big data and HPC technologies - intel

The Intel® Xeon Phi™ Coprocessors: Formerly code-named Knights Corner

Page 11: Austin Cherian: Big data and HPC technologies - intel

Intel® Xeon® Processors + Intel® Xeon Phi™ Coprocessors: Complementary Solutions for Parallel Workloads

Intel® Xeon® Processors
• Leadership performance for the majority of server & workstation workloads
• Versatile foundation to meet rapid growth in users, devices, and data
• Robust energy efficiency, security, and reliability to reduce data center costs

Intel® Xeon Phi™ Coprocessors
• Advanced performance for highly parallel workloads for breakthrough innovation and discovery
• Based on Intel® MIC Architecture; works synergistically with Intel® Xeon® Processors
• Increased developer productivity via programming models & tools common with Intel® Xeon® Processors

Develop with Intel tools for Intel® Xeon® Processors today; scale your software investment to include Intel® Xeon Phi™ products

Page 12: Austin Cherian: Big data and HPC technologies - intel

Intel® Xeon Phi™ Product Family: Path to Performance and Programmability

Knights Corner (Intel® Xeon Phi™ x100 Product Family), available today
• 22 nm process
• Coprocessor
• Over 1 TF DP peak
• Up to 61 cores
• Up to 16GB GDDR5

Knights Landing (Intel® Xeon Phi™ x200 Product Family), 2H’15*
• 14 nm process
• Server processor & coprocessor
• Over 3 TF DP peak (1)
• 60+ cores
• Up to 400GB memory
• ~500 GB/s sustained memory bandwidth
• Shown as: Knights Landing; Knights Landing with Fabric

3rd generation: future, TBA, in planning

* First commercial systems. All products, computer systems, dates and figures specified are preliminary based on current expectations, and are subject to change without notice.

(1) Over 3 teraflops of peak theoretical double-precision performance is preliminary and based on current expectations of cores, clock frequency and floating point operations per cycle. FLOPS = cores x clock frequency x floating-point operations per cycle.
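The peak-FLOPS formula in the footnote can be worked through as a short calculation. The 61-core count comes from this slide; the 1.2 GHz clock and 16 double-precision operations per cycle per core are assumed values chosen only to illustrate the arithmetic, not figures from the deck.

```python
def peak_flops(cores, clock_hz, flops_per_cycle):
    """Theoretical peak: cores x clock frequency x floating-point ops per cycle."""
    return cores * clock_hz * flops_per_cycle


cores = 61               # "Up to 61 Cores" on the slide
clock_hz = 1.2e9         # assumed clock frequency (illustrative)
dp_flops_per_cycle = 16  # assumed DP operations per cycle per core (illustrative)

peak_tf = peak_flops(cores, clock_hz, dp_flops_per_cycle) / 1e12
print(f"theoretical DP peak: {peak_tf:.2f} TF")
```

Under these assumptions the result lands just above 1 TF, in line with the "Over 1 TF DP Peak" figure quoted for the current generation. This is a theoretical ceiling, not a sustained rate.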

Page 13: Austin Cherian: Big data and HPC technologies - intel

Weather Research and Forecasting (WRF) Conus 2.5 km

Application: Weather Research and Forecasting (WRF)

Availability: WRF V3.5 was released 4/18/13. https://software.intel.com/en-us/articles/how-to-get-wrf-running-on-the-intelr-xeon-phitm-coprocessor

Code Optimization: Approximately two dozen files with less than 2,000 lines of code were modified (out of approximately 700,000 lines of code in about 800 files, all Fortran standard compliant). Most modifications improved performance for both the host and the coprocessors.

Performance Measurements: V3.5 and NCAR supported CONUS2.5KM benchmark (a high resolution weather forecast)

Acknowledgments: There were many contributors to these results, including the National Renewable Energy Laboratory and The Weather Channel Companies

Speedup (higher is better):
• 2S Intel® Xeon® processor E5-2670: 1.00
• 2S Intel® Xeon® processor E5-2670 + Intel® Xeon Phi™ coprocessor (pre-production HW/SW): 1.56

SOURCE: INTEL MEASURED RESULTS AS OF NOVEMBER, 2013

Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. See benchmark tests and configurations in the speaker notes. For more information go to http://www.intel.com/performance

Page 14: Austin Cherian: Big data and HPC technologies - intel

Agenda

• Intel Xeon Architecture Roadmap
• Big Data Trends
• Intel in Big Data
• The Big Data HPC connection

Page 15: Austin Cherian: Big data and HPC technologies - intel

Virtuous Cycle of Data-Driven Innovation

2.8 Zettabytes of data will be generated WW in 2012 (1)

Richer user experiences

Richer data from devices

40 Zettabytes of data will be generated WW in 2020 (1)

Richer data to analyze

Cloud

Clients

Intelligent Systems

(1) IDC Digital Universe 2020, (2) IDC
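The two data-volume figures on this slide imply a compound annual growth rate, which can be estimated as below. Smooth year-over-year compounding between 2012 and 2020 is an assumption made for the sketch; the slide gives only the two endpoints.

```python
def implied_annual_growth(start, end, years):
    """Compound annual growth rate that takes `start` to `end` in `years` years."""
    return (end / start) ** (1 / years) - 1


# Endpoints from the slide: 2.8 ZB in 2012 and 40 ZB in 2020.
rate = implied_annual_growth(2.8, 40.0, 2020 - 2012)
print(f"implied annual growth: {rate:.1%}")
```

In other words, the two figures correspond to data volume growing by a little under 40 percent each year over the period.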


Page 16: Austin Cherian: Big data and HPC technologies - intel

Forces Driving Big Data Advancement

• HPC: Enabling exascale computing on massive data sets
• Cloud: Helping enterprises build open interoperable clouds
• Open Source: Contributing code and fostering ecosystem

Intel® TrueScale InfiniBand

Page 17: Austin Cherian: Big data and HPC technologies - intel

Enterprise and Big Data: How is it used in different sectors?

Big data can generate significant financial value across sectors

• US health care: $300 billion value per year; ~0.7 percent annual productivity growth
• Europe public sector administration: €250 billion value per year; ~0.5 percent annual productivity growth
• Global personal location data: $100 billion+ revenue for service providers; up to $700 billion value to end users
• US retail: 60+% increase in net margin possible; 0.5-1.0 percent annual productivity growth
• Manufacturing: Up to 50 percent decrease in product development, assembly costs; up to 7 percent reduction in working capital

SOURCE: McKinsey Global Institute analysis

Solving today’s problems faster

• FMCG: Problem solving ideas, product decisions, predict market reaction, enhance brand relevance
• Financial: To detect credit card fraud, greater accuracy and granularity in risk assessment
• Cable and TV: Customize TV ads to individual household
• Mobile Ads: Ads based on locations
• Retail: Insight and understanding of customer likes, dislikes, influences and behaviors
• General business: To discover unknown and untapped behaviors and attitudes
• Healthcare: Evidence based medicine
• Utilities: Understanding of individualized use patterns and better management of demand
• Online businesses: Better understanding of customer preferences and social interactions
• Crime prevention/intelligence: Better analysis of patterns

SOURCE: McKinsey

Page 18: Austin Cherian: Big data and HPC technologies - intel

Intel Leverages the Power of Big Data

MALWARE

MILLION new malware samples per quarter (1)

CYBER ATTACKS

MILLION U.S. cyber attacks per day (2)

(1) “McAfee Threats Report: Second Quarter 2012,” McAfee, www.mcafee.com/us/resources/reports/rp-quarterly-threat-q2-2012.pdf (PDF)

(2) Koebler, Jason, “U.S. Nukes Face Up to 10 Million Cyber Attacks Daily,” U.S. News & World Report (2012), www.usnews.com/news/articles/2012/03/20/us-nukes-face-up-to-10-million-cyber-attacks-daily

Chip Design Validation: Cut Product Time to Market by 25%
• Faster analysis process for validating results
• Streamlined debug process through analysis of large volumes of historical test data

Reseller Channel Management: Increased sales by $5M per Qtr. Decreased cost by $6M per Qtr.
• Smarter reseller engagement prioritization by leveraging advanced customer profile algorithms
• Cost efficient detection of non-compliant claims

Malware Detection: Proof of Concept (POC)
• Collecting and analyzing large amounts of server security data at the system, network, and application levels leads to discovery of new malware threats before they arise.

Page 19: Austin Cherian: Big data and HPC technologies - intel

Virtuous Cycle of Data… Inside and Outside the Box

Data
• Transform / Analyze: Compute
• Move: Networking
• Persist: Storage

Page 20: Austin Cherian: Big data and HPC technologies - intel

Reimagining the Possibilities with Big Data Analytics: Move to Value & Vision

Enhance understanding, drive innovation, and accelerate personalized medical cures

Create new business models and transform organizational processes

Enhance public safety and transportation, increase energy efficiency and reduce carbon footprint


Page 21: Austin Cherian: Big data and HPC technologies - intel

Big Data Adoption and Deployment Phases

Investigate
• Understand business model
• Organization alignment
• Market & technology trends

Discover
• Define problem statement
• Identify business use cases
• Gather requirements
• Develop success metrics

Plan
• Identify high value, high visibility use cases
• Define scope & ROI for proof of concept (POC)
• Identify Big Data reference architecture

Implement
• Pilot the POC
• Promote and extend the POC result for other projects
• Extend and enhance more advanced analytic capabilities

• What insights would best benefit your business?
• What results are you really trying to get?
• What do you want to do with your data?
• What kind of data & correlations are you interested in mining?
• How much return are you expecting on your investment?
• What is your timeline for getting results?
• Are there other industries or uses you’re using for a model?

Page 22: Austin Cherian: Big data and HPC technologies - intel

Agenda

• Intel Xeon Architecture Roadmap
• Big Data Trends
• Intel in Big Data
• The Big Data HPC connection

Page 23: Austin Cherian: Big data and HPC technologies - intel

Intel’s Foundational Technologies Offer Advanced Solutions for Big Data Analytics

Responsive, Energy Efficient, High Availability, Secure, Choice

Big Data Building Blocks

• Compute: Intel® Xeon® Product Family E3-E5-E7; Intel® Atom™; Intel® Xeon Phi™
• Network: Intel® Ethernet Controllers; Intel® Ethernet Adapters; Intel® Ethernet Switch Silicon; Intel® True Scale Fabric
• Storage: Intelligent Storage (1); Scale-out Storage (1); Scale-up Storage (1); Intel® SSD 710 series, DC S3700 (SATA); Intel® SSD 910 series (PCIe)
• Software & Technologies: Intel® Contribution to Open Source Hadoop; Intel® Data Center Manager; Intel® Node Manager; Intel® Expressway Service Gateway; Intel® Cache Acceleration Software; Intel’s Lustre; Intel® VT and Intel® TXT; Intel® AES-NI

(1) Xeon-based storage systems are available in a wide range of configuration options from the industry’s leading storage vendors

Page 24: Austin Cherian: Big data and HPC technologies - intel

Unleash the power of platform

Baseline: Intel® Xeon® 5600, HDD, 1GbE. TeraSort for 1TB sort: >4 hour process time

• Upgrade processor: ~50% reduction
• Upgrade to SSD: ~80% reduction
• Upgrade to 10GbE: ~50% reduction
• Open Source Contributions: ~40% reduction

Result: Hadoop processing time: <10 minutes

Nearly 50x increase in your ability to discover insights

*Other brands and names are the property of their respective owners
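The way the successive reductions on this slide compound can be sketched as follows. The sketch assumes each percentage applies to the time remaining after the previous upgrade, takes the rounded percentages as exact, and uses exactly 4 hours as the baseline although the slide says ">4 hour".

```python
baseline_minutes = 4 * 60  # assumed: exactly 4 hours (slide says ">4 hour")

# Rounded reductions from the slide, applied one after another (assumed order).
reductions = [
    ("Upgrade processor", 0.50),
    ("Upgrade to SSD", 0.80),
    ("Upgrade to 10GbE", 0.50),
    ("Open source contributions", 0.40),
]

remaining = float(baseline_minutes)
for step, cut in reductions:
    remaining *= 1 - cut  # each step removes a fraction of the remaining time
    print(f"{step}: {remaining:.1f} min remaining")

speedup = baseline_minutes / remaining
print(f"overall speedup: {speedup:.1f}x")
```

With these rounded inputs the final time is a little over 7 minutes, consistent with the "<10 minutes" result. The compounded factor is lower than the slide's headline figure, which may rest on unrounded measurements and a baseline longer than four hours.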

Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. Source: Intel Internal testing

For more information go to: intel.com/performance

Page 25: Austin Cherian: Big data and HPC technologies - intel

Intel Intelligence at the Edge

Intel® Intelligent Systems Framework: Simplifying the Internet of Things

• Driving Secure Interoperability: Billions of devices that need to share data with each other and the cloud
• Unlocking Edge Data: Edge systems need to react to streaming data in real time
• Filtering Data: Data volume outpacing network and storage efficiency

Wind River Intelligent Device Platform

Pre-integrated smart and connected capabilities enable rich network options to save development time and costs

Connectivity, Manageability, Security

• Validated and flexible firmware providing an extensive network of connectivity choices, including broad modem support and PAN, LAN, and WAN network access
• Platform customization significantly reduces time to product while increasing productive life of M2M devices
• Intuitive web-based tool reduces configuration and support costs and allows for anytime provisioning and management of devices
• Dynamic post dynamic “Services” framework (OSGi) enables modularized, hardware agnostic deployment of new apps
• Security features designed for M2M development that protect critical data throughout the device lifecycle
• Customizable SRM to ensure the integrity of the end devices via secure boot, provide encrypted communication between device and management console in the cloud, and offer device resource management to limit system exposure of untrusted applications

Page 26: Austin Cherian: Big data and HPC technologies - intel

Intel Data Research

Dozens of Academic, Industry & Gov Research Collaborations

GraphLab, GraphBuilder

Disaggregation / Silicon Photonics

Worry-Free Data Protections to allow you to control how your data is used

• Future Data Experiences: Data Economy, Vibrant Data, Data Visualization
• Analytical Applications: Model-Based workloads, Everyday Analytics
• Machine Learning Algorithms: Relationship Discovery, Real-time learning
• Distributed Data Computing: Graph, parallel, statistical computing partitioned across cloud, client and edge
• Data Ecosystem Infrastructure: Platforms, architectures, DBMSs, smart networks, and the Internet of Things

Page 27: Austin Cherian: Big data and HPC technologies - intel

Summary: Intel in Big Data

The pervasiveness of Intel Architecture democratizes the implementation and performance of Big Data everywhere

• Accelerate analytics: CPU, storage, and network
• Optimized ISV software stacks and services
• Foster the growth of market partners
• Solution research and academia engagement
• Distribute analytics to the edge

Page 28: Austin Cherian: Big data and HPC technologies - intel

BioScience: Genomics for Translational Medicine
Hadoop for Data Correlation & Discovery Insights

Challenge: Derive new value added patient discovery services while bringing down genome processing costs

Solution: Dynamically partition/scale HBase for correlation of patient data to all public data

Benefits: Contributes to 800x reduction in cost to process 4 M genome variants

Data Characteristics:
• 10 Node HBase Cluster
• Billions of pre-computed correlations

Row counts by number of genomes:
• 1 genome: 10 million rows
• 100 genomes: 1 billion rows
• 1M genomes: 10 trillion rows
• 100M genomes: 1 quadrillion (1,000,000,000,000,000) rows

Stages shown: Data Ingest; Billions of Pre-computed Correlations; New Biomed Info-Products
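The row counts on this slide follow a simple linear rule of 10 million rows per genome, which the short sketch below reproduces. The constant and the function name are illustrative; the only input taken from the slide is the rows-per-genome figure.

```python
ROWS_PER_GENOME = 10_000_000  # "1 Genome 10 Million rows" on the slide


def rows_for(genomes):
    """Total table rows for a given number of genomes, assuming linear scaling."""
    return genomes * ROWS_PER_GENOME


for genomes in (1, 100, 1_000_000, 100_000_000):
    print(f"{genomes:>11,} genomes -> {rows_for(genomes):,} rows")
```

Linear growth in rows is what makes a horizontally partitioned store a natural fit here: capacity can be added node by node as the genome count rises.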

Page 29: Austin Cherian: Big data and HPC technologies - intel

Public Sector - Smart Traffic: Intelligent Transport System
Hadoop for Predictive Analytics

Challenge: Analyze city traffic to derive statistics for crime prevention, info sharing, and predictive traffic analysis

Solution: Embed HBase client in camera for real-time inserts of structured/unstructured data

Benefits:
• Automated queries for traffic violation
• Data mining of fake licenses in <1 minute across all data captured in a week
• Predictive traffic forecasting

Data Characteristics:
• 30,000+ camera data collection points
• Petabytes of traffic data & terabytes of images
• 2 billion HBase records

Components shown: App Servers; Regional Data Collection; Distributed Processing Across District Nodes; Derived Analytics Services (Crime Prevention, Citizen Traffic Services)

Page 30: Austin Cherian: Big data and HPC technologies - intel

Telco - China Unicom
Hadoop for Behavioral Analysis

Challenge: Analyze subscriber web usage and billing to derive new information products

Solution: Scale out storage based on HBase with network optimization based on web traffic, log analysis for daily reporting

Benefits: New customer segmentation

Data Characteristics:
• 188 nodes, 14TB/server
• 2.5PB raw disk capacity
• High speed data loading
• Real-time query (latency <1s)
• Daily statistics & reports (sum, count, join, etc.)

Flow shown: Subscriber Usage & Billing; ETL; Storage, Analytics (MapReduce/Hive, HBase, HDFS; Log Analysis, Daily Reports); New Customer Segmentation & Insights
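The raw-capacity figure on this slide can be cross-checked from the node count and per-server capacity. The sketch assumes every one of the 188 nodes carries the full 14 TB and shows the total under both decimal (1 PB = 1000 TB) and binary (1 PB = 1024 TB) conventions, since the slide does not say which it uses.

```python
nodes = 188        # from the slide
tb_per_node = 14   # from the slide; assumed identical on every node

total_tb = nodes * tb_per_node
pb_decimal = total_tb / 1000  # 1 PB = 1000 TB
pb_binary = total_tb / 1024   # 1 PB = 1024 TB

print(f"total raw capacity: {total_tb} TB")
print(f"  decimal: {pb_decimal:.2f} PB")
print(f"  binary:  {pb_binary:.2f} PB")
```

Either convention gives a total close to the 2.5PB raw disk capacity quoted on the slide.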

Page 31: Austin Cherian: Big data and HPC technologies - intel

Agenda

• Intel Xeon Architecture Roadmap
• Big Data Trends
• Intel in Big Data
• The Big Data HPC connection

Page 32: Austin Cherian: Big data and HPC technologies - intel

Lustre: The Most Used HPC File System

Based on Intel analysis of the 11/2013 Top500, Lustre accounts for:
1. 7 of the fastest 10 systems in the world
2. ~60% of the top 50 systems

Per May 2014 survey research from IDC (1), Lustre is used by more than 50% of sites
• 16-20% use shares for GPFS, NFS and pNFS
• HDFS is shown only to illustrate the extent of HPC sites deploying Hadoop workloads onto HPC (diskless) storage platforms

Chart legend: Lustre, GPFS, NFS, pNFS, Red Hat, HDFS

(1) Source: IDC survey research, May 2014. Shares rounded to nearest percent; totals exceed 100% due to the use of multiple file systems

Page 33: Austin Cherian: Big data and HPC technologies - intel

The Intel® Solutions for Lustre Portfolio

Intel® Enterprise Edition for Lustre* software v2
• Simple, powerful management tools added to full Lustre release foundation
• Maximum performance with minimal management complexity and costs
• Sold through global reseller network (composed of OEMs and integrators)

Intel® Cloud Edition for Lustre* software
• Fast, cost effective parallel storage for applications deployed on cloud infrastructure
• Uses Amazon AWS storage and compute (EC2) instances
• Multiple support options available
• Sold today via Amazon Web Services Marketplace

Page 34: Austin Cherian: Big data and HPC technologies - intel

Enterprise Edition v2 – Features and Improvements

Intel® Manager for Lustre
• Streamlined configuration and management workflow
• Advanced charting and reporting
• Intel® Manager for Lustre support for HSM
• Support for larger sized configurations

Storage Servers
• Full distribution of open source Lustre v2.5
• Support for Red Hat Enterprise Linux 6.4, CentOS 6.4
• New support for SUSE SLES 11 servers (cannot use SUSE for IML)

Compute Clients
• Native Lustre client for Intel® Xeon Phi™
• Improved client I/O performance
• Expanded client platform support

Page 35: Austin Cherian: Big data and HPC technologies - intel

Native Lustre Client for Intel® Xeon Phi™

• Native Lustre client for Intel® Xeon Phi™
• Allows applications running on Phi to have direct access to fast, scalable storage resource
• Benefit: Improved I/O performance for Xeon Phi™ applications

Chart: IOZONE benchmark using 32 threads; write and read throughput (MB/sec) for NFS over Virtual Ethernet vs. Lustre over Virtual IB; annotated 10X (1)

(1) Results have been estimated based on internal Intel analysis and are provided for informational purposes only. Any difference in system hardware or software design or configuration may affect actual performance.

Page 36: Austin Cherian: Big data and HPC technologies - intel

Lustre: Ideal for Hadoop Workloads

Convergence of HPC and data analytics
• Desire for HPC systems to run Hadoop workloads
• Hadoop is the most popular software stack for big data analytics
• Lustre is the file system of choice for HPC clusters
• Challenge: Use Lustre with Hadoop

Benefits of using Enterprise Edition for Lustre with Hadoop applications
• Improved application performance, without changing applications
• More efficient and productive storage resources
• No data transfer overhead for staging inputs and extracting results
• Eliminates 3-way replication used by HDFS
• Shared, easily managed storage: no need to arbitrarily partition storage into HPC (Lustre) and Analytics (HDFS) islands
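The effect of dropping 3-way replication on usable capacity can be illustrated with simple arithmetic. The 3,000 TB raw figure is a hypothetical placeholder, and the sketch ignores RAID or other redundancy overhead that a shared file system's storage targets would normally carry.

```python
def usable_capacity(raw_tb, copies):
    """Usable capacity when every block is stored `copies` times."""
    return raw_tb / copies


raw_tb = 3000.0  # hypothetical raw capacity, for illustration only

with_replication = usable_capacity(raw_tb, copies=3)  # 3-way replication
single_copy = usable_capacity(raw_tb, copies=1)       # one copy, no RAID overhead counted

print(f"3-way replication: {with_replication:.0f} TB usable")
print(f"single copy:       {single_copy:.0f} TB usable")
```

Under these simplifying assumptions, the same raw disk holds three times as much data once replication is removed; a real deployment would give back part of that gain to RAID parity.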

Page 37: Austin Cherian: Big data and HPC technologies - intel

Enterprise Edition for Lustre v2

• Optimized Storage for Hadoop Applications
• Hierarchical Storage Management
• Monitoring & data movement tools
• Intel® Manager for Lustre: Configure, Troubleshoot, Monitor, Manage
• CLI
• REST API Extensibility
• Management and Monitoring Services
• Lustre File System: Full distribution of open source Lustre software v2.5
• Storage Plug-In
• Integration

Legend: Open source base; Intel value-add for Lustre

Interoperability with Hadoop distributions for fast, shared, simple to manage storage for MapReduce applications

Page 38: Austin Cherian: Big data and HPC technologies - intel

Thank You