hitachi’s technology and vision for for e · pdf filehitachi’s technology and...

21
1 1 HITACHI’S TECHNOLOGY AND VISION FOR FOR E-RESEARCH MATTHEW O’KEEFE, PH.D. OFFICE OF TECHNOLOGY AND PLANNING HITACHI DATA SYSTEMS

Upload: hoangkien

Post on 16-Mar-2018

217 views

Category:

Documents


3 download

TRANSCRIPT

Page 1: HITACHI’S TECHNOLOGY AND VISION FOR FOR E · PDF fileHITACHI’S TECHNOLOGY AND VISION FOR FOR ... • Converged infrastructure ... Current eResearch Node infrastructure supported

1 1

HITACHI’S TECHNOLOGY AND VISION FOR FOR E-RESEARCH MATTHEW O’KEEFE, PH.D. OFFICE OF TECHNOLOGY AND PLANNING HITACHI DATA SYSTEMS

Page 2: HITACHI’S TECHNOLOGY AND VISION FOR FOR E · PDF fileHITACHI’S TECHNOLOGY AND VISION FOR FOR ... • Converged infrastructure ... Current eResearch Node infrastructure supported

2 Hitachi Data Systems CONFIDENTIAL

ERESEARCH AT HITACHI DATA SYSTEMS

§  …will become at scale agile processes

§  realized by multidiscipline teams

§  leveraging a variety of data categories and types flowed through various technologies,

§  including provisions for security and privacy.

§  The end result – timely discovery of sparks of insights leading to valuable innovation and knowledge.

§  I.e., really just what e-researchers have been doing for a long time

Page 3: HITACHI’S TECHNOLOGY AND VISION FOR FOR E · PDF fileHITACHI’S TECHNOLOGY AND VISION FOR FOR ... • Converged infrastructure ... Current eResearch Node infrastructure supported

3 Hitachi Data Systems CONFIDENTIAL

WAYS TO THINK ABOUT COMPUTATION AND BIG DATA

§  “The purpose of computing is insight, not numbers.” Richard Hamming, pioneer in numerical computing.

§  “It’s about curiosity followed by action. You look at the dataset and then go deeper to discover something.” Richard Janert, author of Data Analysis with Open Source Tools.

§  “For what I do—and this is really the only data analysis I can speak about with any sense of confidence—the most important skill is curiosity.” Richard Janert, author of Data Analysis with Open Source Tools.

Page 4: HITACHI’S TECHNOLOGY AND VISION FOR FOR E · PDF fileHITACHI’S TECHNOLOGY AND VISION FOR FOR ... • Converged infrastructure ... Current eResearch Node infrastructure supported

4

HITACHI, LTD. GLOBAL RESEARCH AND DEVELOPMENT: RESEARCH ACROSS MANY DISCIPLINES

Production Engineering Research Laboratory

Mechanical Engineering Research Laboratory

System Development Laboratory

Hitachi Research Laboratory

Advanced Research Laboratory

Central Research Laboratory

Page 5: HITACHI’S TECHNOLOGY AND VISION FOR FOR E · PDF fileHITACHI’S TECHNOLOGY AND VISION FOR FOR ... • Converged infrastructure ... Current eResearch Node infrastructure supported

5

HOW TO USE DATA TO INCREASE OIL PRODUCTION? A PROBLEM I’VE BEEN WORKING ON…

Page 6: HITACHI’S TECHNOLOGY AND VISION FOR FOR E · PDF fileHITACHI’S TECHNOLOGY AND VISION FOR FOR ... • Converged infrastructure ... Current eResearch Node infrastructure supported

6 Hitachi Data Systems CONFIDENTIAL

BAKKEN BASICS

•  Primarily in northwest North Dakota, Montana, Alberta •  Largest contiguous oil formation ever measured by US

Geological Survey — discovered in 1951 •  Up to 900 billion barrels oil in place, up to 45 billion

recoverable •  2000 wells drilled per year in North Dakota alone, currently

about 8000 wells producing, another 30,000 to 40,000 to be drilled ‒  $10 million per well capital cost, roughly $20 billion per year

across the play ‒  Texas’s Eagle Ford shale oil play on a similar scale

•  Incredible leverage for big data insights: can be applied across thousands of wells

Page 7: HITACHI’S TECHNOLOGY AND VISION FOR FOR E · PDF fileHITACHI’S TECHNOLOGY AND VISION FOR FOR ... • Converged infrastructure ... Current eResearch Node infrastructure supported

7 Hitachi Data Systems CONFIDENTIAL

UP TO 16 WELLS PER 2 SQUARE MILES TARGETING 4 SEPARATE VERTICAL LAYERS

Page 8: HITACHI’S TECHNOLOGY AND VISION FOR FOR E · PDF fileHITACHI’S TECHNOLOGY AND VISION FOR FOR ... • Converged infrastructure ... Current eResearch Node infrastructure supported

8 Hitachi Data Systems CONFIDENTIAL

SO WE HAVE THE DATA, WHERE AND HOW CAN WE APPLY MACHINE LEARNING?

•  Here are some potential ideas: •  Well Optimization Model

•  Geological Classification

•  Well Classifier

•  Localized Sensitivity Analysis

•  Note: statistical analysis and machine learning needs to be driven by potential hypothesis and insights

•  To get to the point where you have potential hypothesis and insights, you need to spend a lot of time curiously examining the data, talking to your multidisciplinary team members, and iterating in this process

Page 9: HITACHI’S TECHNOLOGY AND VISION FOR FOR E · PDF fileHITACHI’S TECHNOLOGY AND VISION FOR FOR ... • Converged infrastructure ... Current eResearch Node infrastructure supported

9

Information

SINGLE VIRTUALIZATION PLATFORM FOR INFRASTRUCTURE, CONTENT AND INFORMATION

LAY FLEXIBLE FOUNDATION FOR THE FUTURE TO GROW WITH YOUR NEEDS

§  ANALYTICS §  INTEGRATION

§  BUSINESS INTELLIGENCE

§  BIG DATA TODAY

Infrastructure

Content §  SEARCH, DISCOVER AND INTEGRATE

INDEPENDENT OF APPLICATIONS

§  ON DEMAND CONTENT §  ARCHIVING AS A SERVICE

§  VIRTUALIZATION, MOBILITY §  INTEGRATED MANAGEMENT §  DATA CENTER CONVERGENCE

§  INFRASTRUCTURE ON DEMAND

Page 10: HITACHI’S TECHNOLOGY AND VISION FOR FOR E · PDF fileHITACHI’S TECHNOLOGY AND VISION FOR FOR ... • Converged infrastructure ... Current eResearch Node infrastructure supported

10

THE 5 PHASES OF E-RESEARCH DATA

Collect Analyse

& Rationalise

Share Preserve

Re-use

Page 11: HITACHI’S TECHNOLOGY AND VISION FOR FOR E · PDF fileHITACHI’S TECHNOLOGY AND VISION FOR FOR ... • Converged infrastructure ... Current eResearch Node infrastructure supported

11

THE 5 PHASES OF E-RESEARCH DATA

REQUIREMENTS & ATTRIBUTES

•  High speed transfer •  Parallel transfer of Data Collect

•  Rapid Provisioning •  Efficient Capacity Allocation

Analyse & Rationalise

•  Simultaneous Connectivity •  Standard/De facto interfaces Share

•  Protected in original form •  Dynamically & Automatically Mobile Preserve

Re-use

1

2

3

4

5

Page 12: HITACHI’S TECHNOLOGY AND VISION FOR FOR E · PDF fileHITACHI’S TECHNOLOGY AND VISION FOR FOR ... • Converged infrastructure ... Current eResearch Node infrastructure supported

12

A RANDOM WALK THROUGH SOME RELEVANT HITACHI TECHNOLOGIES FOR E-RESEARCH

Page 13: HITACHI’S TECHNOLOGY AND VISION FOR FOR E · PDF fileHITACHI’S TECHNOLOGY AND VISION FOR FOR ... • Converged infrastructure ... Current eResearch Node infrastructure supported

13

WHERE DO WE PLAY? (TODAY AND TOMORROW)

Home Directory/ High capacity Repositories

Staging or modest cluster storage

Scratch/ Workspace

Archive HSM

Compute Cluster

Users Network/ Internet

NFS Parallel FS

Persistent data

Temporary data

HDS TODAY ENABLED TOMORROW (complete HDS portfolio plus Lustre)

SCHEMATIC VIEW OF HIGH PERFORMANCE ENVIRONMENTS

Compute

•  KEY INNOVATION AREAS FOR HITACHI GOING FORWARD •  Cheap and deep •  FLASH •  Optical archives •  Converged infrastructure •  HPC capable hardware

Page 14: HITACHI’S TECHNOLOGY AND VISION FOR FOR E · PDF fileHITACHI’S TECHNOLOGY AND VISION FOR FOR ... • Converged infrastructure ... Current eResearch Node infrastructure supported

14

HITACHI CLUSTER FILE SYSTEM (HSFS)

14

HSFS : Hitachi Striping File System

- Share Single File System with Multiple Nodes (Max 1024 Nodes)

- Distribute Files over Multiple I/O Servers

- High Throughput Perfomance by Parallel I/O on each I/O Server

- Two Striping Features(File Striping & Block Striping) Available

- Support AIX and Linux System, Also AIX-Linux Heterogeneous System

- Available for Hadoop(Replacement for HDFS)

Client

File1

File4

. . .

Server

1

File2 File3

5

13 17 21

2 6

14 18 22

3 7

15 19 23

Client

Client

Server

Client

Client

Server

Client

Client

Client

Server

File Striping

Block Striping 9 10 11 4 8

16 20 24

12

Page 15: HITACHI’S TECHNOLOGY AND VISION FOR FOR E · PDF fileHITACHI’S TECHNOLOGY AND VISION FOR FOR ... • Converged infrastructure ... Current eResearch Node infrastructure supported

15

HITACHI AND EXASCALE

§  Japan’s K Computer

‒  Fastest in the world, last year

‒  > 11 PF

‒  30MW

§  Next Generation being planned

‒  Projected for 2018

‒  1 EF goal

‒  HSFS(2) file system ‒  128K Nodes

‒  HSM ‒  1 PB memory ‒  10 PB Tier 1 File System storage on SSD ‒  100’s PB Tier 4, optical, etc.

Page 16: HITACHI’S TECHNOLOGY AND VISION FOR FOR E · PDF fileHITACHI’S TECHNOLOGY AND VISION FOR FOR ... • Converged infrastructure ... Current eResearch Node infrastructure supported

16

CHALLENGES OF DEPLOYING FLASH MEMORY

§  NAND Flash cells are programmed ‒  Can Read and Writes to pages using LSF scheme ‒  Updates need previous Erasing of a large Block ‒  Causes «Write Cliff» when free blocks are exhausted

§  Write Endurance of Flash Cells is limited ‒  SLC vs MLC (1 vs 2 bits per cell = 100K vs 10K write cycle) ‒  Needs very robust Error detection and correction logic ‒  Additional techniques must be applied: ‒  Log Structured File approach for Writes ‒  Overprovisioning ‒  Garbage Collection ‒  various compression schemes ‒  Wear Leveling

THE NEED FOR A NAND FLASH CONTROLLER

Page 17: HITACHI’S TECHNOLOGY AND VISION FOR FOR E · PDF fileHITACHI’S TECHNOLOGY AND VISION FOR FOR ... • Converged infrastructure ... Current eResearch Node infrastructure supported

17

HITACHI ACCELERATED FLASH STORAGE

RACK OPTIMIZED FLASH MODULE DRIVE

DKC

DKU

FBX

Flash Module Drive Chassis (FBX) With Max 48 FMD’s

RAID-1, RAID-5, and RAID-6 (managed by VSP) - Single FBX scales 6.4TB to 307.2TB

Flash Module Drive (FMD) 1.6TB Capacity

3.2 TB in 1Q2013

Flash Module Unit (FMU) 12 FMD’s in 2U Chassis

ASIC

Virtual Storage Platform

14U

13U

13U

Page 18: HITACHI’S TECHNOLOGY AND VISION FOR FOR E · PDF fileHITACHI’S TECHNOLOGY AND VISION FOR FOR ... • Converged infrastructure ... Current eResearch Node infrastructure supported

18

TECHNOLOGY LONGEVITY

Historical Casualties

Laserdisc

Magneto-optical

Ultra Density Optical - UDO

Ultra Media Disc - UMD

HD DVD

Today, you can buy new standard drives that are compatible with media written over 30 years ago. This trend will continue due to markets for consumer and distribution driven volume

*

BD

Cap

acity

200GB

5TB

50GB

1980 1990 2000 2010 2015 2020

8.5GB DVD

700MB CD 1GB

2TB

0.5GB 640MB

4.7GB

100GB 128GB

50GB

Still exists, still supported

Still exists, still supported

Still exists, still supported Still exists, still supported

Compatibility Track Multi-Market Support UDF Format Support

Over 3 Active Decades

Over 2 Active Decades

1.8TB

3.8TB

1st Generation

2nd Generation

3rd Generation

4th Next Generation

Holographic

*

BDXL 200GB*

400GB** 256GB**

512GB**

Page 19: HITACHI’S TECHNOLOGY AND VISION FOR FOR E · PDF fileHITACHI’S TECHNOLOGY AND VISION FOR FOR ... • Converged infrastructure ... Current eResearch Node infrastructure supported

19

HOLOGRAPHIC DATA STORAGE

§  Holographic Storage store data elements as images at different angels

X [SLM pixels]

Y [

SL

M p

ixe

ls]

SLM

-6 -4 -2 0 2 4 6

-6

-4

-2

0

2

4

6

2 Dimension Data (Mega pixels)

Page 20: HITACHI’S TECHNOLOGY AND VISION FOR FOR E · PDF fileHITACHI’S TECHNOLOGY AND VISION FOR FOR ... • Converged infrastructure ... Current eResearch Node infrastructure supported

20

CURRENT STATUS: ERESEARCH AND HITACHI

§  Current eResearch Node infrastructure supported by Hitachi ‒  Intersect Australia

‒  eResearch SA

§  Additional related academic infrastructure ‒  Griffith University [Nathan Campus]

‒  Queensland Brain Institute

‒  Geoscience Australia

§  Garvan Institute of Medical Research ‒  awarded Hitachi’s Health and Life Sciences Innovation Award

Page 21: HITACHI’S TECHNOLOGY AND VISION FOR FOR E · PDF fileHITACHI’S TECHNOLOGY AND VISION FOR FOR ... • Converged infrastructure ... Current eResearch Node infrastructure supported

21 21 21

THANK YOU