lambda architecture the hive

Lambda Architecture Use Case: Mayo Clinic FEBRUARY 2015 Altan Khendup – Leader, UDA Architecture COE

Upload: altan-khendup

Post on 19-Jul-2015


Page 1: Lambda Architecture The Hive

Lambda Architecture

Use Case: Mayo Clinic

FEBRUARY 2015

Altan Khendup – Leader, UDA Architecture COE


Background of Lambda Architecture

Background

– Reference architecture for Big Data systems

– Designed by Nathan Marz (Twitter)

– Defined as a system that runs arbitrary functions on arbitrary data

– “query = function(all data)”
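The "query = function(all data)" idea can be sketched in a few lines of Python. This is only an illustration, not code from the Mayo Clinic project: the `Event` type, the `run_query` helper, and the sample records are all hypothetical names invented here.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Sequence, TypeVar

R = TypeVar("R")


@dataclass(frozen=True)  # frozen: records are never modified after creation
class Event:
    patient_id: str  # hypothetical field, for illustration only
    kind: str        # hypothetical field, e.g. "note" or "lab"


def run_query(fn: Callable[[Sequence[Event]], R], all_data: Sequence[Event]) -> R:
    """A query is just a function applied to the complete dataset."""
    return fn(all_data)


def count_by_kind(events: Sequence[Event]) -> Dict[str, int]:
    """Example query: how many events of each kind exist in all the data."""
    counts: Dict[str, int] = {}
    for event in events:
        counts[event.kind] = counts.get(event.kind, 0) + 1
    return counts


# Made-up sample records standing in for the master dataset.
MASTER_DATASET = (
    Event("p1", "note"),
    Event("p2", "lab"),
    Event("p1", "note"),
)

result = run_query(count_by_kind, MASTER_DATASET)
```

Running every query over all data is too slow at scale, which is why the architecture splits the work into layers that precompute and combine views.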

Design Principles

– Human fault-tolerant, Immutability, Computable

Lambda Layers

– Batch - Contains the immutable, constantly growing master dataset.

– Speed - Deals only with new data and compensates for the high-latency updates of the serving layer.

– Serving - Loads and exposes the combined view of data so that they can be queried.
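How the three layers cooperate can be sketched as a single-process toy in Python. It is a minimal sketch under simplifying assumptions (one process, in-memory counters, a full recompute on every batch run); the class `LambdaSketch` and its method names are hypothetical and do not describe the Storm-based system covered later in the deck.

```python
from collections import Counter
from typing import List


class LambdaSketch:
    """Toy model of the batch, speed, and serving layers (illustrative only)."""

    def __init__(self) -> None:
        self.master: List[str] = []              # batch layer: append-only master dataset
        self.batch_view: Counter = Counter()     # view precomputed from the master dataset
        self.realtime_view: Counter = Counter()  # speed layer: data not yet in the batch view

    def ingest(self, key: str) -> None:
        """New data is appended to the master dataset and seen by the speed layer."""
        self.master.append(key)
        self.realtime_view[key] += 1

    def run_batch(self) -> None:
        """Recompute the batch view from all data, then drop the absorbed realtime view."""
        self.batch_view = Counter(self.master)
        self.realtime_view = Counter()

    def query(self, key: str) -> int:
        """Serving: answer from the combined batch and realtime views."""
        return self.batch_view[key] + self.realtime_view[key]


system = LambdaSketch()
for key in ["a", "a", "b"]:
    system.ingest(key)
before_batch = system.query("a")  # answered by the speed layer alone
system.run_batch()
system.ingest("a")                # arrives after the batch run
after_batch = system.query("a")   # batch view plus realtime view
```

Because the master dataset is only ever appended to, a bug in `run_batch` can be fixed and the views rebuilt from scratch, which is the point of the human fault-tolerance principle.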


Overview of Lambda Architecture


© 2014 Teradata

USE CASE – MAYO CLINIC


Mayo Clinic History

– Every year, more than a million people from all 50 states and nearly 150 countries come for care

– Dozens of locations in several states, with major campuses in Rochester, Minn.; Scottsdale and Phoenix, Ariz.; and Jacksonville, Fla.

– Mayo Clinic Rochester, Minn. recognized as the top hospital in the nation for 2014-2015 by U.S. News & World Report


Why Big Data? Challenges in Medical Data

– Health data tends to be “wide”, not “deep”

– New data types are becoming more important

- Unstructured

- Real-time streaming

– Moving from retrospective “BI” viewing to event-based and predictive analytics usage is a general challenge

– Multiple layers

- Lots of events, data

– Complex

- Lots of different languages and data structures

– Difficult to maintain

- Lots of moving pieces/components/technologies

- Lots of changes in the business


Data Discovery

– Many “Big Data” stories start with data discovery

- The Data Lake, etc.

– But data discovery is not predictable!

– Mayo Clinic needed to define a real operational need that a “Big Data” technology stack could fulfill


Project

– Optimize an existing Natural Language Processing pipeline in support of critical Colorectal Surgery (move to tens of thousands of documents processed)

– Replace an existing free-text search facility used by Clinical Web Service for colorectal cancer (move search to milliseconds)


Overall Architecture


Operational Statistics

• Current Storm throughput up to 1.5 million documents per hour

• Average of 140,000 HL7 messages actually processed per day with average latency of 60 milliseconds from ingest to persistence

• Average of 50,000 documents passed through annotators per day versus 5,000 historically

• Actual annotations of documents up to 6 times faster than previously accomplished

• Free-text search use cases that took over 30 minutes on old infrastructure completing in milliseconds in ElasticSearch
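To put the operational figures on a common scale, the short Python calculation below converts them to per-second rates and ratios. The input numbers come from the slide; the variable names are invented here, and the search speedup rests on an assumption flagged in the code: that "milliseconds" means under one second.

```python
SECONDS_PER_HOUR = 60 * 60
SECONDS_PER_DAY = 24 * SECONDS_PER_HOUR

# Peak Storm throughput: 1.5 million documents per hour.
peak_docs_per_second = 1_500_000 / SECONDS_PER_HOUR

# Average HL7 load: 140,000 messages per day.
avg_hl7_per_second = 140_000 / SECONDS_PER_DAY

# Annotator volume: 50,000 documents per day versus 5,000 historically.
annotator_volume_ratio = 50_000 / 5_000

# ASSUMPTION: "milliseconds" is read as "under 1,000 ms", so this is only
# a lower bound on the speedup over searches that took more than 30 minutes.
old_search_ms = 30 * 60 * 1000
assumed_new_search_ms = 1000
min_search_speedup = old_search_ms / assumed_new_search_ms

print(f"Peak throughput: {peak_docs_per_second:.1f} documents/s")
print(f"Average HL7 load: {avg_hl7_per_second:.2f} messages/s")
print(f"Annotator volume: {annotator_volume_ratio:.0f}x historical")
print(f"Search speedup: at least {min_search_speedup:.0f}x")
```

The gap between the average HL7 load and the peak Storm throughput suggests the pipeline runs with considerable headroom, although the two figures measure different message types and are not directly comparable.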


Summary

• Benefits

– An architecturally-driven, internally-owned technology stack that blends:

- An event-based/“real-time” processing fabric

- A multi-destination distillation hub

- A foundation for “Classic” BI delivery techniques

- A foundation for “Services-based” delivery techniques

- A “serendipitous” discovery environment

– Mutually supportive components that combine in delivering novel clinical solutions

– Data continuity

- Historical data can be assessed as algorithms change over time


Thank you!

We’re Hiring! thinkbigcareers.teradata.com

Altan Khendup (@madmongol)


Ron Bodkin (@ronbodkin)
