Spark Your Legacy: Distributing an 8-year Monolith

© 2015 Kenshoo, Ltd. Proprietary Information

Tzach Zohar, Kenshoo, May 2015

Upload: tzach-zohar

Post on 31-Jul-2015


TRANSCRIPT



Who?

Tzach Zohar

Architect @ Kenshoo

[email protected]

il.linkedin.com/in/tzachzohar

Where?

● Online advertising technology
● 9-year-old startup
● ~500 employees
● Data-intensive (aren’t we all?)

Agenda

● Project Background
● Why not to Greenfield
● Refactoring Challenges
● Solutions


Project Background

Domain: Data Aggregation

● Of: advertising metrics
● On: versatile, batched, occasionally re-stated input
● By: many different keys
● When: now + ~0.5 hour
● While: filtering and normalising per business rules
● For: eternity (data lives forever)
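The "by many different keys" and "filtering per business rules" points above can be illustrated with a minimal single-node sketch. Everything here is an assumption made for illustration: the `Observation` fields, the rule that drops negative click counts, and the `aggregateBy` helper are hypothetical and do not come from the talk.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

public class AggregationSketch {

    // Hypothetical observation; the field names are illustrative only.
    public static final class Observation {
        private final String campaign;
        private final String device;
        private final long clicks;

        public Observation(String campaign, String device, long clicks) {
            this.campaign = campaign;
            this.device = device;
            this.clicks = clicks;
        }

        public String getCampaign() { return campaign; }
        public String getDevice() { return device; }
        public long getClicks() { return clicks; }
    }

    // Sums clicks per key; the key function decides whether we aggregate by X, by Y, or by X + Y.
    public static Map<String, Long> aggregateBy(List<Observation> input,
                                                Function<Observation, String> key) {
        return input.stream()
                // Stand-in for a business filtering rule (assumed): drop negative counts.
                .filter(o -> o.getClicks() >= 0)
                .collect(Collectors.groupingBy(key, TreeMap::new,
                        Collectors.summingLong(Observation::getClicks)));
    }

    public static void main(String[] args) {
        List<Observation> input = Arrays.asList(
                new Observation("A", "mobile", 10),
                new Observation("A", "desktop", 5),
                new Observation("B", "mobile", 7),
                new Observation("B", "desktop", -3)); // filtered out by the rule above

        // By X
        System.out.println(aggregateBy(input, Observation::getCampaign));
        // prints {A=15, B=7}

        // By Y
        System.out.println(aggregateBy(input, Observation::getDevice));
        // prints {desktop=5, mobile=17}

        // By X + Y
        System.out.println(aggregateBy(input, o -> o.getCampaign() + "|" + o.getDevice()));
        // prints {A|desktop=5, A|mobile=10, B|mobile=7}
    }
}
```

Passing the key as a function keeps the aggregation logic identical across keys, which is one way to make ad-hoc aggregations cheap to add.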

Domain: Data Aggregation

[Diagram labels: Sources (Slow, Fast, Custom, Re-stated); Normalize; Aggregate (By X, By Y, By X + Y, ...); Observations]

Requirement: Better, Faster

● Higher throughput: business is growing
● More keys: and ad-hoc aggregations
● Linear scalability: anything else is not cost-effective
● Easy to enhance: by any decent developer

Chosen Design: Spark

[Diagram labels: sources; Normalize; Driver; HDFS + Spark Cluster; Landing Zone; Spark Jobs (By X, By Y, By X+Y, ...)]

Great, but how do we get there?

A: Legacy System
B: New Shiny System

Refactoring? “Greenfield” project? ???


Why Not to “Greenfield”

Challenge: Moving Target

[Timeline diagram labels: Q1, Q2, Q3; Legacy; Legacy’; New System]

Challenge: Zero Diff Tolerance

● Different clients have different data, different customizations, different scales
● Our data is often validated against external sources

Challenge: Code Is Our Only Spec

But it isn’t necessarily a friendly one...

What exactly should the new system do?

Challenge: Test Reuse?

● Tests assume a single-server setup...
● Some are coupled with current implementation...


Refactoring Challenges

Challenge: Legacy Code

Some of it still untested

Challenge: Tight Coupling

Implementation is tightly coupled with many other components

[Architecture diagram labels: Kenshoo Server; Search Engines; SE API Facade; Web User Interface; Proxy Servers; Client's Website; Client Users; Client Systems / DWH; Entity Mgmt / DAO; Normalizers; Optimization Algorithms; Data Providers / Score SQL Builder; Client Configuration; SEM Entity Data; Performance Data; Campaign Generation Tools (RTC, KW Tool); Report Generation; Bulk Editing and Advanced Features; Conf. DAO; Kenshoo Editor; FTP Sites; Tracking Processor; Aggregator; callout: "HELP ME!"]

Challenge: Paradigm Shift

How do you gradually refactor a single-node Java application into a distributed Spark application?


Solutions

Solution #1: Shared Code

[Diagram labels: Legacy System; New System; Core Business Rules (shown in both systems)]

1. Refactor legacy code to create stand-alone jar
2. Build new system around this core code

Solution #1: Shared Code

Business rules refactored into Java static methods, to avoid serialization issues in Spark
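A minimal sketch of why static methods sidestep the serialization problem. Spark's default closure serializer uses Java serialization, so a function that captures a non-serializable object cannot be shipped to executors. The sketch below imitates that with plain `java.io` serialization and no Spark dependency; `Rules`, `LegacyNormalizer`, `SerFunction`, and `canSerialize` are hypothetical names, not the talk's actual code.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.function.Function;

public class StaticRulesSketch {

    // Hypothetical business rule as a static method: there is no instance to serialize.
    public static final class Rules {
        private Rules() {}

        public static long normalizeClicks(long rawClicks) {
            return Math.max(0L, rawClicks);
        }
    }

    // Hypothetical legacy-style rule object that is NOT Serializable
    // (imagine it holds a database connection or a configuration service).
    public static final class LegacyNormalizer {
        public long normalizeClicks(long rawClicks) {
            return Math.max(0L, rawClicks);
        }
    }

    // A function type that is also Serializable, similar in spirit to
    // the function interfaces of Spark's Java API.
    public interface SerFunction<T, R> extends Function<T, R>, Serializable {}

    // Returns true if Java serialization can write the object, false if it
    // hits something that is not Serializable.
    public static boolean canSerialize(Object o) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Refers only to a static method: nothing is captured.
        SerFunction<Long, Long> viaStatic = Rules::normalizeClicks;

        // Captures the non-serializable legacy object.
        LegacyNormalizer legacy = new LegacyNormalizer();
        SerFunction<Long, Long> viaInstance = raw -> legacy.normalizeClicks(raw);

        System.out.println(canSerialize(viaStatic));   // prints true
        System.out.println(canSerialize(viaInstance)); // prints false
    }
}
```

The trade-off is that static rules cannot rely on injected collaborators; anything they need has to be passed in as plain, serializable arguments.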

Solution #2: Empiric Reverse Engineering

Solution #3: Local Mode Testing

[Diagram labels: Legacy System; New Aggregation System; Spark]

1. Embed Spark in Aggregation System
2. Embed Aggregation System in Legacy

Solution #4: Side-by-Side

Both at the component level and at the system level
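At the component level, a side-by-side run can be sketched as feeding the same input to the legacy and the new implementation and reporting every key whose result differs. This is only an illustration under assumptions: `legacySum`, `newSum`, the input shape, and the `diff` helper are hypothetical and are not taken from the talk.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.TreeSet;
import java.util.function.Function;
import java.util.stream.Collectors;

public class SideBySideSketch {

    // Hypothetical legacy implementation: imperative sum of clicks per key.
    public static Map<String, Long> legacySum(Map<String, List<Long>> input) {
        Map<String, Long> out = new HashMap<>();
        for (Map.Entry<String, List<Long>> e : input.entrySet()) {
            long total = 0;
            for (long v : e.getValue()) {
                total += v;
            }
            out.put(e.getKey(), total);
        }
        return out;
    }

    // Hypothetical new implementation of the same rule, written with streams.
    public static Map<String, Long> newSum(Map<String, List<Long>> input) {
        return input.entrySet().stream().collect(Collectors.toMap(
                Map.Entry::getKey,
                e -> e.getValue().stream().mapToLong(Long::longValue).sum()));
    }

    // Runs both implementations on the same input and lists every key whose value differs.
    public static List<String> diff(
            Map<String, List<Long>> input,
            Function<Map<String, List<Long>>, Map<String, Long>> legacy,
            Function<Map<String, List<Long>>, Map<String, Long>> candidate) {
        Map<String, Long> expected = legacy.apply(input);
        Map<String, Long> actual = candidate.apply(input);

        // Union of keys, so results missing on either side are reported too.
        TreeSet<String> keys = new TreeSet<>(expected.keySet());
        keys.addAll(actual.keySet());

        List<String> diffs = new ArrayList<>();
        for (String k : keys) {
            Long e = expected.get(k);
            Long a = actual.get(k);
            if (!Objects.equals(e, a)) {
                diffs.add(k + ": legacy=" + e + ", new=" + a);
            }
        }
        return diffs;
    }

    public static void main(String[] args) {
        Map<String, List<Long>> input = new HashMap<>();
        input.put("A", Arrays.asList(10L, 5L));
        input.put("B", Arrays.asList(7L));

        List<String> diffs = diff(input, SideBySideSketch::legacySum, SideBySideSketch::newSum);
        System.out.println(diffs.isEmpty() ? "zero diff" : diffs.toString());
        // prints zero diff
    }
}
```

The same comparison idea scales up to the system level by diffing the stored outputs of both systems for the same time window, which matches the zero-diff requirement stated earlier.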


Questions?