Endeca Performance Considerations

Endeca Performance and Scalability
Hard-won lessons from the field
Peter Curran, Founder, Cirrus10
Art by Liam Brazier, buy it here: liambrazier.com/Shop

Upload: cirrus10

Post on 15-Feb-2017

TRANSCRIPT

Page 1: Endeca Performance Considerations

Page 2: Endeca Performance Considerations

BASICS
• Seattle HQ, distributed team
• ~50 resources (25 EE + subs)
• All onshore labor
• Endeca or Oracle partner since 2010

WHAT WE DO
• End-to-end implementations
• Relevance tuning
• Architecture & process analysis
• Program roadmaps
• Upgrades & migrations

OUR METHODS
• Time & materials
• Fixed fee with risk premium
• Cost + bonus
• Easy contracts
• ROI guarantees

EXPERIENCE
• ~70 Endeca customers
• B2C and B2B
• CMS Gurus
• Marquee Presenter at OOW 2014
• 100% Referenceable

Page 3: Endeca Performance Considerations

Agenda

• MDEX Performance
• Update Performance
• Case study: Auto Parts

Page 4: Endeca Performance Considerations

ITL: Index ingestion
• Forge
• CAS

MDEX: The index itself
• Dgraphs

Assembler: Application interface
• Service / Process

Diagram here: bit.ly/1PvJYFX

Page 5: Endeca Performance Considerations

Query-time performance
The primary consideration

Page 6: Endeca Performance Considerations

Endeca performance tools

What do I need tools for?
• Why did it break?
• Will it break this year?

Tools
1. MDEX Request Logs
2. Request Log Analyzer (Cheetah)
3. MDEX Perf – Load Testing (Eneperf)

Page 7: Endeca Performance Considerations

MDEX Request Logs

What is the request log?
• MDEX’s main log file – dumps every query to a log
• Includes query latency and time of day

Why is it useful?
• Parse it to see what the heck happened
• Replay or spoof it up to answer “what if”

Where do you find it?
• <working-dir>/logs/dgraphs/Dgraph1/
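To make "parse it to see what the heck happened" concrete, here is a minimal sketch that summarizes latency and time-of-day from request log lines. It is not Cheetah and does not encode the real log layout: the whitespace-delimited format and the two column positions (`TIMESTAMP_COL`, `LATENCY_COL`) are assumptions that must be adjusted to match the log your MDEX version actually writes.

```python
import math
from collections import Counter
from datetime import datetime, timezone

# ASSUMPTION: whitespace-delimited lines. These column positions are
# hypothetical; adjust them to the layout of your own request log.
TIMESTAMP_COL = 0  # epoch milliseconds (assumed)
LATENCY_COL = 1    # total request time in milliseconds (assumed)


def summarize(lines):
    """Return basic latency stats and the busiest UTC hour, or None if empty."""
    latencies = []
    per_hour = Counter()
    for line in lines:
        fields = line.split()
        try:
            ts_ms = int(fields[TIMESTAMP_COL])
            latency = float(fields[LATENCY_COL])
        except (IndexError, ValueError):
            continue  # skip blank or malformed lines
        latencies.append(latency)
        hour = datetime.fromtimestamp(ts_ms / 1000, tz=timezone.utc).hour
        per_hour[hour] += 1
    if not latencies:
        return None
    latencies.sort()
    # Nearest-rank 95th percentile.
    p95_index = max(0, math.ceil(0.95 * len(latencies)) - 1)
    return {
        "count": len(latencies),
        "mean_ms": sum(latencies) / len(latencies),
        "p95_ms": latencies[p95_index],
        "max_ms": latencies[-1],
        "busiest_hour_utc": per_hour.most_common(1)[0][0],
    }
```

Feeding it the lines of a log file (for example `summarize(open(path))`) gives a quick first look before reaching for a full analyzer.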

Page 8: Endeca Performance Considerations

Request log analyzer (aka Cheetah)

• Cheetah is an MDEX log analysis tool
• Reports performance stats
• Helps identify trends
• Downloadable from Oracle

Page 9: Endeca Performance Considerations

MDEXperf (aka ENEperf) – Load Testing

MDEXperf is a load-testing utility
• Ships with Endeca

What is MDEX load testing?
• Send simulated user traffic against MDEX and site
• Learn how site performs under specific traffic conditions

Keys to a successful load test…
• Stress system in way that represents expected production usage
• Monitor performance during and after each test iteration
• Test all scenarios, functionality, and technology
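The idea of replaying traffic and timing each request can be sketched in a few lines. This is a stand-in to illustrate the concept, not MDEXperf itself: `run_load_test` and its `send` callback are hypothetical names, and `send` is whatever function you supply to issue one query against a test environment.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_load_test(queries, send, concurrency=4):
    """Replay queries through `send` with a fixed number of worker threads.

    `send` is a caller-supplied (hypothetical) function that issues one
    query and raises on failure. Returns (query, latency_ms, ok) tuples
    in the same order as the input queries.
    """
    def timed(query):
        start = time.perf_counter()
        ok = True
        try:
            send(query)
        except Exception:
            ok = False  # record the failure, keep the test running
        latency_ms = (time.perf_counter() - start) * 1000.0
        return query, latency_ms, ok

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        return list(pool.map(timed, queries))
```

Replaying queries taken from a production request log, rather than synthetic ones, is one way to keep the load representative of expected production usage.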

Page 10: Endeca Performance Considerations

Resist the dark side

• Avoid default setNavAllRefinements / allgroups=1 if possible
• Exact, Phrase, and Proximity relevance ranking modules are expensive
• Watch for response sizes > 500 KB
• Use record filters before text searches
• Avoid large flat dimensions
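Two of these warnings are easy to check mechanically against logged queries. The sketch below flags a query URL that carries `allgroups=1` and a response larger than 500 KB. The function name `audit_query`, the assumption that the query arrives as a URL with ordinary query-string parameters, and reading "500kb" as 500 × 1024 bytes are all assumptions made for illustration.

```python
from urllib.parse import parse_qs, urlsplit

# ASSUMPTION: "500kb" on the slide is read as 500 * 1024 bytes.
MAX_RESPONSE_BYTES = 500 * 1024


def audit_query(url, response_bytes):
    """Return a list of warnings for one logged query (hypothetical helper)."""
    warnings = []
    params = parse_qs(urlsplit(url).query)
    if params.get("allgroups") == ["1"]:
        warnings.append("allgroups=1: all refinements requested")
    if response_bytes > MAX_RESPONSE_BYTES:
        warnings.append("response larger than 500 KB")
    return warnings
```

Running this over a day of request log entries gives a rough count of how often the application leans on these patterns.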

Page 11: Endeca Performance Considerations

But the dark side is sooooo tempting …

• Wildcarding
• Interactions of large thesaurus + spelling + stemming on large datasets
• Frequent partial updates
• Not enough physical RAM on server

Page 12: Endeca Performance Considerations

Ingestion Performance
The primary consideration 2 years after you implement

Page 13: Endeca Performance Considerations

Before we talk data ingestion…

Let’s talk sandwiches!

• Is a hot dog a sandwich?
• Is a pizza an open-faced sandwich?
• Can an American city be truly great w/o a signature sandwich?
  • If so: Los Angeles? Is a taco a sandwich?
  • New Orleans: Po’ Boy or Muffaletta?
  • Which city should claim the hot dog?
    • Correct answer: Chicago

Page 14: Endeca Performance Considerations

What happens when you index

Step 1: Forge
• Join data sources and manipulate the data

Step 2: Dgidx
• Generate index file

Step 3: Index Distribution
• Distribute the files across Dgraphs

Total Index Time: Steps 1 through 3 combined
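Because total index time is the sum of the three stages, it helps to time each stage separately so you know which one to attack. A minimal sketch, assuming each stage can be wrapped in a Python callable (for example a function that shells out to your own baseline update script); `time_stages` is a hypothetical helper name.

```python
import time


def time_stages(stages):
    """Run (name, callable) pairs in order and return seconds per stage.

    The returned dict also carries a "total" entry: the sum of all stages.
    """
    timings = {}
    for name, step in stages:
        start = time.perf_counter()
        step()
        timings[name] = time.perf_counter() - start
    timings["total"] = sum(timings.values())
    return timings
```

Logging these numbers on every baseline update makes a slow drift in any one stage visible long before the update window is blown.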

Page 15: Endeca Performance Considerations

Factors that might jack up your indexing time

Size of the index
• 1,000,000+ records

Type of records in index
• Catalog, Web Content, Social Content, Analytical Content

Features and functionality
• Store inventory, Store level pricing
• Compatibility (Fitment)
• Endeca Recommendations

Data Model
• Wide record vs. RRN
• Internationalization
• Type of joins

Data Manipulations
• Data cleanups - Java/Perl/XML manipulators

Components Usage
• Traditional Forge
• CAS (Multi-threaded)

Page 16: Endeca Performance Considerations

Two approaches for modeling complex relationships

Wide Records
• De-normalized model
• Adds store inventory to the product record
• Joins happen at indexing
• PRO: Fast queries
• CON: Slow updates
• CON: More back-end code

RRN
• Normalized model
• Inventory stored in separate record from products
• Joins happen at query time
• PRO: Fast updates
• CON: Slower-ish queries
• CON: More front-end code
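The trade-off between the two models can be illustrated with a toy example in plain Python. This is an analogy only and uses no Endeca API: the product and inventory data, and the function names, are invented for illustration. The wide model pays for the join once when the records are built; the normalized model pays for it on every query.

```python
# Toy data, invented for illustration.
products = {"P1": {"name": "Brake pad"}, "P2": {"name": "Oil filter"}}
inventory = [("P1", "S1", 4), ("P1", "S2", 0), ("P2", "S1", 7)]


def build_wide(products, inventory):
    """Wide model: join at indexing time, copying stock onto each product."""
    wide = {pid: dict(rec, stock={}) for pid, rec in products.items()}
    for pid, store, qty in inventory:
        wide[pid]["stock"][store] = qty
    return wide


def in_stock_wide(wide, store):
    """Query the wide model: a single lookup per record, no join needed."""
    return sorted(pid for pid, rec in wide.items()
                  if rec["stock"].get(store, 0) > 0)


def in_stock_normalized(products, inventory, store):
    """Normalized model: join inventory to products at query time."""
    return sorted({pid for pid, s, qty in inventory
                   if s == store and qty > 0 and pid in products})
```

An inventory change in the normalized model touches one small inventory row, while in the wide model the affected product records have to be rebuilt, which is the "fast updates" versus "slow updates" distinction above.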

Page 17: Endeca Performance Considerations

Indexing scars

• Use a real ETL tool if you can
• Use a record cache when joining the data sources in the pipeline
• CAS is multi-threaded, but it’s not as flexible as traditional Forge
• Beware Forge left joins
• Dgidx is multi-threaded. Configure an optimal number of threads to speed up this step.

Page 18: Endeca Performance Considerations

More cuts and bruises

• Use Dgidx flags carefully: specifying many pre-computed sorts can hurt indexing performance.
• If index distribution is slow, consider rolling your own approach to compress the index before distributing it.
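One way to "roll your own" compression step is to pack the index directory into a single gzip-compressed archive, copy that one file to each Dgraph host, and unpack it there. The sketch below covers only the pack and unpack halves using Python's standard `tarfile` module; the function names and paths are hypothetical, and the copy step between hosts is left to whatever transfer tool you already use.

```python
import tarfile
from pathlib import Path


def pack_index(index_dir, archive_path):
    """Compress an index directory into one .tar.gz file before shipping."""
    index_dir = Path(index_dir)
    with tarfile.open(archive_path, "w:gz") as tar:
        # Store paths relative to the directory name, not the full path.
        tar.add(index_dir, arcname=index_dir.name)
    return Path(archive_path)


def unpack_index(archive_path, target_dir):
    """Unpack a shipped archive on the receiving host.

    Only use this on archives you produced yourself.
    """
    with tarfile.open(archive_path, "r:gz") as tar:
        tar.extractall(target_dir)
```

Whether this helps depends on how compressible the index files are and on whether the network or the CPU is the bottleneck, so measure distribution time before and after.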

Page 19: Endeca Performance Considerations

Backend performance case study
Major Auto Parts Company

Page 20: Endeca Performance Considerations

Case Study: Major auto parts company

• 3 major sites live since 2003
• Originally a bridged multi-MDEX
• Large index due to fitment
• Re-engineered for wide records
• <100ms MDEX response time
• 3 updates/wk at many hours each
• Tried partial updates but failed

Page 21: Endeca Performance Considerations

Wide record model

• 110,000,000 very wide records

Page 22: Endeca Performance Considerations

RRN Model

• 4,500,000 narrow records

Page 23: Endeca Performance Considerations

Baseline update performance

Page 24: Endeca Performance Considerations

Partial update performance

Page 25: Endeca Performance Considerations

Forget the session! Build your biceps!

Reception w/Bodybuilding.com
Oracle Open World
Tuesday 27-Oct 2015
Foreign Cinema, San Francisco

• Rinse away OOW15
• Eat good food
• Watch foreign movies
• Hang with smart people

Page 26: Endeca Performance Considerations

Let’s get started