Endeca performance considerations

Post on 15-Feb-2017

Endeca Performance and Scalability
Hard-won lessons from the field
Peter Curran, Founder, Cirrus10
Art by Liam Brazier, buy it here! liambrazier.com/Shop

BASICS
• Seattle HQ, distributed team
• ~50 resources (25 EE + subs)
• All onshore labor
• Endeca or Oracle partner since 2010

WHAT WE DO
• End-to-end implementations
• Relevance tuning
• Architecture & process analysis
• Program roadmaps
• Upgrades & migrations

OUR METHODS
• Time & materials
• Fixed fee with risk premium
• Cost + bonus
• Easy contracts
• ROI guarantees

EXPERIENCE
• ~70 Endeca customers
• B2C and B2B
• CMS Gurus
• Marquee Presenter at OOW 2014
• 100% Referenceable

Agenda

• MDEX Performance
• Update Performance
• Case study: Auto Parts

ITL
Index ingestion
• Forge
• CAS

MDEX
The index itself
• Dgraphs

Assembler
Application interface
• Service / Process

Diagram here: bit.ly/1PvJYFX

Query-time performance
The primary consideration

Endeca performance tools

What do I need tools for?
• Why did it break?
• Will it break this year?

Tools
1. MDEX Request Logs
2. Request Log Analyzer (Cheetah)
3. MDEXperf load testing (Eneperf)

MDEX Request Logs

What is the request log?
• MDEX’s main log file: dumps every query to a log
• Includes query latency and time of day

Why is it useful?
• Parse it to see what the heck happened
• Replay or spoof it up to answer “what if”

Where do you find it?
• <working-dir>/logs/dgraphs/Dgraph1/
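
To make "parse it to see what happened" concrete, here is a minimal Python sketch that pulls latency figures out of a request log. The column layout is an assumption for illustration only (timestamp in column 0, latency in milliseconds in column 4, query URL last), and the sample lines are synthetic placeholders, not real MDEX output. Check the positions against your own Dgraph log before relying on it.

```python
import math
import statistics

# ASSUMPTION: these column positions are hypothetical. Adjust them to match
# the layout of the request log your MDEX version actually writes.
TIMESTAMP_COL = 0   # epoch milliseconds
LATENCY_COL = 4     # total request time in milliseconds
URL_COL = -1        # query URL assumed to be the last field


def parse_line(line):
    """Return a dict for one log line, or None if the line does not parse."""
    fields = line.split()
    if len(fields) <= LATENCY_COL:
        return None
    try:
        return {
            "timestamp_ms": int(fields[TIMESTAMP_COL]),
            "latency_ms": float(fields[LATENCY_COL]),
            "url": fields[URL_COL],
        }
    except ValueError:
        return None


def summarize(lines):
    """Compute count, mean, 95th percentile, and the slowest query."""
    entries = [e for e in map(parse_line, lines) if e is not None]
    if not entries:
        return {"count": 0}
    latencies = sorted(e["latency_ms"] for e in entries)
    # Nearest-rank percentile: smallest value covering 95% of requests.
    p95_index = max(0, math.ceil(0.95 * len(latencies)) - 1)
    slowest = max(entries, key=lambda e: e["latency_ms"])
    return {
        "count": len(entries),
        "mean_ms": statistics.mean(latencies),
        "p95_ms": latencies[p95_index],
        "max_ms": latencies[-1],
        "slowest_url": slowest["url"],
    }


# Synthetic lines in the ASSUMED layout, for demonstration only.
SAMPLE_LINES = [
    "1445900000000 10.0.0.1 - 1 12.5 200 /query-a",
    "1445900001000 10.0.0.1 - 2 250.0 200 /query-b",
    "1445900002000 10.0.0.2 - 3 40.0 200 /query-c",
    "malformed line",
]

summary = summarize(SAMPLE_LINES)
print(summary)
```

In practice you would feed `summarize` the lines of a file under the logs directory mentioned above, and bucket by `timestamp_ms` to see how latency moves through the day.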

Request log analyzer (aka Cheetah)

• Cheetah is an MDEX log analysis tool
• Reports performance stats
• Helps identify trends
• Downloadable from Oracle

MDEXperf (aka ENEperf): Load Testing

MDEXperf is a load-testing utility
• Ships with Endeca

What is MDEX load testing?
• Send simulated user traffic against MDEX and site
• Learn how site performs under specific traffic conditions

Keys to a successful load test…
• Stress system in way that represents expected production usage
• Monitor performance during and after each test iteration
• Test all scenarios, functionality, and technology
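
The idea of replaying logged queries under concurrency can be sketched in a few lines of Python. This is not MDEXperf and does not mimic its options; it is a stand-in that shows the shape of a replay loop. The `send` callable, the `replay` and `timed_call` helpers, and the placeholder URLs are all hypothetical names introduced here. In a real test, `send` would issue an HTTP request against a test MDEX rather than sleep.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def timed_call(send, url):
    """Run one request through `send` and return its latency in ms."""
    start = time.perf_counter()
    send(url)
    return (time.perf_counter() - start) * 1000.0


def replay(urls, send, concurrency=4):
    """Replay `urls` with `concurrency` worker threads; return latencies.

    Results come back in the same order as `urls`.
    """
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        return list(pool.map(lambda u: timed_call(send, u), urls))


def fake_send(url):
    """Hypothetical stand-in for a real HTTP call; just waits briefly."""
    time.sleep(0.001)


# Placeholder URLs; in practice these would come from a request log.
urls = ["/query-a", "/query-b", "/query-c"] * 5
latencies = replay(urls, fake_send, concurrency=3)
print(f"{len(latencies)} requests, slowest {max(latencies):.2f} ms")
```

Raising `concurrency` between runs while watching the latency distribution is one simple way to approximate "will it break this year?" before production traffic answers the question for you.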

Resist the dark side

• Avoid default setNavAllRefinements / allgroups=1 if possible
• Exact, Phrase, and Proximity relevance ranking modules are expensive
• Keep response sizes under 500 KB
• Use record filters before text searches
• Avoid large flat dimensions

But the dark side is sooooo tempting …

• Wildcarding
• Interactions of large thesaurus + spelling + stemming on large datasets
• Frequent partial updates
• Not enough physical RAM on server
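
A few of the warnings above can be checked mechanically against logged query URLs. The Python sketch below is one way to flag them, under stated assumptions: `allgroups` is the parameter named on the slide, `Ntt` is assumed here to be the URL parameter that carries search terms, and the 500 KB threshold comes from the slide. The `lint_query` function and the example URL are hypothetical.

```python
from urllib.parse import urlsplit, parse_qs

MAX_RESPONSE_BYTES = 500 * 1024  # threshold taken from the slide


def lint_query(url, response_bytes=None):
    """Return a list of warnings for one query URL.

    ASSUMPTION: `Ntt` carries the search terms in your application's
    query URLs. Rename it if yours differs.
    """
    params = parse_qs(urlsplit(url).query)
    warnings = []
    if params.get("allgroups") == ["1"]:
        warnings.append("allgroups=1 requests all refinements")
    if any("*" in term for term in params.get("Ntt", [])):
        warnings.append("wildcard search term")
    if response_bytes is not None and response_bytes > MAX_RESPONSE_BYTES:
        warnings.append("response larger than 500 KB")
    return warnings


# Hypothetical URL for illustration.
example_warnings = lint_query("/search?N=0&allgroups=1&Ntt=brak*", 600 * 1024)
for warning in example_warnings:
    print(warning)
```

Run over a day of logged queries, a check like this gives a quick count of how often the site strays into expensive territory.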

Ingestion Performance
The primary consideration 2 years after you implement

Before we talk data ingestion…

Let’s talk sandwiches!

• Is a hot dog a sandwich?
• Is a pizza an open-faced sandwich?
• Can an American city be truly great w/o a signature sandwich?
  • If so: Los Angeles? Is a taco a sandwich?
  • New Orleans: Po’ Boy or Muffaletta?
  • Which city should claim the hot dog?
    • Correct answer: Chicago

What happens when you index

Step 1: Forge
• Join data sources and manipulate the data

Step 2: Dgidx
• Generate index file

Step 3: Index Distribution
• Distribute the files across Dgraphs

Total Index Time: Steps 1 through 3 combined

Factors that might jack up your indexing time

Size of the index
• 1,000,000+ records

Type of records in index
• Catalog, Web Content, Social Content, Analytical Content

Features and functionality
• Store inventory, Store level pricing
• Compatibility (Fitment)
• Endeca Recommendations

Data Model
• Wide record vs. RRN
• Internationalization
• Type of joins

Data Manipulations
• Data cleanups - Java/Perl/XML manipulators

Components Usage
• Traditional Forge
• CAS (Multi-threaded)

Two approaches for modeling complex relationships

Wide Records
• De-normalized model
• Adds store inventory to the product record
• Joins happen at indexing
• PRO: Fast queries
• CON: Slow updates
• CON: More back-end code

RRN
• Normalized model
• Inventory stored in separate record from products
• Joins happen at query time
• PRO: Fast updates
• CON: Slower-ish queries
• CON: More front-end code
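
The trade-off between the two models can be illustrated with plain Python data structures. This is a sketch of the modeling idea only, not of any Endeca API: the record shapes, field names, and functions below are hypothetical. The wide model pays for the join once, at index time; the RRN-style model keeps inventory separate and pays for the join on every query.

```python
from collections import defaultdict

# Hypothetical source data.
products = [
    {"sku": "A1", "name": "Brake pad"},
    {"sku": "B2", "name": "Oil filter"},
]
inventory = [
    {"sku": "A1", "store": "S1", "qty": 4},
    {"sku": "A1", "store": "S2", "qty": 0},
    {"sku": "B2", "store": "S1", "qty": 9},
]


def build_wide_records(products, inventory):
    """Wide model: fold every store's inventory into the product record.

    The join happens here, at index time, so any inventory change
    means rebuilding the affected product record.
    """
    by_sku = defaultdict(dict)
    for row in inventory:
        by_sku[row["sku"]][row["store"]] = row["qty"]
    return [{**p, "qty_by_store": dict(by_sku[p["sku"]])} for p in products]


def query_wide(wide_records, store):
    """Query the wide model: a single pass, no join needed."""
    return [r["sku"] for r in wide_records
            if r["qty_by_store"].get(store, 0) > 0]


def query_rrn(products, inventory, store):
    """Query the normalized model: join products to inventory on demand."""
    in_stock = {row["sku"] for row in inventory
                if row["store"] == store and row["qty"] > 0}
    return [p["sku"] for p in products if p["sku"] in in_stock]


wide = build_wide_records(products, inventory)
print(query_wide(wide, "S1"), query_rrn(products, inventory, "S1"))
```

Both queries return the same answer. What differs is where the work lands: updating one inventory row is a one-record change in the normalized model, but forces a rebuild of the product record in the wide model.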

Indexing scars

• Use a real ETL tool if you can
• Use record cache when joining the data sources in the pipeline
• CAS is multi-threaded, but it’s not as flexible as traditional Forge
• Beware Forge left joins
• Dgidx is multi-threaded. Tune the thread count to speed up this step.

More cuts and bruises

• Use Dgidx flags carefully: specifying many pre-computed sorts can hurt performance.
• If index distribution is slow, consider rolling your own approach to compress the index before distributing it.
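
One way to "roll your own" compression step is to pack the index output into a gzip-compressed tar archive, ship the archive, and unpack it on each Dgraph host. The Python sketch below uses only the standard library. The directory and file names (`dgidx_output`, `index.dat`, `dgraph_host`) are hypothetical placeholders, and the transfer between hosts is left out.

```python
import tarfile
import tempfile
from pathlib import Path


def pack_index(index_dir, archive_path):
    """Compress an index directory into a .tar.gz archive."""
    index_dir = Path(index_dir)
    with tarfile.open(archive_path, "w:gz") as tar:
        tar.add(index_dir, arcname=index_dir.name)
    return Path(archive_path)


def unpack_index(archive_path, dest_dir):
    """Extract the archive on the receiving side."""
    with tarfile.open(archive_path, "r:gz") as tar:
        tar.extractall(dest_dir)


# Round trip in a temporary directory, with a dummy file standing in
# for real index output (hypothetical names throughout).
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "dgidx_output"
    src.mkdir()
    (src / "index.dat").write_bytes(b"0" * 100_000)

    archive = pack_index(src, Path(tmp) / "index.tar.gz")
    compressed_size = archive.stat().st_size

    unpack_index(archive, Path(tmp) / "dgraph_host")
    restored = (Path(tmp) / "dgraph_host" / "dgidx_output"
                / "index.dat").read_bytes()

print(f"original 100000 bytes, archive {compressed_size} bytes")
```

Whether this pays off depends on how compressible the index is and on whether the network or the CPU is the bottleneck, so measure both the compression time and the transfer time before adopting it.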

Backend performance case study
Major Auto Parts Company

Case Study: Major auto parts company

• 3 major sites live since 2003
• Originally a bridged multi-MDEX
• Large index due to fitment
• Re-engineered for wide records

• <100ms MDEX response time
• 3 updates/wk at many hours each
• Tried partial updates but failed

Wide record model

• 110,000,000 very wide records

RRN Model

• 4,500,000 narrow records

Baseline update performance

Partial update performance

Forget the session! Build your biceps!

Reception w/Bodybuilding.com
Oracle Open World

Tuesday 27-Oct 2015
Foreign Cinema, San Francisco

• Rinse away OOW15
• Eat good food
• Watch foreign movies
• Hang with smart people

Let’s get started
