Endeca performance considerations
Post on 15-Feb-2017
Endeca Performance and Scalability
Hard-won lessons from the field
Peter Curran, Founder, Cirrus10
Art by Liam Brazier, buy it here: liambrazier.com/Shop
BASICS
• Seattle HQ, distributed team
• ~50 resources (25 EE + subs)
• All onshore labor
• Endeca or Oracle partner since 2010
WHAT WE DO
• End-to-end implementations
• Relevance tuning
• Architecture & process analysis
• Program roadmaps
• Upgrades & migrations
OUR METHODS
• Time & materials
• Fixed fee with risk premium
• Cost + bonus
• Easy contracts
• ROI guarantees
EXPERIENCE
• ~70 Endeca customers
• B2C and B2B
• CMS Gurus
• Marquee Presenter at OOW 2014
• 100% Referenceable
Agenda
• MDEX Performance
• Update Performance
• Case study: Auto Parts
ITL: Index ingestion
• Forge
• CAS
MDEX: The index itself
• Dgraphs
Assembler: Application interface
• Service / Process
Diagram here: bit.ly/1PvJYFX
Query-time performance
The primary consideration
Endeca performance tools
What do I need tools for?
• Why did it break?
• Will it break this year?
Tools
1. MDEX Request Logs
2. Request Log Analyzer (Cheetah)
3. MDEX Perf, Load Testing (Eneperf)
MDEX Request Logs
What is the request log?
• MDEX’s main log file: dumps every query to a log
• Includes query latency and time of day
Why is it useful?
• Parse it to see what the heck happened
• Replay or spoof it up to answer “what if”
Where do you find it?
• <working-dir>/logs/dgraphs/Dgraph1/
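Since the request log carries latency and time of day for every query, "parse it to see what happened" can be as simple as bucketing latencies by hour. The sketch below assumes a hypothetical, simplified line layout of `<epoch_seconds> <latency_ms> <query>`. The real MDEX request log layout is different and varies by version, so `parse_line` is a placeholder to adapt, not a description of the actual format.

```python
import math
from collections import defaultdict
from datetime import datetime, timezone

def parse_line(line):
    """Parse one line of the HYPOTHETICAL layout '<epoch_seconds> <latency_ms> <query>'.

    Returns (timestamp, latency_ms, query), or None for lines that do not fit.
    """
    parts = line.split(None, 2)
    if len(parts) < 3:
        return None
    try:
        return int(parts[0]), float(parts[1]), parts[2].strip()
    except ValueError:
        return None

def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]

def latency_by_hour(lines):
    """Group latencies by UTC hour of day; return {hour: (count, p50, p95)}."""
    buckets = defaultdict(list)
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue  # skip anything that is not a query line
        timestamp, latency_ms, _query = parsed
        hour = datetime.fromtimestamp(timestamp, tz=timezone.utc).hour
        buckets[hour].append(latency_ms)
    return {
        hour: (len(vals), percentile(vals, 50), percentile(vals, 95))
        for hour, vals in sorted(buckets.items())
    }
```

Feeding it the lines of a log file gives a quick view of which hours carry the slow queries. A 95th percentile is usually more telling than an average here, because a few very slow queries hide easily in a mean.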
Request log analyzer (aka Cheetah)
• Cheetah is an MDEX log analysis tool
• Reports performance stats
• Helps identify trends
• Downloadable from Oracle
MDEXperf (aka ENEperf): Load Testing
MDEXperf is a load-testing utility
• Ships with Endeca
What is MDEX load testing?
• Send simulated user traffic against MDEX and site
• Learn how site performs under specific traffic conditions
Keys to a successful load test…
• Stress the system in a way that represents expected production usage
• Monitor performance during and after each test iteration
• Test all scenarios, functionality, and technology
Resist the dark side
• Avoid default setNavAllRefinements / allgroups=1 if possible
• Exact, Phrase, and Proximity relevance ranking modules are expensive
• Avoid response sizes > 500 KB
• Use record filters before text searches
• Avoid large flat dimensions
But the dark side is sooooo tempting …
• Wildcarding
• Interactions of large thesaurus + spelling + stemming on large datasets
• Frequent partial updates
• Not enough physical RAM on server
Ingestion Performance
The primary consideration 2 years after you implement
Before we talk data ingestion…
Let’s talk sandwiches!
• Is a hot dog a sandwich?
• Is a pizza an open-faced sandwich?
• Can an American city be truly great w/o a signature sandwich?
  • If so: Los Angeles? Is a taco a sandwich?
  • New Orleans: Po’ Boy or Muffaletta?
  • Which city should claim the hot dog?
  • Correct answer: Chicago
What happens when you index
Step 1, Forge: Join data sources and manipulate the data
Step 2, Dgidx: Generate index file
Step 3, Index Distribution: Distribute the files across the Dgraphs
Total Index Time: Steps 1 through 3 combined
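Because total index time is the sum of the three stages above, it helps to know which stage dominates before tuning anything. A minimal timing sketch follows. The stage callables are placeholders for however your deployment actually launches Forge, Dgidx, and distribution; nothing here calls a real Endeca API.

```python
import time

def time_stages(stages):
    """Run each (name, callable) pair in order.

    Returns ([(name, seconds), ...], total_seconds).
    """
    timings = []
    for name, run in stages:
        start = time.perf_counter()
        run()  # placeholder: launch the real stage here
        timings.append((name, time.perf_counter() - start))
    total = sum(seconds for _, seconds in timings)
    return timings, total

def report(timings, total):
    """Format per-stage time and its share of the total as text lines."""
    lines = []
    for name, seconds in timings:
        share = (seconds / total * 100) if total else 0.0
        lines.append(f"{name}: {seconds:.2f}s ({share:.0f}%)")
    lines.append(f"Total index time: {total:.2f}s")
    return lines
```

Logging these numbers on every baseline update gives a trend line, which is what tells you whether the index will still fit its update window next year.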
Factors that might jack up your indexing time
Size of the index
• 1,000,000+ records
Type of records in index
• Catalog, Web Content, Social Content, Analytical Content
Features and functionality
• Store inventory, Store level pricing
• Compatibility (Fitment)
• Endeca Recommendations
Data Model
• Wide record vs. RRN
• Internationalization
• Type of joins
Data Manipulations
• Data cleanups: Java/Perl/XML manipulators
Components Usage
• Traditional Forge
• CAS (Multi-threaded)
Two approaches for modeling complex relationships
Wide Records
• De-normalized model
• Adds store inventory to the product record
• Joins happen at indexing
• PRO: Fast queries
• CON: Slow updates
• CON: More back-end code
RRN
• Normalized model
• Inventory stored in separate record from products
• Joins happen at query time
• PRO: Fast updates
• CON: Slower-ish queries
• CON: More front-end code
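The trade-off between the two models can be illustrated with plain Python dictionaries standing in for records. This is only a conceptual sketch: the field names (`sku`, `store`, `qty`) and the sample data are invented, and none of it uses Endeca APIs.

```python
products = [{"sku": "A1", "name": "Brake pad"}, {"sku": "B2", "name": "Oil filter"}]
inventory = [
    {"sku": "A1", "store": "S1", "qty": 4},
    {"sku": "A1", "store": "S2", "qty": 0},
    {"sku": "B2", "store": "S1", "qty": 7},
]

def build_wide_records(products, inventory):
    """Wide model: join at index time, folding every store's stock onto the product."""
    stock = {}
    for row in inventory:
        stock.setdefault(row["sku"], {})[row["store"]] = row["qty"]
    return [{**p, "stock_by_store": stock.get(p["sku"], {})} for p in products]

def query_wide(wide_records, store):
    """Wide model query: one lookup per record, no join needed."""
    return [r["sku"] for r in wide_records if r["stock_by_store"].get(store, 0) > 0]

def query_normalized(products, inventory, store):
    """Normalized model query: filter inventory records, then relate them to products."""
    in_stock = {row["sku"] for row in inventory if row["store"] == store and row["qty"] > 0}
    return [p["sku"] for p in products if p["sku"] in in_stock]

def update_normalized(inventory, sku, store, qty):
    """Normalized model update: touches one small record; products are untouched."""
    for row in inventory:
        if row["sku"] == sku and row["store"] == store:
            row["qty"] = qty
            return
    inventory.append({"sku": sku, "store": store, "qty": qty})
```

After `update_normalized(inventory, "A1", "S2", 3)`, the normalized query sees the new stock immediately, while the wide records stay stale until `build_wide_records` runs again. That rebuild is the "slow updates" cost of the wide model; the extra join in `query_normalized` is the "slower-ish queries" cost of the normalized one.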
Indexing scars
• Use a real ETL tool if you can
• Use record cache when joining the data sources in the pipeline
• CAS is multi-threaded, but it’s not as flexible as traditional Forge
• Beware Forge left joins
• Dgidx is multi-threaded. Configure an appropriate thread count to speed up this step.
More cuts and bruises
• Use Dgidx flags carefully. Specifying many pre-computed sorts can hurt indexing performance.
• If index distribution time is slow, consider rolling your own approach to compress the index before distributing it
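One way to roll your own compression step is to pack the index directory into a gzip tarball before copying it to each Dgraph host, then unpack it there. The sketch below uses only the Python standard library. The directory and archive paths are placeholders you would supply, and the copy itself (scp, rsync, or whatever your deployment uses) is left out.

```python
import tarfile
from pathlib import Path

def compress_index(index_dir, archive_path):
    """Pack an index directory into a gzip-compressed tarball; return the archive path."""
    index_dir = Path(index_dir)
    with tarfile.open(archive_path, "w:gz") as tar:
        # arcname keeps only the directory's own name inside the archive
        tar.add(index_dir, arcname=index_dir.name)
    return Path(archive_path)

def extract_index(archive_path, dest_dir):
    """Unpack the tarball on the target host. Use only on archives you created."""
    with tarfile.open(archive_path, "r:gz") as tar:
        tar.extractall(dest_dir)
```

Whether this helps depends on how compressible the index files are and on whether the network or the CPU is the bottleneck, so time both variants before adopting it.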
Backend performance case study
Major Auto Parts Company
Case Study: Major auto parts company
• 3 major sites live since 2003
• Originally a bridged multi-MDEX
• Large index due to fitment
• Re-engineered for wide records
• <100ms MDEX response time
• 3 updates/wk at many hours each
• Tried partial updates but failed
Wide record model
• 110,000,000 very wide records
RRN Model
• 4,500,000 narrow records
Baseline update performance
Partial update performance
Forget the session! Build your biceps!
Reception w/Bodybuilding.com
Oracle Open World
Tuesday 27-Oct 2015
Foreign Cinema, San Francisco
• Rinse away OOW15
• Eat good food
• Watch foreign movies
• Hang with smart people
Let’s get started