Using Solr in Online Travel to Improve User Experience
Sudhakar Karegowdra, Esteban Donato, Travelocity, May 25th, 2011
{ sudhakar.karegowdra, esteban.donato}@travelocity.com
What We Will Cover
§ Travelocity
§ Speakers Background
§ Merchandising & Solr
• Challenges
• Solution
• Sizing and performance data
• Take Away
§ Location Resolution & Solr
• Challenges
• Solution
• Sizing and performance data
• Take Away
§ Q&A
§ First Online Travel Agency (OTA), launched in 1996
§ Grown to 3,000 employees and is one of the largest travel agencies worldwide
§ Headquartered in Dallas/Fort Worth, with satellite offices in San Francisco, New York, London, Singapore, Bangalore, and Buenos Aires, to name a few
§ In 2004, the Roaming Gnome became the centerpiece of marketing efforts and has become an international pop icon
§ Owned by Sabre Holdings; sister companies include Travelocity Business, IgoUgo.com, lastminute.com, and Zuji, among others
Speakers Background
§ Esteban Donato
• Lead Architect, Travelocity.com
§ My experience
– 10+ years
– Solr: 2 years
– Analyzing Mahout and Carrot2 for a document clustering engine
§ Topic: Location Resolution
§ Sudhakar Karegowdra
• Principal Architect, Travelocity.com
§ My experience
– 13+ years
– Solr/Lucene: 3 years
– Implementing Hadoop, Pig and Hive for the data warehouse
§ Topic: Merchandising
Merchandising By Sudhakar Karegowdra
The Challenge
§ Market Drivers
• Build landing pages with faceted navigation
• Enable content segmentation and delivery
• Support rollout of promotions
• Roll up data to a higher level
– E.g., "All 5-star hotels in California" brings together the 5-star hotels from SFO, LAX, SAN, etc.
• Faster time to market for new ideas
• Rapidly scale to accommodate global brands with disparate data sources
The Challenge
§ Traditional Database approach
• Longer time to market
• Specialized skill set needed to design and optimize database structures and queries
• Aggregating data and changing structures is quite complex
• Building faceted navigation capabilities needs complex logic, leading to high maintenance cost
Solution - Overview
§ Data from various sources aggregated and ingested into Solr
• Core per Locale and Product Type
§ Wrapper service to combine some data across product cores and manage configuration rules
§ Solr's built-in Search and Faceting to power the navigation
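As a rough illustration of the last point, a faceted landing-page request against one core could be assembled as below. This is only a sketch: the `product_locale` core naming, the host, and the field names (`state`, `star_rating`, `city`, `amenity`) are assumptions made for the example, not the actual Travelocity schema. The request parameters (`q`, `fq`, `facet`, `facet.field`, `rows`, `wt`) are standard Solr select parameters.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def facet_query_url(base_url, locale, product, filters, facet_fields, rows=10):
    """Build a Solr select URL with filter queries and field facets."""
    # Hypothetical naming scheme: one core per product type and locale.
    core = f"{product}_{locale}"
    params = [("q", "*:*"), ("rows", str(rows)), ("wt", "json"), ("facet", "true")]
    # One facet.field parameter per navigation dimension.
    params += [("facet.field", name) for name in facet_fields]
    # One fq parameter per selected facet value; each narrows the result set.
    params += [("fq", f'{field}:"{value}"') for field, value in filters.items()]
    return f"{base_url}/{core}/select?{urlencode(params)}"


url = facet_query_url(
    "http://localhost:8983/solr",
    locale="en_US",
    product="hotel",
    filters={"state": "California", "star_rating": "5"},
    facet_fields=["city", "amenity"],
)
print(url)
```

Keeping each selected value in its own `fq` lets Solr cache the filters independently, which fits navigation where users add and remove one facet at a time.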
Solution – Architecture View
[Architecture diagram. Components shown: Oracle, Offer Management Tool, ETL, Solr Master (Multi Core), Solr Slaves (Multi Core) with cores such as Deals, Products, …, Services/Business Logic, UI Widgets, Mobile]
Solution - Achievements
§ Millions of unique Long Tail Landing Pages
• E.g., http://www.travelocity.com/hotel-d4980-nevada-las-vegas-hotels_5-star_business-center_green
§ Faster search across products
• E.g., Beach Deals under $500
§ Segmented content delivery through tagging
§ Scaled well to distribute the content to different brands, partners and advertisers
§ Opened up for other innovative applications
• Deals on Map, Deals on Mobile, Wizards, etc.
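To make the long tail landing page idea concrete, the sketch below splits the path segment of the example URL into filters. The URL grammar used here (a `product-d<id>-<location>` head followed by underscore-separated tags) is a guess from that single example, and the `tag` field name is hypothetical; the real routing rules are not described in the slides.

```python
import re


def parse_landing_slug(slug):
    """Split a landing-page slug into product, destination id, location and tags."""
    # Assumed grammar: "<product>-d<id>-<location>" then "_tag" segments.
    head, *tags = slug.split("_")
    match = re.match(r"^(?P<product>[a-z]+)-d(?P<dest_id>\d+)-(?P<location>.+)$", head)
    if match is None:
        raise ValueError(f"unrecognized slug: {slug!r}")
    return {
        "product": match["product"],
        "destination_id": int(match["dest_id"]),
        "location": match["location"],
        "tags": tags,
    }


def tags_to_filter_queries(tags, field="tag"):
    """Turn each tag into one Solr fq clause on a hypothetical 'tag' field."""
    return [f'{field}:"{tag}"' for tag in tags]


page = parse_landing_slug("hotel-d4980-nevada-las-vegas-hotels_5-star_business-center_green")
print(page)
print(tags_to_filter_queries(page["tags"]))
```

Because every combination of destination and tags maps to a distinct URL, the number of reachable pages grows multiplicatively with the number of tags.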
Solution – Road Ahead
§ Migration to Solr 3.1
• Geospatial search
• CSV output format
§ Query boosting by search pattern
§ Near-real-time updates
§ Deal and user behavior mining in Hadoop (MapReduce), with Solr to serve the content
§ Move slaves to the cloud
Sizing & Performance
§ Index Stats
• Number of cores: 25
• Number of documents: ~1 million records
§ Response
• Requests: 70 tps
• Average response time: 0.005 seconds (5 ms)
§ Software Versions
• Solr 1.4.0
– filterCache size: 30000
• Tomcat 5.5.9
• JDK 1.6
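One way to read the response figures is through Little's law (average requests in flight = arrival rate × average time in system). The short calculation below applies it to the 70 tps and 5 ms quoted on the slide; it is back-of-the-envelope arithmetic, not a measurement from the talk, and it says nothing about peak load.

```python
def avg_in_flight(requests_per_second, avg_response_seconds):
    """Little's law: average concurrency = throughput x average latency."""
    return requests_per_second * avg_response_seconds


# Figures from the slide: 70 tps at 5 ms average response time.
in_flight = avg_in_flight(70, 0.005)
print(f"average requests in flight: {in_flight:.2f}")
```

An average well below one request in flight suggests that, at this load, requests rarely queue behind one another.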
Take Away
§ Semi-structured storage in Solr helps aggregate disparate sources easily
• Remember dynamic fields
§ Multiple cores to manage multiple locale data
§ Solr is a great enabler of "Innovations"
Location Resolution By Esteban Donato
The Challenge
§ How to develop a global location resolution service?
• Flexible to changes
• General enough to cover everyone's needs
• Multi-language
• Performance and scalability
• Configurable by site
Architecture of the solution
[Architecture diagram. Components shown: Location DB, Batch Job, Management Tool, Solr Master, Solr Slave, Auto-complete, Resolution]
§ Remote Streaming indexing
§ CSV format
§ Master/Slave architecture
§ Multi-core: each core represents a language
§ SolrJ client binary format
§ Solr response cache
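The CSV and remote streaming bullets can be pictured with the sketch below: a batch job writes location rows as CSV, then asks Solr's CSV update handler to fetch that file itself through the `stream.url` parameter. The column names, the `en` core name and both URLs are illustrative assumptions; only the `/update/csv` handler and the `stream.url`, `stream.contentType` and `commit` parameters are standard Solr features (remote streaming must be enabled in solrconfig.xml).

```python
import csv
import io
from urllib.parse import urlencode


def locations_to_csv(rows, fieldnames):
    """Serialize location rows to CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)  # values containing commas are quoted automatically
    return buf.getvalue()


def remote_stream_url(solr_base, core, csv_location):
    """Build the update URL that tells Solr to pull the CSV from csv_location."""
    params = {
        "stream.url": csv_location,
        "stream.contentType": "text/plain;charset=utf-8",
        "commit": "true",
    }
    return f"{solr_base}/{core}/update/csv?{urlencode(params)}"


# Hypothetical columns; RANK_SITE1 would match the RANK* dynamic field.
fields = ["LOCATION_CODE", "LOCATION_NAME", "RANK_SITE1"]
csv_text = locations_to_csv(
    [{"LOCATION_CODE": "PAR", "LOCATION_NAME": "Paris, France", "RANK_SITE1": 1}],
    fields,
)
update_url = remote_stream_url(
    "http://localhost:8983/solr", "en", "http://example.com/locations.csv"
)
print(csv_text)
print(update_url)
```

With remote streaming the batch job never pushes the file body over HTTP; Solr reads it directly, which helps with large full re-indexes.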
Auto-complete
§ System has to suggest options as the users type their desired location
• Examples: "san" => San Francisco, "veg" => Las Vegas
§ Relevancy: not all the locations are equally important
• "par" => "Paris, France"; "Parana, Argentina"
§ Users can search by various fields: location code, location name, city code, city name, state/province code, state/province name, country code, country name
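The behavior described above (prefix match on any word of any searchable field, then ordering by importance) can be mimicked in memory as follows. This is a toy model, not the Solr query the service runs: the sample locations, their codes and the convention that a lower `rank` means a more important location are all assumptions for the example.

```python
import re

# Illustrative data only; codes and ranks are made up for the example.
LOCATIONS = [
    {"name": "Parana, Argentina", "code": "PRA", "rank": 2},
    {"name": "Paris, France", "code": "PAR", "rank": 1},
    {"name": "San Francisco, CA", "code": "SFO", "rank": 1},
    {"name": "Las Vegas, NV", "code": "LAS", "rank": 1},
]


def searchable_tokens(location):
    """Lowercased words from every searchable field of a location."""
    text = f'{location["code"]} {location["name"]}'.lower()
    return [tok for tok in re.split(r"[/\-\t ,.]+", text) if tok]


def suggest(prefix, locations, limit=5):
    """Return names whose tokens start with the prefix, most important first."""
    wanted = prefix.strip().lower()
    if not wanted:
        return []
    hits = [
        loc for loc in locations
        if any(tok.startswith(wanted) for tok in searchable_tokens(loc))
    ]
    hits.sort(key=lambda loc: loc["rank"])  # assumed: lower rank = more important
    return [loc["name"] for loc in hits[:limit]]


print(suggest("par", LOCATIONS))
print(suggest("veg", LOCATIONS))
```

Matching on word starts rather than on the start of the whole string is what lets "veg" find Las Vegas.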
Solr schema
<dynamicField name="RANK*" type="int" required="false" indexed="true" stored="true" />
<field name="GLS_FULL_SEARCH" type="glsSearchField" required="false" indexed="true" stored="false" multiValued="true" />
<fieldType name="glsSearchField" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.PatternTokenizerFactory" pattern="[/\-\t ]+" />
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.TrimFilterFactory" />
    <filter class="solr.ISOLatin1AccentFilterFactory" />
    <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
    <filter class="solr.PatternReplaceFilterFactory" pattern="[,.]" replacement="" replace="all"/>
  </analyzer>
</fieldType>
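To see what the glsSearchField analyzer does to a location string, the chain can be approximated in plain Python. This is an emulation for illustration, not Solr code: accent removal uses Unicode decomposition, which only approximates ISOLatin1AccentFilterFactory, and the RemoveDuplicatesTokenFilterFactory step is skipped because it only drops identical tokens at the same position, which this simple tokenizer never produces. Empty tokens are also dropped here for readability.

```python
import re
import unicodedata


def strip_accents(text):
    """Approximate accent folding via Unicode decomposition."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def analyze(text):
    """Emulate the glsSearchField chain: split, lowercase, trim, fold, strip [,.]."""
    tokens = []
    # PatternTokenizerFactory: split on runs of '/', '-', tab or space.
    for raw in re.split(r"[/\-\t ]+", text):
        token = raw.lower().strip()          # LowerCaseFilter, TrimFilter
        token = strip_accents(token)         # accent filter (approximation)
        token = re.sub(r"[,.]", "", token)   # PatternReplaceFilter
        if token:
            tokens.append(token)
    return tokens


print(analyze("São Paulo, Brazil"))
print(analyze("Dallas/Fort-Worth"))
```

The same chain runs at query time, so a user typing "sao" matches a document indexed as "São".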
Resolution
§ System has to resolve the location requested by the users
• Handles aliases: Big Apple => New York
• Handles ambiguities
• Handles misspellings: Lomdon => London
§ NGramDistance algorithm
• How to combine distance with relevancy
• Errors when suggesting the correct location if the input is a prefix of it: Lond => London
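The sketch below shows one possible way to combine string similarity with relevancy when resolving a location. It deliberately swaps in a simpler measure, the Dice coefficient over character bigrams, rather than reimplementing Lucene's NGramDistance, and the scoring formula (similarity plus a small bonus for better-ranked locations), the threshold and the sample data are all assumptions, not the formula used by the service.

```python
from collections import Counter


def bigrams(text):
    """Character bigrams of the lowercased text."""
    text = text.lower()
    return [text[i:i + 2] for i in range(len(text) - 1)]


def dice(a, b):
    """Dice coefficient over character bigrams, in [0, 1]."""
    ca, cb = Counter(bigrams(a)), Counter(bigrams(b))
    total = sum(ca.values()) + sum(cb.values())
    if total == 0:
        return 1.0 if a.lower() == b.lower() else 0.0
    return 2 * sum((ca & cb).values()) / total


def resolve(query, locations, aliases, min_similarity=0.5):
    """Resolve a query to a location name, or None if nothing is close enough."""
    key = query.strip().lower()
    if key in aliases:                      # exact alias wins outright
        return aliases[key]
    best_name, best_score = None, 0.0
    for name, rank in locations.items():
        similarity = dice(key, name)
        if similarity < min_similarity:
            continue
        score = similarity + 0.1 / rank     # assumed: rank 1 = most important
        if score > best_score:
            best_name, best_score = name, score
    return best_name


# Illustrative data: location name -> rank.
LOCATION_RANKS = {"London": 1, "Londrina": 3, "New York": 1}
ALIASES = {"big apple": "New York"}

for q in ("Big Apple", "Lomdon", "Lond"):
    print(q, "=>", resolve(q, LOCATION_RANKS, ALIASES))
```

Keeping the relevancy bonus small means it only breaks near-ties, so a clearly closer spelling still wins over a more popular but less similar location.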
Spellchecker configuration
<fieldType name="spellcheckType" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory" />
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.TrimFilterFactory" />
    <filter class="solr.ISOLatin1AccentFilterFactory" />
    <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
    <filter class="solr.PatternReplaceFilterFactory" pattern="[,.]" replacement="" replace="all"/>
  </analyzer>
</fieldType>
Sizing & Performance
§ 4 cores with ~500,000 documents indexed each
§ Response times
• Auto-complete: 15 ms, 20 TPS
• Resolution: 10 ms, 2 TPS
§ Cache configuration
• queryResultCache: maxSize=1024
• documentCache: maxSize=1024
• fieldValueCache & filterCache disabled
Wrap Up
§ Performance always as top priority
§ Develop simple but robust services
§ Provide a simple API
Q&A
Contact
§ Esteban Donato
• [email protected]
• Twitter: @eddonato
§ Sudhakar Karegowdra
• [email protected]
• Twitter: @skaregowdra
§ Travelocity
• https://www.facebook.com/travelocity
• Twitter: @travelocity and @RoamingGnome