Using Joomla, Zoo & SOLR to power Asia's largest auction house
DESCRIPTION
This presentation is a walkthrough of our adventures in integrating various aspects of Joomla, 3PD extensions & SOLR. The highlight of this presentation is the use of Apache SOLR to create a responsive, filtered, sortable, searchable 'image grid' with continuous pagination. This behaves a lot like Google's image search, where you can keep scrolling to get more results.
TRANSCRIPT
SOLR + Joomla powering the catalog of Asia's Largest Auction House
Parth Lawate
Strategic Marketing Manager, Joomla
CEO, Techjoomla, Tekdi Web Solutions
@parthlawate, @techjoomla
www.techjoomla.com
Cook
Bookworm Gardener
JUG Pune
Joomla Freak
Trekking
Entrepreneur
Joomla Day India
Open Source
Software Architect
Marketing
Content Strategy
Hiking
Tekdi Technologies Pvt. Ltd.
@tekdinet
tekdi.net
iOS Apps
CRM
Magento
E-Learning
Ecommerce
Joomla
Custom Apps
Android
CMS
HTML5
Social Networks
Techjoomla. For all things Joomla
@techjoomla
techjoomla.com
jGive
People Suggest
jomLike
JTicketing
J!Bolo
Broadcast
Invitex
Email Beautifier
SocialAds
J!MailAlerts
REST API
Payments API
Social API
Quick2Cart
Quick Facts
● The client is a major Art &
Auction house in India & is
one of the largest in Asia.
● Data collation over a
period of 20+ years
● Over 500,000 records with
complex interrelations.
Quick Facts
● Complex data structure
with 100+ parameters/fields
on each data type
● Graphics Heavy – All
artifacts have High
resolution images
The Technical Challenge
● Over 100,000 records in
the first phase of migration
● Extremely complex data
relations
● Complex Data types &
Record parameter volume
& complexity
The Human Challenge
● Use of MS Excel for years to manage
their knowledge base before we
came on board
● Working with the client's research &
archivist team who had almost no
knowledge of any kind of web
technologies
● Getting the team of traditional
archivists to adopt a modern system.
The Solution
● The data complexity &
relations called for using a
CCK
● We Chose Zoo to serve as a
base for all the
customisations to come
● Custom apps based on this
architecture.
Term Glossary
● Classification – First level categorisation, e.g. ANTQ
● Sub Classification – arm
● Artifact – Actual Record
● Masterlists – Records that can be used as Associated
records or as a link between 2 or more records
Starting Small
● 9 Classifications
● 50+ Sub-classifications
● 50,000 Artifacts
The Work with Zoo
● Custom field types
● Custom association
plugins in order to create
records from relations
● Custom views
The Early Search
● Custom extension for
parametric search
● One table per classification
● CRON based indexer
● MySQL powered with Natural
language support
● Using MySQL soundex for
'did you mean' feature
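A minimal sketch of how a Soundex-based 'did you mean' lookup can work, written in Python for illustration only. It implements the classic four-character American Soundex, which is not identical to MySQL's SOUNDEX() function (MySQL can return longer codes). The function names and the sample vocabulary are assumptions, not the project's code.

```python
# Classic American Soundex letter groups (illustrative, not MySQL's variant).
CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(word):
    """Return the four-character Soundex code for a word."""
    letters = [c for c in word.upper() if c.isalpha()]
    if not letters:
        return ""
    encoded = letters[0]
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same code
        code = CODES.get(ch, "")
        if code and code != prev:
            encoded += code
        prev = code  # vowels reset prev, so a repeated code is kept after them
    return (encoded + "000")[:4]


def did_you_mean(term, vocabulary):
    """Suggest indexed words that sound like the search term."""
    target = soundex(term)
    return [w for w in vocabulary
            if soundex(w) == target and w.lower() != term.lower()]


# Hypothetical vocabulary taken from an index of known terms.
print(did_you_mean("Robert", ["Rupert", "Rubin", "Raja"]))
```

In a MySQL-backed setup the phonetic code would typically be stored in an indexed column at index time, so the lookup is a simple equality match.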
We Want Excel !
● Though we got the archivists to use web forms, they still missed
the ease of Excel
● So we gave it to them ! With a Handsontable-based Mass Edit view
for Zoo.
Bulk Processing's gotta be there !
● Bulk Edit
● Bulk Delete
● Bulk Add
● Custom Importing Tools
with volume processing &
automapping
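The slides do not describe how automapping works. One plausible approach, sketched here in Python as an assumption, is to match spreadsheet column headers to field identifiers after stripping case, spaces and punctuation. All names below (`automap`, the headers, the field identifiers) are hypothetical.

```python
import re


def normalise(label):
    """Reduce a label to lowercase letters and digits for comparison."""
    return re.sub(r"[^a-z0-9]+", "", label.lower())


def automap(headers, field_names):
    """Map spreadsheet headers to known field names.

    Returns (mapping, unmatched) so an importer can ask the user
    to map the remaining columns by hand.
    """
    lookup = {normalise(name): name for name in field_names}
    mapping, unmatched = {}, []
    for header in headers:
        field = lookup.get(normalise(header))
        if field is None:
            unmatched.append(header)
        else:
            mapping[header] = field
    return mapping, unmatched


# Hypothetical headers from an archivist's Excel sheet.
mapping, unmatched = automap(
    ["Artist Name", "Year ", "Medium?"],
    ["artist_name", "year", "dimensions"],
)
print(mapping)
print(unmatched)
```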
The Data today
● 12 Classifications
● 100 + sub classifications
● 8 Masterlists
● 200,000 artifacts
The Baby's growing up !
More data called for an architecture upgrade
Need for a better search
● 200,000 Records
● Zoo data structure isn't
optimised for search
● MySQL based indexer would
hit limits down the line.
Unions across 9 tables (which
could increase) would make
it slower
Need for a better search...
● Single & 2 letter
autosuggest not supported
by MySQL (3 char minimum
word length for full-text search)
● Normal search was not as
fast as expected (Brought
down load time from ~0.8 secs
to 0.3 secs)
Getting the data ready for SOLR
● MySQL indexer from
earlier phase modified to
create a Data normaliser
to push data to SOLR
● CLI script that reads
records to populate
SOLR index
● Using the PHP-SOLR
library
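The normalisation step above can be sketched as follows. This is an illustration in Python rather than the project's PHP code: it flattens one relational record, resolving its associated masterlist IDs into plain values, and serialises a batch as the JSON array of documents that Solr's update handler accepts. The record layout and all field names are assumptions.

```python
import json


def normalise_record(record, masterlists):
    """Flatten one relational record into a flat, search-friendly document."""
    doc = {
        "id": record["id"],
        "classification": record["classification"],
        "sub_classification": record["sub_classification"],
        "title": record["title"],
    }
    # Replace associated record IDs with their display values so that
    # the search index never has to join back to the database.
    for field, ids in record.get("associations", {}).items():
        doc[field] = [masterlists[field][i] for i in ids]
    return doc


def build_update_payload(records, masterlists):
    """Serialise a batch of records as a JSON array of documents."""
    return json.dumps([normalise_record(r, masterlists) for r in records])


# Hypothetical sample data.
masterlists = {"artist": {7: "A. Painter"}}
records = [{
    "id": "artifact-1",
    "classification": "ANTQ",
    "sub_classification": "arm",
    "title": "Untitled",
    "associations": {"artist": [7]},
}]
print(build_update_payload(records, masterlists))
```

A CLI script would read records in batches, build a payload like this for each batch, and send it to the index.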
[Architecture diagram: Browser, Osianama.com, PHP-SOLR Library, SOLR (Main Index, Suggestions Index); one element is marked as planned]
Getting SOLR into the picture
● Custom Search replaced
by SOLR
● SOLR hosted on Separate
Amazon instance
● Initial Implementation
was only for search
Benefits
● Much better natural
language search,
● Better relevance scoring
● Full reindex everyday
● Even browsing is now
SOLR powered which
means MORE SPEED !
● Record counts per category
& sub-category easily
achieved using faceting
● Now using SOLR's suggester
module
● Using separate 'cores' for
main index and suggest
terms index
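As a rough illustration of the faceting point above, the sketch below builds a Solr browse query that asks for per-category counts and then reads them out of the response. The parameters used (`q`, `fq`, `start`, `rows`, `wt`, `facet`, `facet.field`, `facet.mincount`) are standard Solr query parameters. The field names and function names are assumptions, and the parser assumes Solr's default flat JSON layout for facet fields.

```python
from urllib.parse import urlencode


def build_browse_query(classification=None, start=0, rows=40):
    """Build the query string for a browse request with facet counts."""
    params = [
        ("q", "*:*"),
        ("start", start),
        ("rows", rows),
        ("wt", "json"),
        ("facet", "true"),
        ("facet.field", "classification"),
        ("facet.field", "sub_classification"),
        ("facet.mincount", 1),
    ]
    if classification:
        # Filter query; assumes the value contains no double quotes.
        params.append(("fq", 'classification:"%s"' % classification))
    return urlencode(params)


def facet_counts(response, field):
    """Turn Solr's flat [value, count, value, count, ...] list into a dict."""
    flat = response["facet_counts"]["facet_fields"][field]
    return dict(zip(flat[0::2], flat[1::2]))


# Hypothetical response fragment.
sample = {"facet_counts": {"facet_fields": {"classification": ["ANTQ", 120, "BOOK", 45]}}}
print(build_browse_query("ANTQ", start=40))
print(facet_counts(sample, "classification"))
```

One request therefore returns both the page of results and the counts shown next to each category.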
What's coming ?
● Autosuggest directly works off
SOLR (currently piped through
PHP)
● Implement delta indexing,
currently not implemented due
to multitude of relational data.
● Change in a bottom level record
needs to flow through to all
associations
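To show why relational data complicates delta indexing, here is a small Python sketch of the propagation problem: given a changed record, it walks a reverse association map to collect every record whose indexed document would also need rebuilding. The `used_by` map and the record IDs are hypothetical, and this is only one possible approach, not the planned implementation.

```python
from collections import deque


def records_to_reindex(changed_id, used_by):
    """Collect the changed record plus everything that references it,
    directly or through other records (breadth-first traversal)."""
    affected = {changed_id}
    queue = deque([changed_id])
    while queue:
        current = queue.popleft()
        for parent in used_by.get(current, ()):
            if parent not in affected:  # also guards against cycles
                affected.add(parent)
                queue.append(parent)
    return affected


# Hypothetical reverse association map: record -> records that use it.
used_by = {
    "artist:7": ["artifact:1", "artifact:2"],
    "artifact:1": ["exhibition:3"],
}
print(sorted(records_to_reindex("artist:7", used_by)))
```

A change to one masterlist entry can fan out to many artifacts, which is the cost a full daily reindex sidesteps.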
What else is so
awesome about
this ?
HTML5 Local Storage
● HTML5 Local storage is
being used to cache data
locally & load used data
faster
● Paves the way for offline
use in the future !
Google Image Search anyone ?
● Ajax Grid pagination like Google Images
● Preloading & caching of images, CDN backed delivery
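Continuous pagination of this kind usually comes down to requesting the next window of results whenever the user nears the bottom of the grid. A minimal Python sketch of that bookkeeping, assuming an offset-based search backend and a hypothetical page size of 40:

```python
def next_page(loaded, total, page_size=40):
    """Return the offset and size of the next request, or None when
    every result has already been loaded into the grid."""
    if loaded >= total:
        return None
    return {"start": loaded, "rows": min(page_size, total - loaded)}


# Simulate scrolling through 100 results.
loaded, total = 0, 100
while True:
    page = next_page(loaded, total)
    if page is None:
        break
    print(page)
    loaded += page["rows"]
```

The client calls this on each scroll trigger and stops issuing requests once it returns None.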
iOS App for iPad
● Powered by RESTful
Webservices written on top
of Joomla using com_api
● Initial version developed in
HTML5+Cordova (Phonegap)
● Supports offline use of
already viewed data
Even More !
The project is under continuous development. The features here only cover development up to the point this
presentation was made.
● Online Sale of Images,
Downloads & Rights
Management
● Research & Teaching tools
● Social Network
● Subscription based
privileged access
Thank You !
● Questions ?
● Interested in developing something similar ?
Drop us an email ! [email protected]
Twitter @techjoomla | @parthlawate