![Page 1: Using Sphinx for Search in PHP](https://reader033.vdocuments.us/reader033/viewer/2022051311/540d5fb78d7f728d7e8b48d1/html5/thumbnails/1.jpg)
Using Sphinx for Search
Mike Lively Slickdeals, LLC
![Page 2: Using Sphinx for Search in PHP](https://reader033.vdocuments.us/reader033/viewer/2022051311/540d5fb78d7f728d7e8b48d1/html5/thumbnails/2.jpg)
What is Sphinx?
• A full-text search engine
• Quickly get high-quality (relevant) results
• Designed to integrate well with SQL RDBMS
• Can work with any data source
• Can be queried using either an API or SQL
![Page 3: Using Sphinx for Search in PHP](https://reader033.vdocuments.us/reader033/viewer/2022051311/540d5fb78d7f728d7e8b48d1/html5/thumbnails/3.jpg)
How do I know anything about Sphinx?
• Manager of Software Architecture for Slickdeals.net
• Alexa top 150 site (in the US)
• Have been working on improving our Sphinx search engine for the last 2 months or so.
• Over 7 million searches a month go directly through the interface; many more happen indirectly.
![Page 4: Using Sphinx for Search in PHP](https://reader033.vdocuments.us/reader033/viewer/2022051311/540d5fb78d7f728d7e8b48d1/html5/thumbnails/4.jpg)
When should I use Sphinx?
• Site / Product / Document searches
• Auto-suggest / Auto-Correct functionality
• Finding relevant and related items
![Page 5: Using Sphinx for Search in PHP](https://reader033.vdocuments.us/reader033/viewer/2022051311/540d5fb78d7f728d7e8b48d1/html5/thumbnails/5.jpg)
Simple Architecture
• Often, search is offloaded straight to the database
• Search goes to the backend which performs queries on the database
• Obviously very easy to implement
![Page 6: Using Sphinx for Search in PHP](https://reader033.vdocuments.us/reader033/viewer/2022051311/540d5fb78d7f728d7e8b48d1/html5/thumbnails/6.jpg)
Simple Architecture
• Simple “starts with” searches on indexed fields can sometimes work: `city` LIKE 'Las%'
• With MyISAM, anything else holds a table lock and blocks writes while the query runs.
• MySQL is not a great or flexible full text engine
• It can sometimes be adequate
![Page 7: Using Sphinx for Search in PHP](https://reader033.vdocuments.us/reader033/viewer/2022051311/540d5fb78d7f728d7e8b48d1/html5/thumbnails/7.jpg)
Sphinx Architecture
• Searchd is responsible for receiving requests from clients and executing the searches against the Sphinx index.
• Indexer is responsible for getting data into the Sphinx index.
• This separation allows indexing and searching to be scaled separately.
![Page 8: Using Sphinx for Search in PHP](https://reader033.vdocuments.us/reader033/viewer/2022051311/540d5fb78d7f728d7e8b48d1/html5/thumbnails/8.jpg)
Sphinx Architecture
• Searchd has a binary protocol for which clients are available in several languages.
• Searchd also speaks the MySQL wire protocol (the version introduced in MySQL 4.1), so ordinary MySQL clients can connect to it.
• Searchd is a daemon that runs on your search servers
![Page 9: Using Sphinx for Search in PHP](https://reader033.vdocuments.us/reader033/viewer/2022051311/540d5fb78d7f728d7e8b48d1/html5/thumbnails/9.jpg)
Sphinx Architecture
• Indexer is a command-line program that you can run to build any number of indexes.
• Can handle index rotation for live indexing
![Page 10: Using Sphinx for Search in PHP](https://reader033.vdocuments.us/reader033/viewer/2022051311/540d5fb78d7f728d7e8b48d1/html5/thumbnails/10.jpg)
Not So Quick Side Note
MySQL IS SLOWWWWWWWWWWWWW
(at text matches)
![Page 11: Using Sphinx for Search in PHP](https://reader033.vdocuments.us/reader033/viewer/2022051311/540d5fb78d7f728d7e8b48d1/html5/thumbnails/11.jpg)
Still Not Quick Side Note
Indexes won’t help you…
![Page 12: Using Sphinx for Search in PHP](https://reader033.vdocuments.us/reader033/viewer/2022051311/540d5fb78d7f728d7e8b48d1/html5/thumbnails/12.jpg)
Quicker Side Note
Full Text Search isn’t so bad
IF….
![Page 13: Using Sphinx for Search in PHP](https://reader033.vdocuments.us/reader033/viewer/2022051311/540d5fb78d7f728d7e8b48d1/html5/thumbnails/13.jpg)
Sphinx Concepts
• Sphinx indexes “documents”
• Each document has a unique, unsigned, non-zero integer ID (32-bit or 64-bit)
• Each document has one or more fields
• Each document has zero or more attributes
![Page 14: Using Sphinx for Search in PHP](https://reader033.vdocuments.us/reader033/viewer/2022051311/540d5fb78d7f728d7e8b48d1/html5/thumbnails/14.jpg)
Indexes / Sources
• Sphinx indexes are created from one or more sources.
• The source can be a database, XML, or TSV stream.
• You can use multiple sources
• This is useful for maintaining updated indexes
• Also used to implement a Sphinx cluster
![Page 15: Using Sphinx for Search in PHP](https://reader033.vdocuments.us/reader033/viewer/2022051311/540d5fb78d7f728d7e8b48d1/html5/thumbnails/15.jpg)
Sphinx Fields
• Fields are what the full-text index is built from.
• When searching you can search against any number of fields.
• You can assign different relevancy weights to different fields.
• The original value of a field is never stored by Sphinx.
• You should always have at least one.
![Page 16: Using Sphinx for Search in PHP](https://reader033.vdocuments.us/reader033/viewer/2022051311/540d5fb78d7f728d7e8b48d1/html5/thumbnails/16.jpg)
Sphinx Attributes
• Data that helps further describe the item being indexed
• Can be returned as a part of the search
• Useful for filtering and sorting results
• These are not a part of the full text index.
![Page 17: Using Sphinx for Search in PHP](https://reader033.vdocuments.us/reader033/viewer/2022051311/540d5fb78d7f728d7e8b48d1/html5/thumbnails/17.jpg)
MySQL Full Text Search
• You can get away with MyISAM tables or, as of MySQL 5.6, InnoDB.
• You don’t care about morphology (think plurals)
• You don’t need anything but the most basic of search operators
![Page 18: Using Sphinx for Search in PHP](https://reader033.vdocuments.us/reader033/viewer/2022051311/540d5fb78d7f728d7e8b48d1/html5/thumbnails/18.jpg)
Creating An Index
• We are going to add an index that sources a MySQL database.
• The data being sourced is a list of Wikipedia article titles.
![Page 19: Using Sphinx for Search in PHP](https://reader033.vdocuments.us/reader033/viewer/2022051311/540d5fb78d7f728d7e8b48d1/html5/thumbnails/19.jpg)
Creating An Index
![Page 20: Using Sphinx for Search in PHP](https://reader033.vdocuments.us/reader033/viewer/2022051311/540d5fb78d7f728d7e8b48d1/html5/thumbnails/20.jpg)
Indexer Configuration
• We are going to be peeking into a Sphinx configuration file now.
• You can rebuild the config file by concatenating each section into a single file.
• On my VM this file is located in /usr/local/etc/sphinx.conf
![Page 21: Using Sphinx for Search in PHP](https://reader033.vdocuments.us/reader033/viewer/2022051311/540d5fb78d7f728d7e8b48d1/html5/thumbnails/21.jpg)
Source Definition
![Page 22: Using Sphinx for Search in PHP](https://reader033.vdocuments.us/reader033/viewer/2022051311/540d5fb78d7f728d7e8b48d1/html5/thumbnails/22.jpg)
Source Definition
Defines the connection information
![Page 23: Using Sphinx for Search in PHP](https://reader033.vdocuments.us/reader033/viewer/2022051311/540d5fb78d7f728d7e8b48d1/html5/thumbnails/23.jpg)
Connection information
• Ideally, you should create a separate account for Sphinx
• You can also connect via Unix socket
• I didn’t specify it here, but you can also add a port.
![Page 24: Using Sphinx for Search in PHP](https://reader033.vdocuments.us/reader033/viewer/2022051311/540d5fb78d7f728d7e8b48d1/html5/thumbnails/24.jpg)
Source Definition
The query that pulls data to populate the index
![Page 25: Using Sphinx for Search in PHP](https://reader033.vdocuments.us/reader033/viewer/2022051311/540d5fb78d7f728d7e8b48d1/html5/thumbnails/25.jpg)
Source Index
• The index query MUST return the id field as the first column
• Remember, the id needs to be a unique, unsigned integer of at most 64 bits
• The query must be on a single line unless you escape newlines with backslashes.
• Notice that we converted the timestamp into a Unix timestamp. That is important, because Sphinx stores timestamp attributes as integers.
![Page 26: Using Sphinx for Search in PHP](https://reader033.vdocuments.us/reader033/viewer/2022051311/540d5fb78d7f728d7e8b48d1/html5/thumbnails/26.jpg)
Source Definition
How data is stored in the index
![Page 27: Using Sphinx for Search in PHP](https://reader033.vdocuments.us/reader033/viewer/2022051311/540d5fb78d7f728d7e8b48d1/html5/thumbnails/27.jpg)
Source Fields
• The first column in the query is always the ID.
• You specify any columns that are attributes.
• Remember, attributes are values stored in the index that you can filter and sort by.
• Any column besides the id that is not specified as an attribute is assumed to be a text field (here, title)
![Page 28: Using Sphinx for Search in PHP](https://reader033.vdocuments.us/reader033/viewer/2022051311/540d5fb78d7f728d7e8b48d1/html5/thumbnails/28.jpg)
Index Definition
![Page 29: Using Sphinx for Search in PHP](https://reader033.vdocuments.us/reader033/viewer/2022051311/540d5fb78d7f728d7e8b48d1/html5/thumbnails/29.jpg)
Index Definition
• An index includes one or more sources.
• Each source gets its own “source” line
• Multiple sources must all define the same fields and attributes.
• The ids need to be unique across sources
![Page 30: Using Sphinx for Search in PHP](https://reader033.vdocuments.us/reader033/viewer/2022051311/540d5fb78d7f728d7e8b48d1/html5/thumbnails/30.jpg)
Index Definition
• path is not actually a path, it’s a filename with no extension.
• docinfo dictates whether attributes are stored in the index or outside of the index.
• dict is not really important now. It used to be either crc or keywords; crc is now deprecated.
• min_word_len is the minimum length of words to index
![Page 31: Using Sphinx for Search in PHP](https://reader033.vdocuments.us/reader033/viewer/2022051311/540d5fb78d7f728d7e8b48d1/html5/thumbnails/31.jpg)
Rest of the Index Configuration
![Page 32: Using Sphinx for Search in PHP](https://reader033.vdocuments.us/reader033/viewer/2022051311/540d5fb78d7f728d7e8b48d1/html5/thumbnails/32.jpg)
It’s time to build the index
`indexer <index name>`
![Page 33: Using Sphinx for Search in PHP](https://reader033.vdocuments.us/reader033/viewer/2022051311/540d5fb78d7f728d7e8b48d1/html5/thumbnails/33.jpg)
Searching the Index
• searchd is the daemon that searches the index
• Binary protocol
OR
• MySQL Compatible too!
![Page 34: Using Sphinx for Search in PHP](https://reader033.vdocuments.us/reader033/viewer/2022051311/540d5fb78d7f728d7e8b48d1/html5/thumbnails/34.jpg)
searchd config
Included in the same config file as the rest
![Page 35: Using Sphinx for Search in PHP](https://reader033.vdocuments.us/reader033/viewer/2022051311/540d5fb78d7f728d7e8b48d1/html5/thumbnails/35.jpg)
Spinning up searchd
![Page 36: Using Sphinx for Search in PHP](https://reader033.vdocuments.us/reader033/viewer/2022051311/540d5fb78d7f728d7e8b48d1/html5/thumbnails/36.jpg)
“I know MySQL”
–Sphinx
![Page 37: Using Sphinx for Search in PHP](https://reader033.vdocuments.us/reader033/viewer/2022051311/540d5fb78d7f728d7e8b48d1/html5/thumbnails/37.jpg)
MySQL Compatible
![Page 38: Using Sphinx for Search in PHP](https://reader033.vdocuments.us/reader033/viewer/2022051311/540d5fb78d7f728d7e8b48d1/html5/thumbnails/38.jpg)
MySQL Compatible
• Tables == Indexes
• `SHOW TABLES` … shows indexes.
• `SELECT * FROM <index>` works too.
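Because searchd speaks the MySQL protocol, PHP can reach it through the regular `pdo_mysql` driver. The following is a minimal sketch, not the code from the slides: it assumes searchd has a MySQL-protocol listener on 127.0.0.1 port 9306 (a common choice, but whatever your `listen` setting says is what counts), and the helper names `sphinxDsn()` and `listIndexes()` are made up for illustration.

```php
<?php
declare(strict_types=1);

/**
 * Builds a PDO DSN for searchd's MySQL-protocol listener.
 * Host and port are assumptions; match them to your searchd config.
 * Use an IP rather than "localhost" so the driver connects over TCP.
 */
function sphinxDsn(string $host = '127.0.0.1', int $port = 9306): string
{
    return sprintf('mysql:host=%s;port=%d', $host, $port);
}

/** Runs SHOW TABLES, which lists indexes rather than tables on searchd. */
function listIndexes(PDO $sphinx): array
{
    return $sphinx->query('SHOW TABLES')->fetchAll(PDO::FETCH_ASSOC);
}

// Usage (needs a running searchd):
// $sphinx = new PDO(sphinxDsn(), '', '', [PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION]);
// print_r(listIndexes($sphinx));
```

No database name is given in the DSN because the connection only ever addresses indexes.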
![Page 39: Using Sphinx for Search in PHP](https://reader033.vdocuments.us/reader033/viewer/2022051311/540d5fb78d7f728d7e8b48d1/html5/thumbnails/39.jpg)
Selecting from an index
![Page 40: Using Sphinx for Search in PHP](https://reader033.vdocuments.us/reader033/viewer/2022051311/540d5fb78d7f728d7e8b48d1/html5/thumbnails/40.jpg)
Querying Indexes
• Default limit of 20 rows
• Notice the text fields are not returned…
• They would be if we made them attributes (sql_field_string)
![Page 41: Using Sphinx for Search in PHP](https://reader033.vdocuments.us/reader033/viewer/2022051311/540d5fb78d7f728d7e8b48d1/html5/thumbnails/41.jpg)
Querying Indexes
• The magic function in SphinxQL is match()
• match() performs a full text search against the entire index…usually
• The ‘@field’ operator restricts the search to a specific field.
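When user input ends up inside match(), characters such as `@`, `-` or `"` are read as query operators. One way to handle that, sketched below, is to backslash-escape them before building a field-restricted expression. The character list follows the one used by the escaping helper in Sphinx's bundled PHP API as far as I recall it, so treat it as an assumption and check it against your Sphinx version. `escapeMatch()` and `fieldMatch()` are hypothetical names, and the arrow function needs PHP 7.4 or later.

```php
<?php
declare(strict_types=1);

/** Backslash-escapes characters that the extended query syntax treats as operators. */
function escapeMatch(string $text): string
{
    // The backslash must come first so the backslashes added later are not escaped again.
    $special = ['\\', '(', ')', '|', '-', '!', '@', '~', '"', '&', '/', '^', '$', '=', '<'];
    $escaped = array_map(fn (string $c): string => '\\' . $c, $special);

    return str_replace($special, $escaped, $text);
}

/** Builds an expression such as "@title unit testing" that searches a single field. */
function fieldMatch(string $field, string $text): string
{
    if (!preg_match('/^\w+$/', $field)) {
        throw new InvalidArgumentException('Invalid field name: ' . $field);
    }

    return '@' . $field . ' ' . escapeMatch($text);
}

// Usage: bind the result as the MATCH() argument, for example
// SELECT id FROM <index> WHERE MATCH(:q)   with   [':q' => fieldMatch('title', $userInput)]
```

Skip the escaping if you actually want visitors to be able to use the operators themselves.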
![Page 42: Using Sphinx for Search in PHP](https://reader033.vdocuments.us/reader033/viewer/2022051311/540d5fb78d7f728d7e8b48d1/html5/thumbnails/42.jpg)
Querying Indexes
• You can query against attributes
• You can sort results
• You can use the weight() function to determine relevancy.
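The three points above can be combined in one SphinxQL statement. The sketch below only builds the SQL string and its parameters; the index name passed in and the `updated_at` timestamp attribute are assumptions for illustration, not names taken from the slides.

```php
<?php
declare(strict_types=1);

/**
 * Builds a SphinxQL query that filters on an attribute, exposes the
 * relevance weight, and sorts by it. Returns [sql, params].
 * The attribute name updated_at is hypothetical.
 */
function buildRankedQuery(string $index, string $match, int $since, int $limit = 20): array
{
    if (!preg_match('/^\w+$/', $index)) {
        throw new InvalidArgumentException('Invalid index name: ' . $index);
    }
    $limit = max(1, $limit);

    // $since and $limit are ints, so interpolating them is safe;
    // the full-text expression is left as a bound parameter.
    $sql = "SELECT id, WEIGHT() AS relevance FROM {$index} "
         . "WHERE MATCH(:match) AND updated_at >= {$since} "
         . "ORDER BY relevance DESC, updated_at DESC LIMIT {$limit}";

    return [$sql, [':match' => $match]];
}

// Usage:
// [$sql, $params] = buildRankedQuery('wiki_titles', '@title testing', 1388534400, 100);
// $stmt = $sphinx->prepare($sql);
// $stmt->execute($params);
```

Sorting on the alias rather than on the function call keeps the statement valid on older SphinxQL versions too, as far as I am aware.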
![Page 43: Using Sphinx for Search in PHP](https://reader033.vdocuments.us/reader033/viewer/2022051311/540d5fb78d7f728d7e8b48d1/html5/thumbnails/43.jpg)
Querying Indexes
• The title with id 25387283 was more relevant because it matched on the term “testing”.
![Page 44: Using Sphinx for Search in PHP](https://reader033.vdocuments.us/reader033/viewer/2022051311/540d5fb78d7f728d7e8b48d1/html5/thumbnails/44.jpg)
Getting PHP into the mix
• All we need? PDO.
• We will build a basic search page
• Accepts a query, displays up to 100 matching results by relevancy with the matching keywords highlighted.
![Page 45: Using Sphinx for Search in PHP](https://reader033.vdocuments.us/reader033/viewer/2022051311/540d5fb78d7f728d7e8b48d1/html5/thumbnails/45.jpg)
![Page 46: Using Sphinx for Search in PHP](https://reader033.vdocuments.us/reader033/viewer/2022051311/540d5fb78d7f728d7e8b48d1/html5/thumbnails/46.jpg)
Pulling data from Sphinx
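The code on this slide is not part of the extracted text, so here is a rough sketch of what pulling ranked ids out of Sphinx with PDO might look like. It assumes an index called `wiki_titles` (a made-up name) and a PDO connection to searchd that uses emulated prepares, which is the `pdo_mysql` default and keeps parameter binding on the client side.

```php
<?php
declare(strict_types=1);

/**
 * Returns [document id => relevance weight] for the best matches,
 * most relevant first. wiki_titles is a hypothetical index name.
 */
function searchTitles(PDO $sphinx, string $query, int $limit = 100): array
{
    // Keep the limit in a sane range; it is an int, so interpolation is safe.
    $limit = max(1, min($limit, 1000));

    $stmt = $sphinx->prepare(
        "SELECT id, WEIGHT() AS relevance FROM wiki_titles "
        . "WHERE MATCH(:q) ORDER BY relevance DESC LIMIT {$limit}"
    );
    $stmt->execute([':q' => $query]);

    // Two selected columns, so FETCH_KEY_PAIR maps id => relevance.
    return $stmt->fetchAll(PDO::FETCH_KEY_PAIR);
}

// Usage (needs a running searchd):
// $sphinx = new PDO('mysql:host=127.0.0.1;port=9306', '', '', [
//     PDO::ATTR_ERRMODE          => PDO::ERRMODE_EXCEPTION,
//     PDO::ATTR_EMULATE_PREPARES => true,
// ]);
// $hits = searchTitles($sphinx, $_GET['q'] ?? '');
```

Only ids and weights come back here; the titles themselves still have to be fetched from the source database unless they are also stored as attributes.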
![Page 47: Using Sphinx for Search in PHP](https://reader033.vdocuments.us/reader033/viewer/2022051311/540d5fb78d7f728d7e8b48d1/html5/thumbnails/47.jpg)
Fetching the data from MySQL
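Since Sphinx does not return the original field text, the ids it hands back are typically looked up in MySQL afterwards. A plain `WHERE id IN (...)` does not preserve the relevance order, so the rows need re-sorting. The sketch below assumes a table `articles` with `id` and `title` columns; those names and both helper functions are hypothetical.

```php
<?php
declare(strict_types=1);

/** Returns "?, ?, ?" style placeholders for an IN (...) clause. */
function inPlaceholders(int $count): string
{
    return implode(', ', array_fill(0, $count, '?'));
}

/** Re-sorts database rows so they follow the id order returned by Sphinx. */
function orderByIds(array $rows, array $ids): array
{
    $byId = array_column($rows, null, 'id'); // index the rows by their id
    $ordered = [];
    foreach ($ids as $id) {
        if (isset($byId[$id])) {             // skip ids missing from MySQL
            $ordered[] = $byId[$id];
        }
    }

    return $ordered;
}

/** Loads the rows for the given ids, in the same order as $ids. */
function fetchTitles(PDO $mysql, array $ids): array
{
    if ($ids === []) {
        return [];
    }
    $stmt = $mysql->prepare(
        'SELECT id, title FROM articles WHERE id IN (' . inPlaceholders(count($ids)) . ')'
    );
    $stmt->execute(array_values($ids));

    return orderByIds($stmt->fetchAll(PDO::FETCH_ASSOC), $ids);
}
```

Ids that are in the index but no longer in the table are silently dropped, which can happen when the index lags behind the database.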
![Page 48: Using Sphinx for Search in PHP](https://reader033.vdocuments.us/reader033/viewer/2022051311/540d5fb78d7f728d7e8b48d1/html5/thumbnails/48.jpg)
Adding the fancy yellow highlighting
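The highlighting code itself is only in the slide image. As a stand-in, the sketch below highlights keywords on the PHP side by wrapping literal matches in `<mark>`, which browsers render with a yellow background by default. This is not necessarily the speaker's approach; Sphinx can also build excerpts itself through `CALL SNIPPETS`, which additionally understands morphology. The function name `highlight()` is made up for this example.

```php
<?php
declare(strict_types=1);

/** Wraps each query word found in $title in <mark>, HTML-escaping everything else. */
function highlight(string $title, string $query): string
{
    $words = preg_split('/\W+/u', $query, -1, PREG_SPLIT_NO_EMPTY) ?: [];
    if ($words === []) {
        return htmlspecialchars($title, ENT_QUOTES, 'UTF-8');
    }

    $quoted  = array_map(fn (string $w): string => preg_quote($w, '/'), $words);
    $pattern = '/\b(' . implode('|', $quoted) . ')\b/iu';

    // Split the raw title first and escape each piece afterwards, so a
    // keyword can never match inside an HTML entity such as &amp;.
    $parts = preg_split($pattern, $title, -1, PREG_SPLIT_DELIM_CAPTURE) ?: [$title];

    $out = '';
    foreach ($parts as $i => $part) {
        $escaped = htmlspecialchars($part, ENT_QUOTES, 'UTF-8');
        // With a single capture group, odd offsets hold the matched keywords.
        $out .= ($i % 2 === 1) ? '<mark>' . $escaped . '</mark>' : $escaped;
    }

    return $out;
}
```

Because this matches literal words only, a title that Sphinx found through stemming (say "tests" for the query "testing") would come back without a highlight.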
![Page 49: Using Sphinx for Search in PHP](https://reader033.vdocuments.us/reader033/viewer/2022051311/540d5fb78d7f728d7e8b48d1/html5/thumbnails/49.jpg)
The rest is pretty basic…
![Page 50: Using Sphinx for Search in PHP](https://reader033.vdocuments.us/reader033/viewer/2022051311/540d5fb78d7f728d7e8b48d1/html5/thumbnails/50.jpg)
Cool things we would talk about if I had like…3 more hours
• Auto-suggest, Auto-correct
• More on lemmatization and stemming
• Distributed Sphinx Clustering
• Delta indexes
• Real Time Indexes
• The plethora of operators you can use
• Ranged Queries
• ………
![Page 51: Using Sphinx for Search in PHP](https://reader033.vdocuments.us/reader033/viewer/2022051311/540d5fb78d7f728d7e8b48d1/html5/thumbnails/51.jpg)
Additional Information
• The Sphinx documentation is actually pretty great
• http://sphinxsearch.com/docs/
• Slides are already on Slideshare
• Will link them to the meetup shortly
![Page 52: Using Sphinx for Search in PHP](https://reader033.vdocuments.us/reader033/viewer/2022051311/540d5fb78d7f728d7e8b48d1/html5/thumbnails/52.jpg)
Questions?