Ferret: A Ruby Search Engine
Introduction to Ferret, the Ruby Full-Text Search Engine
Brian Sam-Bodden
Agenda
• What is Ferret?
• Concepts
• Fields
• Indexing
• Installing Ferret
• The Recipe
• Documents
• Ferret::Index::Index
• FQL
• Ferret in your App
• Ferret in Rails
• Resources
What is Ferret?
• Information Retrieval (IR) Library
• Full-featured Text Search Engine
• Inspired by the Apache Lucene search engine
• Ported to Ruby by David Balmain
• Initially a 100% pure Ruby port
• Since 0.9 many core functions are implemented in C
• Fast! Now Faster than Lucene ;-)
Concepts
• Index : Sequence of documents
• Document : Sequence of fields
• Field : Named sequence of terms
• Term : A text string, keyed by field name
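The hierarchy above can be pictured with plain Ruby data. This is a sketch of the concepts only, not of how Ferret stores anything internally. The sample documents and the `terms_of` helper are made up for illustration, and splitting on whitespace is an assumed stand-in for real analysis.

```ruby
# Toy model of the four concepts (not Ferret's internal representation).
# An index is a sequence of documents; a document is a set of named fields.
index = [
  { :title => "Ferret intro", :body => "ferret is a search engine" },
  { :title => "Lucene notes", :body => "lucene inspired ferret" }
]

# A field is a named sequence of terms; a term is a text string
# keyed by the name of the field it came from.
# terms_of is a hypothetical helper, not part of any library.
def terms_of(document)
  document.flat_map do |field_name, text|
    text.downcase.split.map { |word| [field_name, word] }
  end
end

terms = terms_of(index.first)
puts terms.inspect
```

Note that the same word yields two different terms when it appears in two different fields, which is what "keyed by field name" means in practice.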
Fields of a Document in an Index
• Fields are individually searchable units that are:
• Stored: The original Terms of the field are stored
• Indexed: Inverted to rapidly find all Documents containing any of the Terms
• Tokenized: The text is split into individual Terms, which are then indexed
• Vectored: Frequency and location of Terms are stored
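To make the four properties concrete, here is a minimal pure-Ruby sketch. It does not use Ferret and is not how Ferret is implemented. The sample documents, the variable names, and whitespace tokenization are all assumptions made for illustration.

```ruby
# Toy illustration of the four field properties (not Ferret's implementation).
docs = ["the quick brown fox", "the lazy dog", "quick quick dog"]

stored   = {}                                   # Stored: original text kept per document
inverted = Hash.new { |h, term| h[term] = [] }  # Indexed: term -> ids of documents containing it
vectors  = {}                                   # Vectored: per document, term -> positions

docs.each_with_index do |text, doc_id|
  stored[doc_id] = text
  tokens = text.downcase.split                  # Tokenized: text broken into individual terms
  vectors[doc_id] = Hash.new { |h, term| h[term] = [] }
  tokens.each_with_index do |term, position|
    inverted[term] << doc_id unless inverted[term].include?(doc_id)
    vectors[doc_id][term] << position
  end
end

puts inverted["quick"].inspect    # ids of the documents containing "quick"
puts vectors[2]["quick"].inspect  # positions of "quick" inside document 2
puts stored[1]                    # original text of document 1
```

The inverted hash answers "which documents contain this term?" with a single lookup, and the length of a position list gives the term's frequency in that document.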
It’s all about Indexing
• Indexing is the processing of a source document into plain text tokens that Ferret can manipulate
• For any non-plaintext sources such as PDF, Word, Excel you need to:
• Extract
• Analyze
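The two steps can be sketched as a small pipeline. This is an illustration under stated assumptions, not Ferret's analyzer: the `extract` and `analyze` helpers, the stop-word list, and the HTML sample are hypothetical, and HTML stands in for richer formats because PDF, Word, or Excel sources need a dedicated extraction tool.

```ruby
# Hypothetical two-step pipeline: extract plain text, then analyze it into tokens.
STOP_WORDS = %w[a an and is of the to].freeze  # assumed, illustrative list

# Extract: reduce a source to plain text. Here the source is a small HTML
# string and tags are stripped with a regular expression.
def extract(html)
  html.gsub(/<[^>]*>/, " ")
end

# Analyze: lowercase the text, split it into word tokens, drop stop words.
def analyze(text)
  text.downcase.scan(/[a-z0-9]+/).reject { |token| STOP_WORDS.include?(token) }
end

source = "<h1>Ferret</h1><p>Ferret is a Ruby search engine.</p>"
tokens = analyze(extract(source))
puts tokens.inspect
```

Keeping extraction separate from analysis means the same analysis step can be reused for every source format once the text has been pulled out.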
Installing Ferret
gem install ferret
• Pick the latest version for your platform
The Recipe
1. Create some Documents
2. Create an Index
3. Add Documents to the Index
4. Perform some Queries
Example Documents (Create some Documents)
• "Any String is a Document"
• ["This", "is also", "a document"]
Ferret::Index::Index (Create an Index)
• Indexes are encapsulated by the class
➡ Ferret::Index::Index
• Use the alias Ferret::I for convenience
• Index can be persistent
➡ index = Ferret::I.new(:path => '/somepath')
• Or, completely in Memory
➡ index = Ferret::I.new()
Ferret::Index::Index (Adding Documents to the Index)
• Index provides the add_document method
• It also provides the << alias
• Adding documents is then as easy as:
➡ index << "This is a document"
➡ index << {:first => "Bob", :last => "Smith"}
Ferret::Index::Index (Perform some Queries)
• Index provides the search and search_each methods
• The search method takes a query and an optional set of parameters:
➡ search(query, options = {})
• The search_each method provides an iterator block
➡ search_each(query, options = {}) {|doc, score| ... }
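To show the shape of the add-then-query flow without requiring the gem, here is a toy stand-in written in plain Ruby. `TinyIndex` is hypothetical: it only imitates the `<<` and `search_each` calls shown on the slides, matches a single word, and uses a raw occurrence count as the score. Storing a bare String under a `:content` field is also an assumption of this sketch, not a statement about Ferret.

```ruby
# TinyIndex is a hypothetical stand-in that imitates the shape of the calls on
# the slides (<<, search_each). It is not Ferret and does no real ranking.
class TinyIndex
  def initialize
    @docs = []
  end

  # Accept a String or a Hash, as on the slides. Storing a bare String under
  # :content is an assumption of this sketch.
  def add_document(doc)
    @docs << (doc.is_a?(Hash) ? doc : { :content => doc })
    self
  end
  alias_method :<<, :add_document

  # Fetch a stored document by its id (its position in the index).
  def [](doc_id)
    @docs[doc_id]
  end

  # Yield (doc_id, score) for each document containing the query word.
  # The score is just the number of occurrences across all fields.
  def search_each(query)
    word = query.downcase
    @docs.each_with_index do |doc, doc_id|
      score = doc.values.join(" ").downcase.split.count(word)
      yield doc_id, score if score > 0
    end
  end
end

index = TinyIndex.new
index << "This is a document"
index << { :first => "Bob", :last => "Smith" }

hits = []
index.search_each("bob") { |doc_id, score| hits << [doc_id, score] }
puts hits.inspect
```

The block receives a document id rather than the document itself, so the caller looks the document up (here with `index[doc_id]`) only for the hits it actually wants to display.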
Playing with Ferret in irb
Ferret Query Language
• Ferret's own query language, FQL, is a powerful way to specify search queries
• FQL supports many query types, including:
• Term
• Phrase
• Field
• Boolean
• Range
• Wildcard
• Fuzzy
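As a rough guide, the strings below show what one query of each type might look like. They are written from recollection of Ferret's query parser syntax rather than taken from the slides, so treat them as a sketch to check against the QueryParser documentation. The field names (title, date) and the `FQL_EXAMPLES` constant are hypothetical.

```ruby
# Illustrative FQL strings, one per query type named above.
# Field names (title, date) are hypothetical; syntax should be verified
# against the Ferret QueryParser documentation before relying on it.
FQL_EXAMPLES = {
  :term     => 'ruby',
  :phrase   => '"quick brown fox"',
  :field    => 'title:ferret',
  :boolean  => 'ruby AND NOT java',
  :range    => 'date:[20050725 20050905]',
  :wildcard => 'fer*t',
  :fuzzy    => 'ferret~'
}.freeze

FQL_EXAMPLES.each { |type, query| puts "#{type}: #{query}" }
```

Each of these is an ordinary Ruby String, which is the form a query takes when it is passed to search or search_each.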
Index.explain
• The explain method of Index describes how a document scores against a query
• Very useful for debugging and for learning how Ferret works
Ferret in your App
[Diagram: the Application gathers data from the File System, a Database, the Web, and Manual Input, and uses Ferret to index documents into an Index. It gets the User's query, uses Ferret to search the Index, and presents the search results.]
Ferret in Rails
• Acts As Ferret is an ActiveRecord extension
• Available as a plugin
• Provides a simplified interface to Ferret
• Maintained by Jens Krämer
• Adding an index to an ActiveRecord model is simple
• A simple model has two searchable fields, title and body
• After a quick rake db:migrate we now have some data to play with
• Fire up the Rails Console and let’s see what acts_as_ferret can do for our models
Want more?
• Ferret is improving constantly
• Acts As Ferret seems to catch up quickly
• Real-life usage seems to require some good engineering on your part
• Background indexing
• Hot swap of indexes?
• We only covered the simplest constructs in Ferret
• Ferret’s API provides enough flexibility for the most demanding searching needs
Online Resources
• http://ferret.davebalmain.com
• http://lucene.apache.org
• http://lucenebook.com
• http://projects.jkraemer.net/acts_as_ferret
In-Print Resources
Thanks!