Elasticsearch Basics
Why do we need a search engine?
What would work and life be like without Google?
What if Amazon, Flipkart, Facebook, Twitter, GitHub,
Stack Overflow, and Zomato had no search capability?
You know, for search …
There is a lot of data around us, but little information.
Make our life easier.
Find relevant stuff.
Find it faster.
For research, shopping, entertainment, etc.
Elasticsearch
Near real-time distributed search and analytics engine
Runs on top of Apache Lucene, is written in Java, and exposes a REST API
Search Engine Software
Can act as a private search engine (like Bing or Google) over private, sensitive, or confidential data and documents that you don’t want on the public web
Developed by Shay Banon
Elasticsearch Concepts
Document : JSON document stored in Elasticsearch. Like a row in a relational database table.
Id : Uniquely identifies a document.
Field : Key-value pair. Like a column in a relational database. The value can be:
- A simple value such as a string, integer, or date
- An array or an object
Type : Like a table in a relational database. Has a list of fields.
Elasticsearch Concepts
Near real time (NRT) : Slight time lag between indexing a document and it becoming searchable.
Shard : Low-level worker unit; a single Lucene instance.
- Primary shard (holds the physically stored documents)
- Replica shard (copy of a primary shard)
Index : Like a database in a relational DB. A logical namespace that maps to one or more primary shards and their replica shards.
Node : Running instance of Elasticsearch
Elasticsearch Concepts
Cluster : Collection of one or more nodes
- Facilitates indexing
- Provides search capabilities across nodes
Elasticsearch: Getting Started …
A recent version of Java
Elasticsearch itself, from elasticsearch.org/download
A recent version of any browser
Marvel & Sense
Marvel : Monitoring and management tool
Sense : Interactive console
Talking to Elasticsearch
RESTful API: JSON over HTTP
A request to Elasticsearch consists of the same parts as any HTTP request:
curl -X<VERB> '<PROTOCOL>://<HOST>/<PATH>' -d '<BODY>'
curl -X<VERB> '<PROTOCOL>://<HOST>/<PATH>?<QUERY_STRING>'
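The two templates above can be filled in mechanically. The following is a minimal sketch, assuming a node on localhost:9201 (the port used in the later slides); `build_curl` is a hypothetical helper written for this illustration, not part of Elasticsearch or curl.

```python
import json
import shlex
from urllib.parse import urlencode


def build_curl(verb, protocol, host, path, query=None, body=None):
    """Assemble a curl command line from the parts of an HTTP request."""
    url = f"{protocol}://{host}/{path.lstrip('/')}"
    if query:
        # <QUERY_STRING>: optional key=value parameters after the "?"
        url += "?" + urlencode(query)
    parts = ["curl", f"-X{verb}", shlex.quote(url)]
    if body is not None:
        # <BODY>: a JSON-encoded request body, quoted for the shell
        parts += ["-d", shlex.quote(json.dumps(body))]
    return " ".join(parts)


# Host, path, and parameters are assumptions chosen only for illustration.
print(build_curl("GET", "http", "localhost:9201", "_count", query={"pretty": "true"}))
print(build_curl("GET", "http", "localhost:9201", "_count", body={"query": {"match_all": {}}}))
```

The helper only builds the command string; nothing is sent over the network.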
Document Oriented
Stores entire objects or documents. It not only stores them, but also indexes the contents of each document to make them searchable.
Elasticsearch uses JavaScript Object Notation, or JSON, as the serialization format for documents:
{
  "email": "[email protected]",
  "first_name": "John",
  "last_name": "Smith",
  "info": {
    "bio": "Eco-warrior and defender of the weak",
    "age": 25,
    "interests": [ "dolphins", "whales" ]
  },
  "join_date": "2014/05/01"
}
Create Index, Insert Data …
Cluster : myelasticsearch
Node : The Dark Knight
Index : megacorp (index names must be lowercase)
Type : employee
In Sense, a request has the same parts as any HTTP request, written without the curl wrapper:
<VERB> <PROTOCOL>://<HOST>/<INDEX>/<TYPE>/<ID>
<BODY>
<VERB> <PROTOCOL>://<HOST>/<INDEX>/<TYPE>/<ID>?<QUERY_STRING>
Create Index Example (indexing the first document creates the index automatically):
PUT localhost:9201/megacorp/employee/1
{
  "first_name" : "John",
  "last_name" : "Smith",
  "age" : 25,
  "about" : "I love to go rock climbing",
  "interests": [ "sports", "music" ]
}
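Outside Sense, the same request can be prepared from any HTTP client. The sketch below uses Python's standard library to build (but not send) the PUT request; `build_index_request` is a hypothetical helper, and the host and index/type/id path layout are taken from the slide above.

```python
import json
from urllib.request import Request

employee = {
    "first_name": "John",
    "last_name": "Smith",
    "age": 25,
    "about": "I love to go rock climbing",
    "interests": ["sports", "music"],
}


def build_index_request(host, index, doc_type, doc_id, document):
    """Prepare a PUT /<index>/<type>/<id> request with a JSON body."""
    url = f"http://{host}/{index}/{doc_type}/{doc_id}"
    return Request(
        url,
        data=json.dumps(document).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


request = build_index_request("localhost:9201", "megacorp", "employee", 1, employee)
print(request.get_method(), request.full_url)
# To send it to a running node: urllib.request.urlopen(request)
```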
What Is an Inverted Index?
It maps each unique term to the list of documents that contain it, which allows very fast full-text search.
Doc_1 : The quick brown fox jumped over the lazy dog
Doc_2 : Quick brown foxes leap over lazy dogs in summer.
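As a rough illustration of the idea, the sketch below builds an inverted index for the two documents above by splitting on whitespace only. It is a toy model in plain Python, not how Lucene stores its index.

```python
from collections import defaultdict

docs = {
    "Doc_1": "The quick brown fox jumped over the lazy dog",
    "Doc_2": "Quick brown foxes leap over lazy dogs in summer.",
}


def build_inverted_index(documents):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in text.split():
            # Drop the trailing full stop; no other normalization is applied.
            index[term.strip(".")].add(doc_id)
    return index


index = build_inverted_index(docs)
for term in sorted(index):
    print(f"{term:8} {sorted(index[term])}")
```

Note that without any analysis, Quick and quick, or fox and foxes, end up as separate terms.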
Analysis and Analyzers
Character filters : First, the string is passed through any character filters. A character filter could be used to strip out HTML, or to convert & characters to the word and.
Tokenizer : Next, the string is tokenized into individual terms by a tokenizer.
Token filters : Last, each term is passed through any token filters in turn, which can change terms (for example, lowercasing Quick), remove terms (for example, stopwords such as a, and, the) or add terms (for example, synonyms like jump and leap).
Examples : Standard Analyzer, Simple Analyzer, Whitespace Analyzer, Language Analyzers, etc.
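The three stages can be sketched as three small functions chained together. This is a simplified stand-in written for illustration; the function names, the regular expressions, and the stopword list are assumptions and do not reproduce any built-in Elasticsearch analyzer.

```python
import re

# Assumed stopword list, taken from the examples on the slide.
STOPWORDS = {"a", "and", "the"}


def char_filter(text):
    """Character filter: strip HTML tags and spell out '&' as 'and'."""
    text = re.sub(r"<[^>]+>", " ", text)
    return text.replace("&", " and ")


def tokenize(text):
    """Tokenizer: split the string into runs of word characters."""
    return re.findall(r"\w+", text)


def token_filters(tokens):
    """Token filters: lowercase every term, then drop stopwords."""
    lowered = (token.lower() for token in tokens)
    return [token for token in lowered if token not in STOPWORDS]


def analyze(text):
    return token_filters(tokenize(char_filter(text)))


print(analyze("<p>The Quick & Brown fox</p>"))  # ['quick', 'brown', 'fox']
```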
Inverted Index After Analysis
Doc_1 : The quick brown fox jumped over the lazy dog
Doc_2 : Quick brown foxes leap over lazy dogs in summer.
Quick can be lowercased to become quick.
foxes can be stemmed (reduced to its root form) to become fox. Similarly, dogs could be stemmed to dog.
jumped and leap are synonyms and can be indexed as just the single term jump.
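A small sketch of the effect, assuming hand-written lookup tables in place of a real stemmer and synonym filter (the `STEMS` and `SYNONYMS` tables are hypothetical and cover only the words on this slide). The query is normalized the same way as the indexed text, so both documents are found.

```python
from collections import defaultdict

docs = {
    "Doc_1": "The quick brown fox jumped over the lazy dog",
    "Doc_2": "Quick brown foxes leap over lazy dogs in summer.",
}

# Toy lookup tables standing in for a real stemmer and synonym filter.
STEMS = {"foxes": "fox", "dogs": "dog", "jumped": "jump"}
SYNONYMS = {"leap": "jump"}


def normalize(token):
    """Lowercase, stem, then map synonyms to a single term."""
    term = token.strip(".").lower()
    term = STEMS.get(term, term)
    return SYNONYMS.get(term, term)


def build_index(documents):
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for token in text.split():
            index[normalize(token)].add(doc_id)
    return index


index = build_index(docs)


def search(query_term):
    # Apply the same normalization to the query as to the documents.
    return sorted(index.get(normalize(query_term), set()))


print(search("Quick"))  # ['Doc_1', 'Doc_2']
print(search("foxes"))  # ['Doc_1', 'Doc_2']
print(search("leap"))   # ['Doc_1', 'Doc_2']
```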
Retrieve Using Query String (Search Lite) …
Retrieve Example :
GET localhost:9201/megacorp/employee/1
Query String Search Example :
GET /megacorp/employee/_search?q=last_name:Smith
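When the query string is built in code rather than typed by hand, the value should be URL-encoded. A minimal sketch, assuming the host from the earlier slides; `search_lite_url` is a hypothetical helper.

```python
from urllib.parse import urlencode


def search_lite_url(host, index, doc_type, field, value):
    """Build a /_search?q=<field>:<value> URL with an encoded query string."""
    # safe=":" keeps the field:value separator readable in the URL.
    query = urlencode({"q": f"{field}:{value}"}, safe=":")
    return f"http://{host}/{index}/{doc_type}/_search?{query}"


print(search_lite_url("localhost:9201", "megacorp", "employee", "last_name", "Smith"))
```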
Retrieve Using DSL …
Query DSL Search
Lets you build more complicated and robust queries than the query string allows.
The query domain-specific language (DSL) is specified in a JSON request body.
Example :
GET /megacorp/employee/_search
{
  "query": {
    "match": {
      "last_name": "Smith"
    }
  }
}
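Because the query is plain JSON, it can be assembled as an ordinary data structure before being sent. The sketch below builds the same match query in Python and prepares (but does not send) the request; `match_query` is a hypothetical helper and the host is assumed from the earlier slides. POST is used here because some HTTP clients do not send a body with GET, and Elasticsearch accepts search requests over POST as well.

```python
import json
from urllib.request import Request


def match_query(field, value):
    """Build the JSON body of a simple match query."""
    return {"query": {"match": {field: value}}}


body = match_query("last_name", "Smith")
request = Request(
    "http://localhost:9201/megacorp/employee/_search",
    data=json.dumps(body).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)

print(json.dumps(body, indent=2))
# To run the search against a live node: urllib.request.urlopen(request)
```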
Full Text Search
Elasticsearch can search within full-text fields and return the most relevant results first. This concept of relevance is important to Elasticsearch, and is a concept that is completely foreign to traditional relational databases, in which a record either matches or it doesn’t.
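To make the contrast concrete, here is a toy ranking that scores each document by how many query terms it contains and returns the best match first. This term-overlap count is a deliberately simple stand-in chosen for illustration; it is not Elasticsearch's actual relevance scoring. The documents are the two sample sentences from the inverted index slides.

```python
import re

docs = {
    "Doc_1": "The quick brown fox jumped over the lazy dog",
    "Doc_2": "Quick brown foxes leap over lazy dogs in summer.",
}


def terms(text):
    """Lowercase the text and return its set of word terms."""
    return set(re.findall(r"\w+", text.lower()))


def rank(query, documents):
    """Return (doc_id, score) pairs, most relevant first."""
    query_terms = terms(query)
    scored = []
    for doc_id, text in documents.items():
        score = len(query_terms & terms(text))
        if score:
            scored.append((score, doc_id))
    # Highest score first; ties are broken by document id.
    ordered = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [(doc_id, score) for score, doc_id in ordered]


print(rank("quick brown dog", docs))  # [('Doc_1', 3), ('Doc_2', 2)]
```

An exact-match comparison of the whole string would return neither document; the ranked search returns both, ordered by how well they match.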