Transit from SQL to Elastic Search By Manjunathan Raman
WHY ELASTIC SEARCH
Highly scalable open-source full-text search and analytics engine
Product Search, Log Analysis (ELK), Search As you Type, Did you mean
Usability
Schema-less
Default Lucene Standard Analyzer
Fuzzy, Facets/aggregations, Histogram, Filter cache, date range, geo distance, boost, doc_values, paging
Being fast
Relevance Tuning
Percolate Search
Document Mapping
PUT my_index
{
  "mappings": {
    "user": {
      "_all": { "enabled": false },
      "properties": {
        "title": { "type": "string" },
        "name": { "type": "string" },
        "age": { "type": "integer" }
      }
    },
    "blogpost": {
      "properties": {
        "title": { "type": "string" },
        "body": { "type": "string" },
        "user_id": { "type": "string", "index": "not_analyzed" },
        "created": { "type": "date", "format": "strict_date_optional_time||epoch_millis" }
      }
    }
  }
}
Field Types
A simple type like string, date, long, double, boolean or ip.
A type which supports the hierarchical nature of JSON, such as object or nested.
Or a specialised type like geo_point, geo_shape, or completion.
"index": "not_analyzed"
"no"
Analyzer
Inverted Index
Query Related Index
curl -XPUT 'http://localhost:9200/twitter/tweet/1' -d '{
"user" : "kimchy",
"post_date" : "2009-11-15T14:12:12",
"message" : "trying out Elasticsearch"}'
Get
curl -XGET 'http://localhost:9200/twitter/tweet/1'
Delete
curl -XDELETE 'http://localhost:9200/twitter/tweet/1'
Update
curl -XPUT localhost:9200/test/type1/1 -d '{
"counter" : 1,
"tags" : ["red"] }'
Multi Get
curl 'localhost:9200/_mget' -d '{
"docs" : [
{
"_index" : "test", "_type" : "type", "_id" : "1"
},{
"_index" : "test", "_type" : "type", "_id" : "2"
}] }'
Bulk API
The bulk API makes it possible to perform many index/delete operations in a single API call.
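A bulk request body is newline-delimited JSON: one action line, optionally followed by a source line, with a trailing newline at the end. The sketch below only builds such a body; the helper name `build_bulk_body` and the sample documents are illustrative assumptions, and actually sending the body to the `_bulk` endpoint is left out.

```python
import json


def build_bulk_body(actions):
    """Serialize (action, metadata, source) tuples into a bulk request body."""
    lines = []
    for action, meta, source in actions:
        # Action line, e.g. {"index": {"_index": "test", "_type": "type1", "_id": "1"}}
        lines.append(json.dumps({action: meta}))
        # Delete actions carry no source line
        if source is not None:
            lines.append(json.dumps(source))
    # The body must end with a newline
    return "\n".join(lines) + "\n"


body = build_bulk_body([
    ("index", {"_index": "test", "_type": "type1", "_id": "1"}, {"counter": 1, "tags": ["red"]}),
    ("delete", {"_index": "test", "_type": "type1", "_id": "2"}, None),
])
print(body)
```

Keeping each action on its own line is what lets the receiving node split the request without parsing it as one large JSON document.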
Query
{
  "fields": ["tpnb", "name"],
  "query": {
    "filtered": {
      "filter": {
        "bool": {
          "must": [
            { "term": { "store_price.store": 2396 } },
            { "term": { "tpnb": 50006459 } },
            { "term": { "price": "1.65" } }
          ]
        }
      }
    }
  }
}
{
  "query": {
    "nested": {
      "path": "stores",
      "query": {
        "filtered": {
          "filter": {
            "bool": {
              "must": [
                { "term": { "stores.store": 2396 } },
                { "term": { "stores.price": "1.73" } }
              ]
            }
          }
        }
      }
    }
  },
  "sort": [
    {
      "stores.availability": {
        "order": "asc",
        "mode": "min",
        "nested_filter": {
          "bool": {
            "must": [
              { "term": { "stores.store": 2396 } },
              { "term": { "stores.price": "1.73" } }
            ]
          }
        }
      }
    },
    "popularity"
  ],
  "fields": ["tpnb"],
  "aggs": {
    "storesprice": {
      "nested": { "path": "stores" },
      "aggs": {
        "min_price": { "min": { "field": "stores.price" } }
      }
    }
  }
}
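The nested query above repeats the same bool filter in both the query and the sort, so it can help to build the request from code. The following is a minimal sketch that reproduces that structure as a Python dict; the function name `nested_store_query` and its parameters are hypothetical, and the `filtered` syntax is simply copied from the example above rather than checked against any particular Elasticsearch version.

```python
import json


def nested_store_query(store_id, price, path="stores"):
    """Build the nested filter, sort, and min-price aggregation as one request body."""
    # The same filter is reused for the query and for the nested sort
    bool_filter = {
        "bool": {
            "must": [
                {"term": {f"{path}.store": store_id}},
                {"term": {f"{path}.price": price}},
            ]
        }
    }
    return {
        "query": {
            "nested": {"path": path, "query": {"filtered": {"filter": bool_filter}}}
        },
        "sort": [
            {f"{path}.availability": {"order": "asc", "mode": "min", "nested_filter": bool_filter}},
            "popularity",
        ],
        "fields": ["tpnb"],
        "aggs": {
            "storesprice": {
                "nested": {"path": path},
                "aggs": {"min_price": {"min": {"field": f"{path}.price"}}},
            }
        },
    }


request_body = nested_store_query(2396, "1.73")
print(json.dumps(request_body, indent=2))
```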
Distributed Storage
An index should be sharded in proportion to its anticipated growth. As more nodes are added to an Elasticsearch cluster, the cluster does a good job of reallocating and moving shards around, which makes Elasticsearch very easy to scale out.
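One reason to plan the shard count up front is that a document's shard is derived from a hash of its routing value modulo the number of primary shards. The sketch below illustrates that idea only: `zlib.crc32` is a stand-in hash chosen for this example (not the hash Elasticsearch uses), and `shard_for` is a hypothetical helper.

```python
import zlib


def shard_for(routing, num_primary_shards):
    """Pick a shard as hash(routing) modulo the number of primary shards."""
    # crc32 is only a stand-in hash for this sketch
    return zlib.crc32(routing.encode("utf-8")) % num_primary_shards


doc_ids = [str(i) for i in range(100)]
with_5 = {doc_id: shard_for(doc_id, 5) for doc_id in doc_ids}
with_6 = {doc_id: shard_for(doc_id, 6) for doc_id in doc_ids}

# Count the documents whose shard changes when the shard count changes
moved = sum(1 for doc_id in doc_ids if with_5[doc_id] != with_6[doc_id])
print(f"{moved} of {len(doc_ids)} documents would land on a different shard")
```

Adding nodes only moves whole shards around, which is cheap; changing the number of primary shards changes where individual documents belong.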
Each shard contains multiple "segments", where a segment is an inverted index.
While you are indexing documents, Elasticsearch collects them in memory.
Then, every second or so, it writes a new small segment to disk and "refreshes" the search.
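The buffer-then-refresh behaviour described above can be sketched with a toy model. Everything here is an illustrative assumption: `ToyShard` keeps its segments in memory rather than on disk, and its "analyzer" is just lowercasing and splitting on whitespace.

```python
from collections import defaultdict


class ToyShard:
    """Toy shard: an in-memory buffer plus a list of write-once segments."""

    def __init__(self):
        self.buffer = []    # documents indexed but not yet searchable
        self.segments = []  # each segment maps term -> set of doc ids

    def index(self, doc_id, text):
        self.buffer.append((doc_id, text))

    def refresh(self):
        """Turn the buffered documents into one new small segment."""
        if not self.buffer:
            return
        segment = defaultdict(set)
        for doc_id, text in self.buffer:
            for term in text.lower().split():
                segment[term].add(doc_id)
        self.segments.append(dict(segment))
        self.buffer = []

    def search(self, term):
        """A search consults every segment and merges the hits."""
        hits = set()
        for segment in self.segments:
            hits |= segment.get(term.lower(), set())
        return hits


shard = ToyShard()
shard.index(1, "trying out Elasticsearch")
before = shard.search("elasticsearch")  # nothing yet: the document is only buffered
shard.refresh()
after = shard.search("elasticsearch")   # the document is now visible
print(before, after)
```

The gap between `before` and `after` is the "nearly real time" delay: a document becomes searchable only once a refresh has produced a segment containing it.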
Key Points
Write once
Query then fetch
Query vs. filter, term vs. match
Nested, Inner Hits
Concurrency Control by document version
You can also specify the consistency level of index operations, in terms of how many replicas must acknowledge the operation before returning.
Elasticsearch has a concept of "query time" joining with parent/child-relations, and "index time" joining with nested types.
Index Vs Type
Alias – Filtered, Routing, multiple indices (array or pattern)
.scripts (Groovy) - evaluates a custom expression or function score query
Mustache Template
Postman, Head plugin
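The "Concurrency Control by document version" point above is a form of optimistic concurrency: a write states the version it expects and is rejected if the document has changed in the meantime. The sketch below models that idea in memory; `ToyStore`, `VersionConflict`, and the `expected_version` parameter are hypothetical names for this illustration, not a client API.

```python
class VersionConflict(Exception):
    """Raised when a write is based on a stale version of the document."""


class ToyStore:
    def __init__(self):
        self.docs = {}  # doc_id -> (version, source)

    def get(self, doc_id):
        return self.docs[doc_id]

    def index(self, doc_id, source, expected_version=None):
        current_version = self.docs.get(doc_id, (0, None))[0]
        # Reject the write if the caller read an older version
        if expected_version is not None and expected_version != current_version:
            raise VersionConflict(
                f"expected version {expected_version}, current is {current_version}"
            )
        new_version = current_version + 1
        self.docs[doc_id] = (new_version, source)
        return new_version


store = ToyStore()
store.index("1", {"counter": 1})

# Two clients read the same version of the document
version_a, _ = store.get("1")
version_b, _ = store.get("1")

# The first write succeeds and bumps the version
store.index("1", {"counter": 2}, expected_version=version_a)

# The second write is based on a stale version and is rejected
try:
    store.index("1", {"counter": 5}, expected_version=version_b)
    conflict = False
except VersionConflict:
    conflict = True
print("conflict detected:", conflict)
```

No locks are held between the read and the write; the losing client is expected to re-read the document and retry.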
Elastic Search - Limits
Architect your application with the right set of databases
Relational NoSQL – Graph Oriented Database
Non-relational > Denormalized > Document Oriented
Isolation > Transaction > ACID > Distributed Transactions
Nearly Real Time
Robust – costly query – cancel
Split Brain – Data Loss
Security
Different Models - Legacy
Different Model – with Elasticsearch
Java Clients for Elastic Search
The Native Client – The application's node client joins the Elasticsearch cluster and knows the cluster state, so it needs fewer hops to get a document from a specific shard on a node.
Jest – Lightweight client that uses the Elasticsearch REST API.
Spring Data Elasticsearch – Has a similar feel to the other Spring Data projects and goes one step further than Jest: you can annotate data objects with @Id, @Field, and @Document.
References
https://www.elastic.co/guide/en/elasticsearch/reference/current/index.html
https://www.elastic.co/blog/found-elasticsearch-as-nosql
https://www.elastic.co/blog/found-elasticsearch-from-the-bottom-up
https://qbox.io/blog/optimizing-search-results-in-elasticsearch-with-scoring-and-boosting
https://www.elastic.co/blog/understanding-query-then-fetch-vs-dfs-query-then-fetch
http://stackoverflow.com/questions/15426441/understanding-segments-in-elasticsearch
Quiz
What query language does Elasticsearch use?
A. SQL  B. Query DSL  C. Query SSL  D. ElasticClient
What filter can be used to combine multiple filters?
A. Term  B. Range  C. Exists  D. Bool
"Synonym" is an example of:
A. Tokenizer  B. Analyzer  C. Token Filter  D. Character Filter
Quiz Answers
B. Query DSL - Elasticsearch's native query domain-specific language
D. Bool, which combines multiple filters using must/should
C. Token filter - the synonym token filter maps tokens to their configured synonyms during analysis
E.g. "synonyms": ["british,english", "hen,chicken"]
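To make the token filter answer concrete, here is a minimal sketch of what a synonym filter does to a token stream, using the two rules from the example above. The helper names `parse_synonyms` and `synonym_filter` are hypothetical, and the sketch simply appends synonyms to the stream, ignoring token positions.

```python
def parse_synonyms(rules):
    """Map every term in a comma-separated rule to the full group of equivalents."""
    groups = {}
    for rule in rules:
        terms = [term.strip() for term in rule.split(",")]
        for term in terms:
            groups[term] = terms
    return groups


def synonym_filter(tokens, groups):
    """Replace each token with its synonym group; other tokens pass through unchanged."""
    expanded = []
    for token in tokens:
        expanded.extend(groups.get(token, [token]))
    return expanded


groups = parse_synonyms(["british,english", "hen,chicken"])
tokens = "the british hen".split()  # stand-in for a tokenizer's output
expanded = synonym_filter(tokens, groups)
print(expanded)  # ['the', 'british', 'english', 'hen', 'chicken']
```

The filter runs after the tokenizer, which is why it is classed as a token filter rather than a tokenizer or character filter.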