When SQL is not Enough
…here comes Elasticsearch
About me
Project Manager @
13 years professional experience
.NET Web Development MCPD
SQL Server 2012 (MCSA)
External Expert Horizon 2020
Business Interests
Web Development, SOA, Integration
Security & Performance Optimization
Contact [email protected]
www.linkedin.com/in/ivelin
www.slideshare.net/ivoandreev
Agenda
What
Why
Jump start
Analysis in depth
Side by side with SQL
Demo
What is ES
Powerful real-time search and analytics engine
“…It has a very advanced distributed model, speaks JSON natively, and exposes many advanced search features,
all seamlessly expressed through JSON DSL…”
Shay Banon – Creator, Founder, CTO
What else…
Document-oriented
Sophisticated RESTful API
Entirely open source
Based on Apache Lucene
Requires Java
Popularity (All DB Engines)
All DB Engines Ranking
Popularity (Search Engines)
Who Uses ES
First Steps in Elasticsearch
“You don’t learn to walk by following
rules. You learn by doing”
(Richard Branson)
Terms
Elasticsearch   RDBMS
Index           Database
Type            Table
Field           Column
Document        Row
Scaling
Cluster; Node; Shard (Primary/Replica)
RESTful APIs
Document APIs
Index, Get, Update, Delete
Bulk API available
Search APIs
Send/Receive JSON
Basic queries via query string
http://localhost:9200/{indexName}/{type}/_search?q=searchstr&size=100
http://localhost:9200/{index1,index2}/{type}/_search?q=createdby:ivo
http://localhost:9200/_search?q=tag:spam
POST /[index]/[type] { "…", "…" }
GET /[index]/[type]/[ID]
PUT /[index]/[type]/[ID] { "…", "…" }
DELETE /[index]/[type]/[ID]
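The Document API calls and query-string searches listed above can be sketched as plain request builders. This is a minimal illustration, not a client library: the helper names `document_request` and `search_url` are hypothetical, the endpoint is the default `http://localhost:9200` from the slides, and the `{type}` path segment follows the 2015-era URLs shown here (later Elasticsearch releases dropped mapping types).

```python
import json
from urllib.parse import quote, urlencode

BASE_URL = "http://localhost:9200"  # default local endpoint used on the slides


def document_request(method, index, doc_type, doc_id=None, body=None):
    """Build (method, url, payload) for an Index/Get/Update/Delete call."""
    parts = [BASE_URL, quote(index), quote(doc_type)]
    if doc_id is not None:
        parts.append(quote(str(doc_id)))
    # Documents are sent and received as JSON
    payload = None if body is None else json.dumps(body).encode("utf-8")
    return method, "/".join(parts), payload


def search_url(query, indices=(), doc_type=None, size=None):
    """Build a query-string search URL such as /{index}/{type}/_search?q=..."""
    parts = [BASE_URL]
    if indices:
        parts.append(",".join(quote(name) for name in indices))
        if doc_type:
            parts.append(quote(doc_type))
    parts.append("_search")
    params = {"q": query}
    if size is not None:
        params["size"] = size
    return "/".join(parts) + "?" + urlencode(params)


print(document_request("PUT", "mail", "message", 1, {"tag": "spam"}))
print(search_url("createdby:ivo", indices=("index1", "index2"), doc_type="type"))
```

To actually send a request, the tuple can be passed to `urllib.request.Request(url, data=payload, method=method)` with a `Content-Type: application/json` header.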
Query DSL
Entire JSON object is the Query DSL
Query
Full text queries
Results ordered by relevance
Every field is searchable
Filter
Binary – either a field matches or it does not
Filters and queries can be nested
Nesting passes relevance to parents
Query - for full-text search or for any condition
that should affect the relevance score
Filter – for everything else
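The query/filter split can be sketched as a request body in which the full-text part is scored and everything else only includes or excludes documents. This is a sketch under assumptions: it uses the `bool` query with a `filter` clause, which is the form found in Elasticsearch 2.0 and later (the 2015-era releases these slides describe used a `filtered` query instead), and the helper name `search_body` is hypothetical.

```python
import json


def search_body(text_query, filters, size=10):
    """Combine a scoring full-text query with non-scoring filters."""
    return {
        "size": size,
        "query": {
            "bool": {
                "must": text_query,  # affects the relevance score
                "filter": filters,   # yes/no match, no effect on the score
            }
        },
    }


body = search_body(
    {"match": {"title": "full text search"}},
    [{"term": {"folder": "inbox"}}, {"range": {"date": {"gte": "2014-01-01"}}}],
)
print(json.dumps(body, indent=2))
```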
How To (Filters)
ES provides 27 filters (Sep 2015)
Term/Terms filter
{ "term": { "date": "2015-10-10" }}
Range filter
{ "range": { "age": { "gte": 20, "lt": 30 }}}
Exists/Missing filter
{ "exists": { "field": "title" }}
Bool filter
{ "bool": {
  "must": { "term": { "folder": "inbox" }},
  "must_not": { "term": { "tag": "spam" }},
  "should": [{ "term": { "starred": true }}, { "term": { "unread": true }}]
}}
How To (Queries)
ES provides 38 queries (Sep 2015)
match query
{ "match": { "tweet": "About Search" }}
multi_match query
{ "multi_match": {
  "query": "full text search",
  "fields": [ "title", "body" ] }}
bool query
{ "bool": {
  "must": { "match": { "title": "how to make millions" }},
  "must_not": { "match": { "tag": "spam" }},
  "should": [
    { "match": { "tag": "starred" }},
    { "range": { "date": { "gte": "2014-01-01" }}}
  ]
}}
fuzzy query
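The slide gives no example for the fuzzy query. Fuzzy matching is based on edit distance, so the idea can be sketched with a plain Levenshtein implementation. The `FUZZY_QUERY` body, the `title` field, and the helper names are illustrative assumptions, not taken from the deck.

```python
# Assumed request shape for a fuzzy query on a hypothetical "title" field
FUZZY_QUERY = {"fuzzy": {"title": {"value": "serch", "fuzziness": 1}}}


def edit_distance(a, b):
    """Levenshtein distance: insertions, deletions, substitutions."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def fuzzy_candidates(term, vocabulary, max_edits=2):
    """Return indexed terms within max_edits of the search term."""
    return [word for word in vocabulary if edit_distance(term, word) <= max_edits]


print(fuzzy_candidates("serch", ["search", "stretch", "sea"], max_edits=1))
```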
Any index search solution is way better than “LIKE”
How does SQL Full-text Index Work
Column-level language
Used by stemmers and tokenizers
Different columns for different languages
Language tags are respected (XML, binary)
Stop words
ALTER FULLTEXT STOPLIST ProductSL
ADD 'blah' LANGUAGE 1033;
Thesaurus files
(e.g. “song” -> “tune”)
Inverted Index
ES Analysis Process
Character filters   Simplify data (“&” -> “and”, “ü” -> “u”)
Tokenizers          Split data into words (terms, tokens)
Token filters       Lowercase
                    Remove words w/o relevance impact (“a”, “the”)
                    Add synonyms
                    Stemming: reduce to root form (“dogs” -> “dog”)
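The analysis steps above can be sketched as a small pipeline. This is a toy illustration only: the mapping tables reuse the slide's examples, the synonym entry borrows the “song”/“tune” pair from the thesaurus slide, and the one-line stemmer is a stand-in for a real language-aware stemmer.

```python
import re

CHAR_MAP = {"&": " and ", "ü": "u"}  # character filter examples from the slide
STOP_WORDS = {"a", "the"}            # words without relevance impact
SYNONYMS = {"song": ["tune"]}        # hypothetical synonym table


def char_filter(text):
    """Character filter: simplify the raw text before tokenizing."""
    for source, target in CHAR_MAP.items():
        text = text.replace(source, target)
    return text


def tokenize(text):
    """Tokenizer: split the text into word tokens."""
    return re.findall(r"\w+", text)


def stem(token):
    """Toy stemmer: strips a trailing plural 's' from longer words."""
    return token[:-1] if len(token) > 3 and token.endswith("s") else token


def analyze(text):
    """Run char filter, tokenizer, then token filters in the slide's order."""
    tokens = [t.lower() for t in tokenize(char_filter(text))]
    tokens = [t for t in tokens if t not in STOP_WORDS]
    expanded = []
    for token in tokens:
        expanded.append(token)
        expanded.extend(SYNONYMS.get(token, []))
    return [stem(t) for t in expanded]


print(analyze("The Dogs & a Song"))  # ['dog', 'and', 'song', 'tune']
```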
Analyzers
FT fields are analyzed into terms to create inverted index
Configured when index is created
"Set the shape to semi-transparent by calling set_trans(5)"
Analyzer Type      Example
Whitespace         Set, the, shape, to, semi-transparent, by, calling, set_trans(5)
Standard (Def.)    set, the, shape, to, semi, transparent, by, calling, set_trans, 5
Simple             set, the, shape, to, semi, transparent, by, calling, set, trans
Stop               set, shape, semi, transparent, calling, set, trans
Language (EN)      set, shape, semi, transpar, call, set_tran, 5
Pattern            "nonword": { "type": "pattern", "pattern": "[^\\w]+" }
Custom             Allows combination of Tokenizer[1:1] and TokenFilters[0:N]
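How analyzed terms end up in an inverted index can be sketched in a few lines. This is a minimal in-memory model, not how Lucene stores its index: `simple_terms` only approximates the Simple analyzer (split on non-letters, lowercase), and the second sample document is made up for illustration.

```python
import re
from collections import defaultdict


def simple_terms(text):
    """Approximate the Simple analyzer: split on non-letters, lowercase."""
    return [t.lower() for t in re.findall(r"[^\W\d_]+", text)]


def build_inverted_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in simple_terms(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents that contain every query term."""
    terms = simple_terms(query)
    if not terms:
        return set()
    return set.intersection(*(index.get(term, set()) for term in terms))


docs = {
    1: "Set the shape to semi-transparent by calling set_trans(5)",
    2: "The shape of the set",  # hypothetical second document
}
index = build_inverted_index(docs)
print(sorted(index["shape"]))
print(search(index, "transparent"))
```

A lookup touches only the posting sets of the query terms, which is why an index search avoids the row-by-row scan that a “LIKE” pattern needs.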
Security Remarks
RAM is Important
Data structures reside in-memory
Performance and reliability depend on it
• Be aware
• No built-in authentication!
• Protecting private data is up to you
• Prevent expensive requests (DoS)
• Protect http://localhost:9200
Side by Side
                       Elasticsearch   SQL Full-text Search
Performance            RAM mainly      Disk I/O mainly
Licensing              Open Source     Commercial
Platform               Any (Java)      Windows Only
Wildcards              Yes             Partly
FTS Syntax             Rich            Basic
Extensibility          Plugins         CLR or custom code
Scale Out              Yes             No
Relational Integrity   No              Yes
Security               No              Yes
FT Search Setup        Manual          Wizard
Index Update           Manual          Auto
From SQL to Elasticsearch
Rivers (deprecated)
Logstash
Open source log management tool
Client libraries
.NET
Elasticsearch.Net
Nest
Also Java, JS, Perl, Python, Ruby, PHP
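Whichever tool moves the data, rows from a SQL source typically reach Elasticsearch through the Bulk API, whose body alternates an action line and a document line. The sketch below builds such a payload. It makes several assumptions: an in-memory SQLite table stands in for the real SQL Server source, the `products` table and the `bulk_payload` helper are hypothetical, and the action line omits the `_type` that 2015-era releases also expected.

```python
import json
import sqlite3


def bulk_payload(rows, index, id_field="id"):
    """Render rows as a Bulk API body: an action line, then the document."""
    lines = []
    for row in rows:
        doc = dict(row)
        action = {"index": {"_index": index, "_id": doc.pop(id_field)}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"  # the bulk body must end with a newline


# In-memory SQLite stands in for the real SQL source (assumption)
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO products VALUES (?, ?)", [(1, "Bike"), (2, "Helmet")])
rows = conn.execute("SELECT id, name FROM products ORDER BY id").fetchall()

payload = bulk_payload(rows, "products")
print(payload)
```

The resulting text would be sent as the body of a `POST` to `http://localhost:9200/_bulk`.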
Summary
Not a replacement for an RDBMS
Real-time search applications
Built for scalability
Easy to install
RESTful API and JSON
Deployment (Windows)
Install Java
Download ES zip
Install: [ESHome]/bin> service install
Set ES service to start automatically: [ESHome]/bin> service manager
Open in browser: http://localhost:9200/
Plugin install: [ESHome]/bin> plugin -i elasticsearch/marvel/latest
Restart ES
Takeaways
Tools
Kopf: https://github.com/lmenezes/elasticsearch-kopf
Marvel: https://www.elastic.co/products/marvel
Curl: http://curl.haxx.se/download.html
JDBC Driver: http://www.java2s.com/Code/Jar/s/Downloadsqljdbc430jar.htm
Community
https://discuss.elastic.co
Getting Started
http://joelabrahamsson.com/elasticsearch-101/