When SQL is not Enough it comes Elasticsearch

Upload: ivo-andreev

Post on 13-Apr-2017

TRANSCRIPT

Page 1: When SQL is not Enough - It comes Elasticsearch

When SQL is not Enough

…it comes Elasticsearch

Page 2: When SQL is not Enough - It comes Elasticsearch

About me

Project Manager @

13 years professional experience

.NET Web Development MCPD

SQL Server 2012 (MCSA)

External Expert Horizon 2020

Business Interests

Web Development, SOA, Integration

Security & Performance Optimization

Contact [email protected]

www.linkedin.com/in/ivelin

www.slideshare.net/ivoandreev


Page 3: When SQL is not Enough - It comes Elasticsearch

Agenda

What

Why

Jump start

Analysis in depth

Side by side with SQL

Demo

Page 4: When SQL is not Enough - It comes Elasticsearch

What is ES

Powerful real-time search and analytics engine

“…It has a very advanced distributed model, speaks JSON natively, and exposes many advanced search features, all seamlessly expressed through JSON DSL…”

Shay Banon – Creator, Founder, CTO

What else… Document-oriented

Sophisticated RESTful API

Entirely open source

Based on Apache Lucene

Requires Java

Page 6: When SQL is not Enough - It comes Elasticsearch

Popularity (Search Engines)

Page 7: When SQL is not Enough - It comes Elasticsearch

Who Uses ES

Page 8: When SQL is not Enough - It comes Elasticsearch

First Steps in Elasticsearch

“You don’t learn to walk by following rules. You learn by doing”

(Richard Branson)

Page 9: When SQL is not Enough - It comes Elasticsearch

Terms

Elasticsearch    RDBMS
Index            Database
Type             Table
Field            Column
Document         Row

Scaling

Cluster; Node; Shard (Primary/ Replica)

Page 10: When SQL is not Enough - It comes Elasticsearch

RESTful APIs

Document APIs

Index, Get, Update, Delete

Bulk API available

Search APIs

Send/Receive JSON

Basic queries via query string

http://localhost:9200/{indexName}/{type}/_search?q=searchstr&size=100

http://localhost:9200/{index1,index2}/{type}/_search?q=createdby:ivo

http://localhost:9200/_search?q=tag:spam

POST /[index]/[type] { "…", "…" }

GET /[index]/[type]/[ID] { }

PUT /[index]/[type]/[ID] { "…", "…" }

DELETE /[index]/[type]/[ID]
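
The URL patterns above can also be assembled in code. A minimal sketch using only the Python standard library follows; `search_url` and `build_request` are hypothetical helper names, and the index "blog" and type "post" are made up for illustration. The requests are only constructed, nothing is sent.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

BASE = "http://localhost:9200"  # default address used on the slides


def search_url(index, doc_type, **params):
    """Build a query-string search URL: /{index}/{type}/_search?q=..."""
    return f"{BASE}/{index}/{doc_type}/_search?{urlencode(params)}"


def build_request(method, index, doc_type, doc_id=None, body=None):
    """Construct (but do not send) a Document API request."""
    path = f"{BASE}/{index}/{doc_type}"
    if doc_id is not None:
        path += f"/{doc_id}"
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Content-Type": "application/json"} if data else {}
    return Request(path, data=data, headers=headers, method=method)


# Hypothetical index "blog" and type "post", for illustration only.
create = build_request("POST", "blog", "post", body={"title": "Hello"})
fetch = build_request("GET", "blog", "post", doc_id=1)
print(create.get_method(), create.full_url)
print(fetch.get_method(), fetch.full_url)
print(search_url("blog", "post", q="createdby:ivo", size=100))
```

Passing a request to `urllib.request.urlopen` would actually send it, which assumes a node is listening on port 9200.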

Page 11: When SQL is not Enough - It comes Elasticsearch

Query DSL

Entire JSON object is the Query DSL

Query

Full text queries

Results ordered by relevance

Every field is searchable

Filter

Binary – either a field matches or it does not

Filters and queries can be nested

Nesting passes relevance to parents

Page 12: When SQL is not Enough - It comes Elasticsearch

Query - for full-text search or for any condition

that should affect the relevance score

Filter – for everything else
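
The rule of thumb can be sketched as a request body. The example assumes the `bool` query with a `filter` clause, which is the syntax of Elasticsearch 2.0 and later; the 1.x releases current when these slides were written expressed the same idea with a `filtered` query. The function name and the field names are illustrative only.

```python
import json


def search_body(text, folder):
    """Scored full-text clause goes in `must`; the yes/no clause goes in `filter`."""
    return {
        "query": {
            "bool": {
                # Affects the relevance score.
                "must": {"match": {"title": text}},
                # Only includes or excludes documents; no score contribution.
                "filter": {"term": {"folder": folder}},
            }
        }
    }


body = search_body("how to make millions", "inbox")
print(json.dumps(body, indent=2))
```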

Page 13: When SQL is not Enough - It comes Elasticsearch

How To (Filters)

ES provides 27 filters (Sep 2015)

Term/Terms filter
{ "term": { "date": "2015-10-10" }}

Range filter
{ "range": { "age": { "gte": 20, "lt": 30 }}}

Exists/Missing filter
{ "exists": { "field": "title" }}

Bool filter
{ "bool": {
    "must":     { "term": { "folder": "inbox" }},
    "must_not": { "term": { "tag": "spam" }},
    "should": [
        { "term": { "starred": true }},
        { "term": { "unread": true }}
    ]
}}
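
To make the yes/no nature of filters concrete, here is a small in-memory evaluator for the four filter kinds on this slide. It is a toy sketch, not how Elasticsearch evaluates filters internally. It assumes flat documents (plain Python dicts) and assumes that at least one `should` clause must match when any are given, which is how the bool filter is usually described.

```python
def _as_list(clause):
    """Accept either a single clause or a list of clauses."""
    return clause if isinstance(clause, list) else [clause]


_RANGE_OPS = {
    "gte": lambda value, bound: value >= bound,
    "gt": lambda value, bound: value > bound,
    "lte": lambda value, bound: value <= bound,
    "lt": lambda value, bound: value < bound,
}


def matches(flt, doc):
    """Return True or False; a filter never produces a score."""
    kind, spec = next(iter(flt.items()))
    if kind == "term":
        field, value = next(iter(spec.items()))
        return doc.get(field) == value
    if kind == "range":
        field, bounds = next(iter(spec.items()))
        value = doc.get(field)
        if value is None:
            return False
        return all(_RANGE_OPS[op](value, bound) for op, bound in bounds.items())
    if kind == "exists":
        return doc.get(spec["field"]) is not None
    if kind == "bool":
        must = _as_list(spec.get("must", []))
        must_not = _as_list(spec.get("must_not", []))
        should = _as_list(spec.get("should", []))
        return (
            all(matches(f, doc) for f in must)
            and not any(matches(f, doc) for f in must_not)
            and (not should or any(matches(f, doc) for f in should))
        )
    raise ValueError(f"unsupported filter: {kind}")


inbox_filter = {"bool": {
    "must": {"term": {"folder": "inbox"}},
    "must_not": {"term": {"tag": "spam"}},
    "should": [{"term": {"starred": True}}, {"term": {"unread": True}}],
}}
mail = {"folder": "inbox", "tag": "work", "starred": True, "unread": False}
print(matches(inbox_filter, mail))
```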

Page 14: When SQL is not Enough - It comes Elasticsearch

How To (Queries)

ES provides 38 queries (Sep 2015)

match query
{ "match": { "tweet": "About Search" }}

multi_match query
{ "multi_match": {
    "query":  "full text search",
    "fields": [ "title", "body" ]
}}

bool query
{ "bool": {
    "must":     { "match": { "title": "how to make millions" }},
    "must_not": { "match": { "tag": "spam" }},
    "should": [
        { "match": { "tag": "starred" }},
        { "range": { "date": { "gte": "2014-01-01" }}}
    ]
}}

fuzzy query
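
Unlike a filter, a query ranks its hits. The toy sketch below imitates a `match` query by counting how many query terms occur in each document and sorting by that count. This is an assumption made for illustration only: Elasticsearch relies on Lucene's scoring, which is considerably more refined than a term count.

```python
def match_score(query, text):
    """Toy relevance: the number of query terms that occur in the text."""
    words = set(text.lower().split())
    return sum(1 for term in query.lower().split() if term in words)


def search(query, docs):
    """Return matching documents, best score first; non-matching ones are dropped."""
    scored = [(match_score(query, doc), doc) for doc in docs]
    hits = [pair for pair in scored if pair[0] > 0]
    return [doc for _, doc in sorted(hits, key=lambda pair: pair[0], reverse=True)]


docs = ["text editors", "full text search engines", "cooking recipes"]
print(search("full text search", docs))
```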

Page 15: When SQL is not Enough - It comes Elasticsearch

Any index search solution is way better than “LIKE”

Page 16: When SQL is not Enough - It comes Elasticsearch

How does SQL Full-text Index Work

Column-level language

Used by stemmers and tokenizers

Different columns for different languages

Language tags are respected (XML, binary)

Stop words

ALTER FULLTEXT STOPLIST ProductSL ADD 'blah' LANGUAGE 1033;

Thesaurus files

(e.g. “song” -> “tune”)

Page 17: When SQL is not Enough - It comes Elasticsearch

Inverted Index
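
An inverted index maps each term to the documents that contain it, so a lookup does not have to scan every row the way “LIKE” does. A minimal sketch, assuming whitespace tokenization and lowercasing only (the sample documents are made up):

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to the sorted list of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


docs = {1: "the quick brown fox", 2: "the lazy dog", 3: "quick dog"}
index = build_inverted_index(docs)
print(index["quick"])
print(index["dog"])
```

Searching for a term is then a single dictionary lookup, and searching for several terms is an intersection or union of the ID lists.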

Page 18: When SQL is not Enough - It comes Elasticsearch

ES Analysis Process

Character filters
    Simplify data (“&” -> “and”, “ü” -> “u”)

Tokenizers
    Split data into words (terms, tokens)

Token filters
    Lowercase
    Remove words w/o relevance impact (“a”, “the”)
    Synonyms added
    Stemming: reduce to root form (“dogs” -> “dog”)
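
The stages can be chained as plain functions. The sketch below is a rough imitation of the process, not Elasticsearch code: the character map and stop words come from the slide, the synonym pair is borrowed from the thesaurus example earlier in the deck, and the stemmer is deliberately naive.

```python
import re

CHAR_MAP = {"&": "and", "ü": "u"}   # character filter examples from the slide
STOP_WORDS = {"a", "the"}            # words without relevance impact
SYNONYMS = {"song": ("tune",)}       # illustrative synonym pair


def char_filter(text):
    """Stage 1: simplify the raw characters."""
    for src, dst in CHAR_MAP.items():
        text = text.replace(src, dst)
    return text


def tokenize(text):
    """Stage 2: split the text into word tokens."""
    return re.findall(r"\w+", text)


def stem(token):
    """Naive stemmer: strips a plural 's'. Real stemmers are far more careful."""
    return token[:-1] if token.endswith("s") and len(token) > 3 else token


def analyze(text):
    """Stage 3: token filters applied in order."""
    tokens = [t.lower() for t in tokenize(char_filter(text))]
    tokens = [t for t in tokens if t not in STOP_WORDS]
    # Synonyms are added next to the original token, not substituted.
    tokens = [s for t in tokens for s in (t, *SYNONYMS.get(t, ()))]
    return [stem(t) for t in tokens]


print(analyze("The Dogs & a Song from Zürich"))
```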

Page 19: When SQL is not Enough - It comes Elasticsearch

Analyzers

Full-text fields are analyzed into terms to create the inverted index

Configured when index is created

"Set the shape to semi-transparent by calling set_trans(5)"

Analyzer type    Example
Whitespace       Set, the, shape, to, semi-transparent, by, calling, set_trans(5)
Standard (def.)  set, the, shape, to, semi, transparent, by, calling, set_trans, 5
Simple           set, the, shape, to, semi, transparent, by, calling, set, trans
Stop             set, shape, semi, transparent, calling, set, trans
Language (EN)    set, shape, semi, transpar, call, set_tran, 5
Pattern          "nonword": { "type": "pattern", "pattern": "[^\\w]+" }
Custom           Allows combination of Tokenizer[1:1] and TokenFilters[0:N]
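
Three of these analyzers are simple enough to approximate with a few lines of Python. The functions below are approximations written for this example, not the Lucene implementations: the "simple" version only knows ASCII letters, and the pattern version assumes the token stream is lowercased.

```python
import re

TEXT = "Set the shape to semi-transparent by calling set_trans(5)"


def whitespace_analyzer(text):
    """Split on whitespace only; case and punctuation are kept."""
    return text.split()


def simple_analyzer(text):
    """Lowercase, then keep runs of letters (ASCII-only approximation)."""
    return re.findall(r"[a-z]+", text.lower())


def pattern_analyzer(text, pattern=r"[^\w]+"):
    """Lowercase, then split on the given regular expression."""
    return [token for token in re.split(pattern, text.lower()) if token]


for analyzer in (whitespace_analyzer, simple_analyzer, pattern_analyzer):
    print(analyzer.__name__, analyzer(TEXT))
```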

Page 20: When SQL is not Enough - It comes Elasticsearch

Security Remarks

RAM is Important

Data structures reside in-memory

Performance and reliability depend on it

Be Aware

No authentication!

Protect private data yourself

Prevent expensive requests (DoS)

Protect http://localhost:9200
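
One way to act on the "prevent expensive requests" remark is to validate search parameters in the application layer before they reach the node. The sketch below is an assumption about how such a guard might look: `sanitize_search_params` and the cap `MAX_SIZE` are hypothetical, and a real deployment would also keep port 9200 off the public network.

```python
MAX_SIZE = 100  # hypothetical cap on hits per request


def sanitize_search_params(params):
    """Return a safe copy of query-string parameters, or raise ValueError."""
    safe = dict(params)
    query = str(safe.get("q", ""))
    for term in query.split():
        # Look at the part after an optional "field:" prefix.
        value = term.rsplit(":", 1)[-1]
        # Terms that start with a wildcard are costly to evaluate.
        if value.startswith(("*", "?")):
            raise ValueError("leading wildcards are not allowed")
    # 10 is the default page size when none is given.
    size = int(safe.get("size", 10))
    safe["size"] = max(0, min(size, MAX_SIZE))
    return safe


print(sanitize_search_params({"q": "tag:spam", "size": "100000"}))
```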

Page 21: When SQL is not Enough - It comes Elasticsearch

Side by Side

                      Elasticsearch    SQL Full-text Search
Performance           RAM mainly       Disk I/O mainly
Licensing             Open Source      Commercial
Platform              Any (Java)       Windows Only
Wildcards             Yes              Partly
FTS Syntax            Rich             Basic
Extensibility         Plugins          CLR or custom code
Scale Out             Yes              No
Relational Integrity  No               Yes
Security              No               Yes
FT Search Setup       Manual           Wizard
Index Update          Manual           Auto

Page 22: When SQL is not Enough - It comes Elasticsearch

From SQL to Elasticsearch

Rivers (deprecated)

Logstash

Open source log management tool

Client libraries

.NET

Elasticsearch.Net

Nest

Also Java, JS, Perl, Python, Ruby, PHP
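
Besides Logstash and the client libraries, rows can be exported by hand into the newline-delimited format that the Bulk API expects: one action line followed by one document line per row. In the sketch below an in-memory SQLite table stands in for the SQL source, the table and index names are invented, and the `_type` entry in the action line matches the index/type model used in these slides. The payload is only built here; sending it would mean a POST to `/_bulk`.

```python
import json
import sqlite3


def rows_to_bulk(cursor, index, doc_type, id_column="id"):
    """Turn the rows of an executed cursor into a Bulk API payload (NDJSON)."""
    columns = [col[0] for col in cursor.description]
    lines = []
    for row in cursor:
        doc = dict(zip(columns, row))
        doc_id = doc.pop(id_column)
        action = {"index": {"_index": index, "_type": doc_type, "_id": doc_id}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(doc))
    # The bulk format requires a trailing newline.
    return "\n".join(lines) + "\n"


# SQLite stands in for the relational source in this example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT, createdby TEXT)")
conn.executemany(
    "INSERT INTO posts VALUES (?, ?, ?)",
    [(1, "Hello", "ivo"), (2, "Search", "ivo")],
)
cursor = conn.execute("SELECT id, title, createdby FROM posts ORDER BY id")
payload = rows_to_bulk(cursor, "blog", "post")
print(payload)
```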

Page 23: When SQL is not Enough - It comes Elasticsearch

Summary

Not a replacement of RDBMS

Real-time search applications

Built for scalability

Easy to install

RESTful API and JSON

Page 24: When SQL is not Enough - It comes Elasticsearch
Page 25: When SQL is not Enough - It comes Elasticsearch

Deployment (Windows)

Install Java

Download ES zip

Install the service: [ESHome]/bin> service install

Set ES service to start automatically: [ESHome]/bin> service manager

Open in browser: http://localhost:9200/

Install a plugin: [ESHome]/bin> plugin -i elasticsearch/marvel/latest

Restart ES
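
Instead of opening the browser, the "is it up?" check can be scripted. A minimal sketch, assuming the node answers on the default address with a JSON document that contains `cluster_name` and `version.number`; the function names are made up for this example.

```python
import json
from urllib.error import URLError
from urllib.request import urlopen


def fetch_node_info(url="http://localhost:9200/", timeout=2):
    """Return the parsed root response, or None if the node is unreachable."""
    try:
        with urlopen(url, timeout=timeout) as resp:
            return json.loads(resp.read().decode("utf-8"))
    except (URLError, OSError, ValueError):
        return None


def summarize(info):
    """Produce a one-line status from the root response."""
    if info is None:
        return "node not reachable"
    version = info.get("version", {}).get("number", "unknown")
    return f"cluster={info.get('cluster_name', 'unknown')} version={version}"


if __name__ == "__main__":
    print(summarize(fetch_node_info()))
```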

Page 26: When SQL is not Enough - It comes Elasticsearch

Takeaways

Tools

Kopf: https://github.com/lmenezes/elasticsearch-kopf

Marvel: https://www.elastic.co/products/marvel

Curl: http://curl.haxx.se/download.html

JDBC Driver: http://www.java2s.com/Code/Jar/s/Downloadsqljdbc430jar.htm

Community

https://discuss.elastic.co

Getting Started

http://joelabrahamsson.com/elasticsearch-101/

Page 27: When SQL is not Enough - It comes Elasticsearch

Sponsors

Page 28: When SQL is not Enough - It comes Elasticsearch