Your data, your search, Elasticsearch

© 2013 SpringOne 2GX. All rights reserved. Do not distribute without permission. Your data, your search, Elasticsearch Costin Leau @costinl

Upload: spring-io

Post on 06-May-2015


DESCRIPTION

Speaker: Costin Leau

Finding relevant information fast has always been a challenge, even more so in today's growing "oceans" of data. This talk explores real-time full-text search using Elasticsearch, an open-source, distributed search engine built on top of Apache Lucene. The session showcases how to perform real-time searches on structured and unstructured data alike, how to cope with types and suggestions, and how to use social graph filters and aggregations for efficient analytics, all from a Spring perspective.

Last but not least, the presentation focuses on the Hadoop platform and how Map/Reduce, Hive, Pig or Cascading jobs can leverage a search engine to significantly speed up execution and enhance their capabilities. The presentation covers architectural topics such as index scalability, data locality and partitioning, using off-premise and on-premise storage (HDFS, S3, local file systems), and multi-tenancy.

TRANSCRIPT

Page 1: Your Data, Your Search, Elasticsearch

© 2013 SpringOne 2GX. All rights reserved. Do not distribute without permission.

Your data, your search, Elasticsearch

Costin Leau @costinl

Page 2: Your Data, Your Search, Elasticsearch

Agenda

Elasticsearch

Big Data

Analytics

Page 5: Your Data, Your Search, Elasticsearch

What is Elasticsearch?

Open-Source Search & Analytics engine
- Structured & Unstructured Data
- Real Time
- Analytics capabilities (facets)
- REST based

Distributed
- Designed for the Cloud
- Designed for Big Data

Lightweight

Popular: ~200K downloads/month

Page 6: Your Data, Your Search, Elasticsearch

Users

Page 8: Your Data, Your Search, Elasticsearch

Platform adoption

http://www.thoughtworks.com/radar#platforms 2013

Page 10: Your Data, Your Search, Elasticsearch

Use Case – Text search

1.3 billion files, 130 billion lines of code

https://github.com/blog/1381-a-whole-new-code-search

Page 11: Your Data, Your Search, Elasticsearch

Use Case - Geolocation

50 million venues / day

Page 12: Your Data, Your Search, Elasticsearch

Use Case - Recommendations

millions of recommendations

Page 13: Your Data, Your Search, Elasticsearch

Use Case – Support/Reporting

Page 14: Your Data, Your Search, Elasticsearch

Use Case – Centralized Logging

Page 15: Your Data, Your Search, Elasticsearch

Use Case – Pure Analytics

Page 16: Your Data, Your Search, Elasticsearch

Plug & Play

Page 17: Your Data, Your Search, Elasticsearch

Installation

$ wget https://download.elasticsearch.org/...

$ tar -xf elasticsearch-0.90.3.tar.gz

$ ./elasticsearch-0.90.3/bin/elasticsearch

... [INFO ][node][Ghost Maker] {0.90.2}[5645]: initializing ...

Page 18: Your Data, Your Search, Elasticsearch

Index a document

$ curl -X PUT localhost:9200/products/product/1 -d '{
  "title" : "Welcome!"
}'

Page 19: Your Data, Your Search, Elasticsearch

Update a document

$ curl -X PUT localhost:9200/products/product/1 -d '{
  "title" : "Welcome to SpringOne2GX 2013!"
}'

Page 20: Your Data, Your Search, Elasticsearch

Search for documents...

$ curl -X GET 'localhost:9200/products/_search?q=welcome'
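
The three curl calls above (index, update, search) are plain HTTP requests, so any HTTP client can issue them. A minimal sketch using only the Python standard library follows; the helper names and `BASE_URL` are assumptions made for illustration, and the functions only build the requests so they can be inspected without a running node. The `Content-Type` header is not on the slides; it is added here because newer Elasticsearch versions require it.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:9200"  # assumption: a local node on the default HTTP port


def index_request(index, doc_type, doc_id, document):
    """Build (but do not send) the PUT request that indexes or replaces a document."""
    url = f"{BASE_URL}/{index}/{doc_type}/{doc_id}"
    body = json.dumps(document).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    return urllib.request.Request(url, data=body, headers=headers, method="PUT")


def search_request(index, query):
    """Build (but do not send) a URI search request for a query string."""
    url = f"{BASE_URL}/{index}/_search?q={urllib.parse.quote(query)}"
    return urllib.request.Request(url, method="GET")


# PUT to the same id twice: the second call replaces the first document.
first = index_request("products", "product", 1, {"title": "Welcome!"})
second = index_request("products", "product", 1,
                       {"title": "Welcome to SpringOne2GX 2013!"})
search = search_request("products", "welcome")
```

Passing any of these objects to `urllib.request.urlopen` would send it; that step needs a node listening on `BASE_URL`.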

Page 21: Your Data, Your Search, Elasticsearch

Scaling out

$ ./elasticsearch-0.90.2/bin/elasticsearch -Des.node.name=Node2

...[cluster.service] [Node2] detected_master [Node1] ...

Page 22: Your Data, Your Search, Elasticsearch

Primaries and Replicas

curl -XPUT 'http://localhost:9200/a/' -d '{
  "settings" : {
    "index" : {
      "number_of_shards" : 3,
      "number_of_replicas" : 1
    }
  }
}'

Primaries: A1, A2, A3

Replicas: A1, A2, A3
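
With 3 primary shards and 1 replica each, index `a` ends up with 6 shard copies spread over the nodes. Elasticsearch places each document on a primary by hashing its routing value (the document id by default) modulo the number of primaries. The sketch below illustrates that idea; `zlib.crc32` is a stand-in chosen for the example and is not the hash function Elasticsearch uses, and both function names are hypothetical.

```python
import zlib


def shard_for(doc_id, number_of_shards):
    """Pick a primary shard from a hash of the document id.

    zlib.crc32 is a stand-in for illustration only; it is not the hash
    function Elasticsearch uses internally.
    """
    return zlib.crc32(str(doc_id).encode("utf-8")) % number_of_shards


def total_shard_copies(number_of_shards, number_of_replicas):
    """Each primary has number_of_replicas copies: count primaries plus replicas."""
    return number_of_shards * (1 + number_of_replicas)


copies = total_shard_copies(3, 1)  # settings from the slide: 3 shards, 1 replica
placements = {doc_id: shard_for(doc_id, 3) for doc_id in ("1", "2", "abc123")}
```

Because the shard is derived from the number of primaries, that number is fixed when the index is created, while the replica count can be changed later.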

Page 23: Your Data, Your Search, Elasticsearch

Scaling out

$ ./elasticsearch-0.90.2/bin/elasticsearch -Des.node.name=Node3

...[cluster.service] [Node3] detected_master [Node1] ...

Page 26: Your Data, Your Search, Elasticsearch

JSON & HTTP

{
  "id" : "abc123",
  "title" : "A JSON Document",
  "body" : "A JSON document is a ...",
  "published_on" : "2013/06/27 10:00:00",
  "featured" : true,
  "tags" : ["search", "json"],
  "author" : {
    "first_name" : "Clara",
    "last_name" : "Rice",
    "email" : "[email protected]"
  }
}

Page 27: Your Data, Your Search, Elasticsearch

http:// Lingua Franca of APIs

Also supported: Native Java protocol, Thrift, Memcached

Page 28: Your Data, Your Search, Elasticsearch

Search & Find

$ curl -X GET "http://localhost:9200/_search?q=<YOUR QUERY>"

Terms      apple
           apple iphone
Phrases    "apple iphone"
Proximity  "apple safari"~5
Fuzzy      apple~0.8
Wildcards  app*
           *pp*
Boosting   apple^10 safari
Range      [2011/05/01 TO 2011/05/31]
           [java TO json]
Boolean    apple AND NOT iphone
           +apple -iphone
           (apple OR iphone) AND NOT review
Fields     title:iphone^15 OR body:iphone
           published_on:[2011/05/01 TO "2011/05/27 10:00:00"]
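
Most of the query strings above contain characters that are not safe in a URL (quotes, spaces, `^`, `[`, and `+`, which a server reads as a space when left unescaped). A small sketch of encoding them for the `q` parameter with the Python standard library follows; the function names and the `base` default are assumptions made for illustration.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def uri_search_url(query, base="http://localhost:9200"):
    """Return a URI-search URL with the query string safely percent-encoded."""
    return f"{base}/_search?{urlencode({'q': query})}"


def query_from_url(url):
    """Decode the q parameter again, as the receiving server would."""
    return parse_qs(urlsplit(url).query)["q"][0]


examples = [
    '"apple safari"~5',
    "apple^10 safari",
    "+apple -iphone",
    'published_on:[2011/05/01 TO "2011/05/27 10:00:00"]',
]
urls = [uri_search_url(q) for q in examples]
```

Decoding each URL with `query_from_url` gives back the original query text, which is a quick way to check that nothing was lost in the encoding.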

Page 29: Your Data, Your Search, Elasticsearch

Query DSL

curl -X GET localhost:9200/articles/_search -d '{
  "query" : {
    "filtered" : {
      "query" : {
        "bool" : {
          "must" : [
            { "match" : {
                "author.first_name" : { "query" : "claire", "fuzziness" : 0.1 }
            } },
            { "multi_match" : {
                "query" : "elasticsearch",
                "fields" : ["title^10", "body"]
            } }
          ]
        }
      },
      "filter" : {
        "and" : [
          { "terms" : { "tags" : ["search"] } },
          { "range" : { "published_on" : { "from" : "2013" } } },
          { "term" : { "featured" : true } }
        ]
      }
    }
  }
}'
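
A request body this deeply nested is easier to build as a data structure than as a hand-written string. The sketch below assembles the same query as a Python dict and serializes it with `json.dumps`; the function name and its parameters are hypothetical. The two `must` clauses are given as a list, since repeating the `must` key inside one JSON object would leave only one of them in effect. The `filtered` query and `and` filter are the 0.90-era DSL from the slide; later Elasticsearch versions replaced them.

```python
import json


def articles_query(first_name, text, tag, since):
    """Build the filtered query from the slide as a plain dict."""
    return {
        "query": {
            "filtered": {
                # Scored part: both clauses must match.
                "query": {
                    "bool": {
                        "must": [
                            {"match": {"author.first_name": {
                                "query": first_name, "fuzziness": 0.1}}},
                            {"multi_match": {
                                "query": text, "fields": ["title^10", "body"]}},
                        ]
                    }
                },
                # Unscored part: yes/no conditions that narrow the result set.
                "filter": {
                    "and": [
                        {"terms": {"tags": [tag]}},
                        {"range": {"published_on": {"from": since}}},
                        {"term": {"featured": True}},
                    ]
                },
            }
        }
    }


body = json.dumps(articles_query("claire", "elasticsearch", "search", "2013"))
```

The resulting `body` string is what would be passed to `-d` in the curl call.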

Page 33: Your Data, Your Search, Elasticsearch

Search types

Structured Search: “Find all articles from year 2013 tagged ‘search’”

Full-text Search: “Find all articles with ‘search’ in their title or body, give matches in titles higher score”

Custom Scoring: see the custom_score and custom_filters_score queries

Page 34: Your Data, Your Search, Elasticsearch

Search perspectives

User

Search Engine: Fetch document field ➝ Pick configured analyzer ➝ Parse text into tokens ➝ Apply token filters ➝ Store into index
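
The search-engine side of this slide (fetch a field, tokenize it, filter the tokens, store them in the index) can be illustrated with a toy inverted index. This is a simplified sketch and not how Lucene or Elasticsearch implement analysis; the stopword list, the regular-expression tokenizer and all names are assumptions chosen for the example.

```python
import re
from collections import defaultdict

STOPWORDS = {"a", "an", "the", "is", "to"}  # hypothetical stopword list


def analyze(text):
    """Toy analyzer: tokenize, lowercase, then drop stopwords."""
    tokens = re.findall(r"\w+", text)                  # parse text into tokens
    tokens = [t.lower() for t in tokens]               # token filter: lowercase
    return [t for t in tokens if t not in STOPWORDS]   # token filter: stopwords


def index_documents(docs, field):
    """Build an inverted index: token -> set of ids of documents containing it."""
    inverted = defaultdict(set)
    for doc_id, doc in docs.items():
        for token in analyze(doc[field]):              # fetch field, analyze it
            inverted[token].add(doc_id)                # store into index
    return inverted


docs = {
    1: {"title": "Welcome!"},
    2: {"title": "Welcome to SpringOne2GX 2013!"},
}
index = index_documents(docs, "title")
hits = index[analyze("WELCOME")[0]]  # the query text goes through the same analyzer
```

Running the query through the same analyzer as the documents is what lets `WELCOME` match both titles even though neither contains that exact string.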

Page 35: Your Data, Your Search, Elasticsearch

Slice & Dice

Query

Facets

Page 36: Your Data, Your Search, Elasticsearch

OLAP Cube

Dimensions, measures, aggregations

Page 37: Your Data, Your Search, Elasticsearch

Slice: Show me sales numbers for all products across all locations in year 2013

Dice: Show me product A sales numbers across all locations over all years

Drill Down / Roll Up: Show me products sales numbers in location X over all years
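
Each of the three questions above has the same shape: fix some dimensions, then aggregate a measure over another one. That is the role a query plus a facet plays in Elasticsearch. The sketch below shows the shape on a few in-memory rows; the sales figures and the `total_by` helper are made up for illustration.

```python
from collections import defaultdict

SALES = [  # hypothetical rows: three dimensions and one measure
    {"product": "A", "location": "X", "year": 2012, "amount": 10},
    {"product": "A", "location": "Y", "year": 2013, "amount": 20},
    {"product": "B", "location": "X", "year": 2013, "amount": 5},
    {"product": "B", "location": "Y", "year": 2013, "amount": 7},
]


def total_by(rows, dimension, **fixed):
    """Sum the amount per value of `dimension`, keeping rows that match `fixed`."""
    totals = defaultdict(int)
    for row in rows:
        if all(row[key] == value for key, value in fixed.items()):  # the "query"
            totals[row[dimension]] += row["amount"]                 # the "facet"
    return dict(totals)


in_2013 = total_by(SALES, "product", year=2013)      # all products, year 2013
product_a = total_by(SALES, "location", product="A")  # product A, all locations
location_x = total_by(SALES, "year", location="X")    # location X, over the years
```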

Page 38: Your Data, Your Search, Elasticsearch

Clients

Page 39: Your Data, Your Search, Elasticsearch

Pick your language

Java

Perl*

Python*

Ruby*

Php*

Javascript

.Net

scala

clojure

go

Erlang

Eventmachine

Cli

Smalltalk

Ocaml

Page 40: Your Data, Your Search, Elasticsearch

Spring Data

Page 42: Your Data, Your Search, Elasticsearch

Spring Data Elasticsearch

Easy to use Elasticsearch in a Spring-powered app

Configuring Elasticsearch client

Dedicated template for one-liners

Repository support

Page 43: Your Data, Your Search, Elasticsearch

Configuration

<beans xmlns:es="http://www.sf.org/schema/data/elasticsearch">
  <es:repositories base-package="com.acme" />
  <es:transport-client id="client" cluster-nodes="localhost:9300,someip:9300" />
</beans>

@Configuration
@EnableElasticsearchRepositories(basePackages = "com/acme")
static class Config {
  @Bean
  public ElasticsearchOperations elasticsearchTemplate() {
    return new ElasticsearchTemplate(nodeBuilder().local(true).node().client());
  }
}

Page 44: Your Data, Your Search, Elasticsearch

Dedicated Template

Create/delete index/mappings

Query options

– Criteria

– String

– Search

Bulk operations

Scrolling/streaming

Page 45: Your Data, Your Search, Elasticsearch

Repositories

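import java.util.List;

import org.springframework.data.domain.Page;
import org.springframework.data.domain.Pageable;
import org.springframework.data.elasticsearch.annotations.Query;
import org.springframework.data.repository.Repository;

public interface BookRepository extends Repository<Book, String> {

  List<Book> findByNameAndPrice(String name, Integer price);

  List<Book> findByNameOrPrice(String name, Integer price);

  Page<Book> findByName(String name, Pageable page);

  Page<Book> findByNameNot(String name, Pageable page);

  // Between takes a lower and an upper bound
  Page<Book> findByPriceBetween(int from, int to, Pageable page);

  Page<Book> findByNameLike(String name, Pageable page);

  // ?0 is replaced by the first method argument
  @Query("{\"bool\" : {\"must\" : {\"field\" : {\"message\" : \"?0\"}}}}")
  Page<Book> findByMessage(String message, Pageable pageable);
}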

Page 46: Your Data, Your Search, Elasticsearch

Sophisticated query creation

Keyword                  Example
And/Or                   findByNameAndPrice
Is                       findByName
Not                      findByNameNot
Less/GreaterThanEqual    findByPriceLessThan
Before/After             findByPriceAfter
Starting/EndingWith      findByNameEndingWith
Contains/Containing      findByNameContaining
OrderBy                  findByCountryOrderByName
True/False               findByRetiredFalse
Near                     soon
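
Query creation works by reading the method name: the part after `findBy` is split on `And`/`Or`, and a trailing keyword on each piece selects the comparison. The toy parser below illustrates that convention only. It is not Spring Data's actual parser, it handles a single combinator per name, and the keyword list and result layout are assumptions made for the example.

```python
import re

# Longer keywords first so that a suffix match picks the most specific one.
KEYWORDS = ["StartingWith", "EndingWith", "Containing", "GreaterThan", "LessThan",
            "Between", "Before", "After", "Like", "Not", "True", "False"]


def parse_finder(method_name):
    """Split a finder method name into combinator, criteria and ordering."""
    if not method_name.startswith("findBy"):
        raise ValueError(f"not a derived finder: {method_name}")
    subject, _, order_by = method_name[len("findBy"):].partition("OrderBy")
    combinator = "Or" if re.search(r"[a-z0-9]Or[A-Z]", subject) else "And"
    criteria = []
    for part in re.split(r"(?<=[a-z0-9])(?:And|Or)(?=[A-Z])", subject):
        # A trailing keyword selects the comparison; no keyword means "Is".
        keyword = next((k for k in KEYWORDS if part.endswith(k) and part != k), "Is")
        prop = part[:-len(keyword)] if keyword != "Is" else part
        criteria.append((prop, keyword))
    return {"combinator": combinator, "criteria": criteria,
            "order_by": order_by or None}


parsed = parse_finder("findByNameAndPrice")
ordered = parse_finder("findByCountryOrderByName")
```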

Page 47: Your Data, Your Search, Elasticsearch

Big Data

Page 48: Your Data, Your Search, Elasticsearch

A Holistic View of a Big Data System

[Diagram components: ETL; Real Time Streams; Unstructured Data (HDFS); RT Semi-structured Database (HBase, Cassandra, Mongo); Big SQL (Greenplum, AsterData, etc.); Batch Processing; Real-Time Processing (S4, Storm); Analytics]

Page 50: Your Data, Your Search, Elasticsearch

Hadoop eco-system

Hadoop Distributed File System (HDFS)

Map Reduce Framework (MapRed)

Page 51: Your Data, Your Search, Elasticsearch

Elasticsearch - Hadoop

Read/write data to Hadoop transparently

• Hadoop Input/OutputFormat

• Cascading Tap

• Pig Storage

• Hive SerDe

Native Map/Reduce model

Page 52: Your Data, Your Search, Elasticsearch

Elasticsearch + Hadoop

[Charts: "Writing" and "Reading / Querying", each labeled M/R, Pig, Hive and Raw, with an axis running from 0 to 60]

Page 53: Your Data, Your Search, Elasticsearch

Data Ingestion

DIY

Logstash

Flume

Graylog2

HDFS

Page 54: Your Data, Your Search, Elasticsearch

Logstash

Tool for managing events and logs

Collect, parse and store

Tons of

– inputs (~40)

– codecs (~11)

– filters (~40)

– outputs (~50)
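
The "collect, parse and store" flow can be illustrated in a few lines: turn each raw log line into a structured event, then format the events for Elasticsearch's bulk API, which expects newline-delimited JSON with an action line before each document. This is a hand-rolled sketch and not Logstash itself; the log line format, the `_parse_failure` tag, and the index and type names are all hypothetical.

```python
import json
import re

# Assumed line format: "<date> <time> <LEVEL> <message>"
LINE = re.compile(r"(?P<timestamp>\S+ \S+) (?P<level>[A-Z]+) (?P<message>.*)")


def parse_line(line):
    """Parse one raw log line into an event dict; tag lines that do not match."""
    match = LINE.match(line)
    if match is None:
        return {"message": line, "tags": ["_parse_failure"]}
    return match.groupdict()


def bulk_body(events, index, doc_type):
    """Format events as a bulk request body: one action line per document."""
    lines = []
    for event in events:
        lines.append(json.dumps({"index": {"_index": index, "_type": doc_type}}))
        lines.append(json.dumps(event))
    return "\n".join(lines) + "\n"  # the body must end with a newline


raw = [
    "2013-09-10 12:00:01 INFO node started",
    "garbage without structure",
]
events = [parse_line(line) for line in raw]
body = bulk_body(events, "logs", "event")
```

Sending `body` in a POST to the `_bulk` endpoint would index both events in one round trip.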

Page 55: Your Data, Your Search, Elasticsearch

Kibana

Make sense of logging data

Runs inside your browser

Highly customizable

Leverages Elasticsearch aggregations/facets

Page 56: Your Data, Your Search, Elasticsearch

Thank you!

@costinl