your data, your search, elasticsearch

Post on 06-May-2015

Category:

Technology

DESCRIPTION

Speaker: Costin Leau

Finding relevant information fast has always been a challenge, even more so in today's growing "oceans" of data. This talk explores real-time full-text search with Elasticsearch, an open-source, distributed search engine built on top of Apache Lucene. The session shows how to perform real-time searches on structured and unstructured data alike, how to cope with types and suggestions, and how to use social graph filters and aggregations for efficient analytics, all from a Spring perspective.

Last but not least, the presentation focuses on the Hadoop platform and how Map/Reduce, Hive, Pig or Cascading jobs can leverage a search engine to significantly speed up execution and enhance their capabilities. It covers architectural topics such as index scalability, data locality and partitioning, off-premise and on-premise storage (HDFS, S3, local file systems) and multi-tenancy.

TRANSCRIPT

© 2013 SpringOne 2GX. All rights reserved. Do not distribute without permission.

Your data, your search, Elasticsearch

Costin Leau (@costinl)

Agenda

Elasticsearch

Big Data

Analytics

What is Elasticsearch?

Open-Source Search & Analytics engine

- Structured & Unstructured Data

- Real Time

- Analytics capabilities (facets)

- REST based

Distributed

- Designed for the Cloud

- Designed for Big Data

Lightweight

Popular: ~200K downloads/month

Users

Platform adoption

http://www.thoughtworks.com/radar#platforms 2013

Use Case – Text search

1.3 billion files, 130 billion lines of code

https://github.com/blog/1381-a-whole-new-code-search

Use Case – Geolocation

50 million venues / day

Use Case – Recommendations

Millions of recommendations

Use Case – Support/Reporting

Use Case – Centralized Logging

Use Case – Pure Analytics

Plug & Play

Installation

$ wget https://download.elasticsearch.org/...

$ tar -xf elasticsearch-0.90.3.tar.gz

$ ./elasticsearch-0.90.3/bin/elasticsearch

... [INFO ][node][Ghost Maker] {0.90.2}[5645]: initializing ...

Index a document

$ curl -X PUT localhost:9200/products/product/1 -d '{
  "title" : "Welcome!"
}'

Update a document

$ curl -X PUT localhost:9200/products/product/1 -d '{
  "title" : "Welcome to SpringOne2GX 2013!"
}'

Search for documents...

$ curl -X GET 'localhost:9200/products/_search?q=welcome'
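Since the talk takes a Spring and Java perspective, the same index and search calls can be sketched with the JDK's own HTTP client (Java 11 or later). This is only an illustration: the class name `ProductRequests` and its helper methods are hypothetical, the node address is assumed to be the `localhost:9200` used in the curl commands, and the `Content-Type` header is an addition that the curl examples do not send. The code only builds the requests, so it runs without a node.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

// Hypothetical helper that mirrors the curl commands from the slides.
class ProductRequests {
    // Assumed node address, taken from the curl examples.
    static final String BASE = "http://localhost:9200";

    // PUT /products/product/{id}: indexes the document, or replaces it if the id exists.
    static HttpRequest index(String id, String json) {
        return HttpRequest.newBuilder(URI.create(BASE + "/products/product/" + id))
                .header("Content-Type", "application/json") // added here, not in the slides
                .PUT(HttpRequest.BodyPublishers.ofString(json))
                .build();
    }

    // GET /products/_search?q=...: URI search over the products index.
    static HttpRequest search(String query) {
        String q = URLEncoder.encode(query, StandardCharsets.UTF_8);
        return HttpRequest.newBuilder(URI.create(BASE + "/products/_search?q=" + q))
                .GET()
                .build();
    }

    public static void main(String[] args) {
        HttpRequest put = index("1", "{\"title\" : \"Welcome!\"}");
        HttpRequest get = search("welcome");
        System.out.println(put.method() + " " + put.uri());
        System.out.println(get.method() + " " + get.uri());
    }
}
```

To actually send a request against a running node, pass it to `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())`.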

Scaling out

$ ./elasticsearch-0.90.2/bin/elasticsearch -D es.node.name=Node2

...[cluster.service] [Node2] detected_master [Node1] ...

Primaries and Replicas

curl -XPUT 'http://localhost:9200/a/' -d '{
  "settings" : {
    "index" : {
      "number_of_shards" : 3,
      "number_of_replicas" : 1
    }
  }
}'

Primaries: A1, A2, A3

Replicas: A1, A2, A3
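The settings above give index "a" three primary shards with one replica each, which is why the diagram shows A1 to A3 twice. A rough sketch of how a document could be assigned to one of those primaries follows. It assumes the usual hash-modulo scheme; `String.hashCode()` is only a stand-in for the hash function Elasticsearch really uses, and the class and method names are hypothetical.

```java
// Illustrative only: not Elasticsearch's routing code.
class ShardRouting {
    // Picks a primary shard for a document id: hash of the id modulo the
    // number of primaries. String.hashCode() is a stand-in hash (assumption).
    static int primaryShardFor(String docId, int numberOfShards) {
        return Math.floorMod(docId.hashCode(), numberOfShards);
    }

    // Every primary plus its replicas counts as one shard copy.
    static int totalShardCopies(int numberOfShards, int numberOfReplicas) {
        return numberOfShards * (1 + numberOfReplicas);
    }

    public static void main(String[] args) {
        for (String id : new String[] {"1", "2", "abc123"}) {
            // Shards are printed 1-based to match the A1..A3 labels.
            System.out.println(id + " -> A" + (primaryShardFor(id, 3) + 1));
        }
        System.out.println("shard copies: " + totalShardCopies(3, 1));
    }
}
```

Because the shard is derived from the id and the primary count, the number of primaries is fixed when the index is created, while the replica count can be changed later.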

Scaling out

$ ./elasticsearch-0.90.2/bin/elasticsearch -D es.node.name=Node3

...[cluster.service] [Node3] detected_master [Node1] ...

JSON & HTTP

{
  "id" : "abc123",
  "title" : "A JSON Document",
  "body" : "A JSON document is a ...",
  "published_on" : "2013/06/27 10:00:00",
  "featured" : true,
  "tags" : ["search", "json"],
  "author" : {
    "first_name" : "Clara",
    "last_name" : "Rice",
    "email" : "clara@rice.org"
  }
}

http:// Lingua Franca of APIs

Also supported: Native Java protocol, Thrift, Memcached

Search & Find

$ curl -X GET "http://localhost:9200/_search?q=<YOUR QUERY>"

Terms
    apple
    apple iphone

Phrases
    "apple iphone"

Proximity
    "apple safari"~5

Fuzzy
    apple~0.8

Wildcards
    app*
    *pp*

Boosting
    apple^10 safari

Range
    [2011/05/01 TO 2011/05/31]
    [java TO json]

Boolean
    apple AND NOT iphone
    +apple -iphone
    (apple OR iphone) AND NOT review

Fields
    title:iphone^15 OR body:iphone
    published_on:[2011/05/01 TO "2011/05/27 10:00:00"]

Query DSL

curl -X GET localhost:9200/articles/_search -d '{
  "query" : {
    "filtered" : {
      "query" : {
        "bool" : {
          "must" : [
            { "match" : { "author.first_name" : { "query" : "claire", "fuzziness" : 0.1 } } },
            { "multi_match" : { "query" : "elasticsearch", "fields" : ["title^10", "body"] } }
          ]
        }
      },
      "filter" : {
        "and" : [
          { "terms" : { "tags" : ["search"] } },
          { "range" : { "published_on" : { "from" : "2013" } } },
          { "term" : { "featured" : true } }
        ]
      }
    }
  }
}'

Search types

Full-text Search: “Find all articles with ‘search’ in their title or body, give matches in titles higher score”

Structured Search: “Find all articles from year 2013 tagged ‘search’”

Custom Scoring: see custom_score and custom_filters_score queries

User Search Engine

Fetch document field ➝

Pick configured analyzer ➝

Parse text into tokens ➝

Apply token filters ➝

Store into index
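The indexing steps above can be sketched as a toy inverted index. This is a simplified illustration, not how Lucene or Elasticsearch implement analysis: the class `TinyIndex`, its stop-word list and the tokenizing regex are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeSet;

// Toy inverted index that follows the steps on the slide.
class TinyIndex {
    // Hypothetical stop-word list; real analyzers ship their own.
    static final Set<String> STOP_WORDS = Set.of("a", "an", "the", "to", "is");

    // term -> ids of the documents that contain it
    private final Map<String, SortedSet<String>> postings = new HashMap<>();

    // "Parse text into tokens" and "apply token filters" (lowercase, stop words).
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.split("[^\\p{L}\\p{N}]+")) {
            String token = raw.toLowerCase(Locale.ROOT);
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    // "Fetch document field", analyze it, then "store into index".
    void index(String docId, String fieldValue) {
        for (String term : analyze(fieldValue)) {
            postings.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
        }
    }

    // Runs the query through the same analyzer and looks up its first term.
    SortedSet<String> search(String query) {
        List<String> terms = analyze(query);
        if (terms.isEmpty()) {
            return new TreeSet<>();
        }
        return postings.getOrDefault(terms.get(0), new TreeSet<>());
    }

    public static void main(String[] args) {
        TinyIndex index = new TinyIndex();
        index.index("1", "Welcome to SpringOne2GX 2013!");
        System.out.println(analyze("Welcome to SpringOne2GX 2013!"));
        System.out.println(index.search("WELCOME"));
    }
}
```

Applying the same analysis to the query as to the document is what lets a search for "WELCOME" find a document that was indexed with "Welcome".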

Search perspectives

Slice & Dice

Query

Facets

OLAP Cube

Dimensions, measures, aggregations

Slice Dice Drill Down / Roll Up

Show me sales numbers for all products across all locations in year 2013

Show me product A sales numbers across all locations over all years

Show me products sales numbers in location X over all years
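The questions above all fix some dimensions and aggregate a measure over the rest, which is what a facet computes over the documents matching a query. A small in-memory sketch of that idea follows; the `SalesCube` class, the `Sale` record type and the sample figures in `main` are hypothetical and not taken from the talk.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// In-memory illustration of slicing a sales "cube"; not an Elasticsearch API.
class SalesCube {
    // One fact row: three dimensions (product, location, year) and one measure (amount).
    static final class Sale {
        final String product;
        final String location;
        final int year;
        final double amount;

        Sale(String product, String location, int year, double amount) {
            this.product = product;
            this.location = location;
            this.year = year;
            this.amount = amount;
        }
    }

    // Fix the year, then total the amount per product across all locations.
    static Map<String, Double> salesByProductInYear(List<Sale> sales, int year) {
        Map<String, Double> totals = new TreeMap<>();
        for (Sale s : sales) {
            if (s.year == year) {
                totals.merge(s.product, s.amount, Double::sum);
            }
        }
        return totals;
    }

    // Fix the product, then total the amount per location over all years.
    static Map<String, Double> salesByLocationForProduct(List<Sale> sales, String product) {
        Map<String, Double> totals = new TreeMap<>();
        for (Sale s : sales) {
            if (s.product.equals(product)) {
                totals.merge(s.location, s.amount, Double::sum);
            }
        }
        return totals;
    }

    public static void main(String[] args) {
        // Made-up sample data.
        List<Sale> sales = List.of(
                new Sale("A", "X", 2013, 100.0),
                new Sale("A", "Y", 2013, 50.0),
                new Sale("B", "X", 2013, 70.0),
                new Sale("A", "X", 2012, 30.0));
        System.out.println(salesByProductInYear(sales, 2013));
        System.out.println(salesByLocationForProduct(sales, "A"));
    }
}
```

In search terms, the filter on the fixed dimension plays the role of the query and the per-key totals play the role of the facet.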

Clients

Pick your language

Java

Perl*

Python*

Ruby*

PHP*

JavaScript

.NET

Scala

Clojure

Go

Erlang

Eventmachine

Cli

Smalltalk

Ocaml

Spring Data

Spring Data Elasticsearch

Easy to use Elasticsearch in a Spring-powered app

Configuring Elasticsearch client

Dedicated template for one-liners

Repository support

Configuration

<beans xmlns:es="http://www.sf.org/schema/data/elasticsearch">
  <es:repositories base-package="com.acme" />
  <es:transport-client id="client" cluster-nodes="localhost:9300,someip:9300" />
</beans>

@Configuration
@EnableElasticsearchRepositories(basePackages = "com/acme")
static class Config {

    @Bean
    public ElasticsearchOperations elasticsearchTemplate() {
        return new ElasticsearchTemplate(nodeBuilder().local(true).node().client());
    }
}

Dedicated Template

Create/delete index/mappings

Query options

– Criteria

– String

– Search

Bulk operations

Scrolling/streaming

Repositories

public interface BookRepository extends Repository<Book, String> {

    List<Book> findByNameAndPrice(String name, Integer price);

    List<Book> findByNameOrPrice(String name, Integer price);

    Page<Book> findByName(String name, Pageable page);

    Page<Book> findByNameNot(String name, Pageable page);

    // Between takes a lower and an upper bound
    Page<Book> findByPriceBetween(int from, int to, Pageable page);

    Page<Book> findByNameLike(String name, Pageable page);

    @Query("{'bool' : {'must' : {'field' : {'message' : '?0'}}}}")
    Page<Book> findByMessage(String message, Pageable pageable);
}

Sophisticated query creation

Keyword                  Example

And/Or                   findByNameAndPrice
Is                       findByName
Not                      findByNameNot
Less/GreaterThanEqual    findByPriceLessThan
Before/After             findByPriceAfter
Starting/EndingWith      findByNameEndingWith
Contains/Containing      findByNameContaining
OrderBy                  findByCountryOrderByName
True/False               findByRetiredFalse
Near                     coming soon
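To make the meaning of a few of these keywords concrete, here is a plain in-memory stand-in for the repository. It is not Spring Data code and does not reflect how Spring Data derives queries: the `InMemoryBooks` class is hypothetical, exact string equality stands in for Elasticsearch matching, and inclusive bounds for `Between` are an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical in-memory stand-in that mimics the finder names from the slides.
class InMemoryBooks {
    static final class Book {
        final String name;
        final int price;

        Book(String name, int price) {
            this.name = name;
            this.price = price;
        }
    }

    private final List<Book> books = new ArrayList<>();

    void save(Book book) {
        books.add(book);
    }

    // Shared helper: keeps the books that satisfy the predicate.
    private List<Book> filter(Predicate<Book> predicate) {
        List<Book> result = new ArrayList<>();
        for (Book book : books) {
            if (predicate.test(book)) {
                result.add(book);
            }
        }
        return result;
    }

    // And: both conditions must hold.
    List<Book> findByNameAndPrice(String name, int price) {
        return filter(b -> b.name.equals(name) && b.price == price);
    }

    // Or: either condition is enough.
    List<Book> findByNameOrPrice(String name, int price) {
        return filter(b -> b.name.equals(name) || b.price == price);
    }

    // Not: negates the condition.
    List<Book> findByNameNot(String name) {
        return filter(b -> !b.name.equals(name));
    }

    // Between: range with both bounds included (assumption).
    List<Book> findByPriceBetween(int from, int to) {
        return filter(b -> b.price >= from && b.price <= to);
    }

    // EndingWith: suffix match on the property.
    List<Book> findByNameEndingWith(String suffix) {
        return filter(b -> b.name.endsWith(suffix));
    }
}
```

With a real repository none of these method bodies are written by hand; only the method names in the interface are declared.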

Big Data

A Holistic View of a Big Data System

ETL

Real-Time Streams

Unstructured Data (HDFS)

RT Semi-structured Database (HBase, Cassandra, Mongo)

Big SQL (Greenplum, AsterData, etc.)

Batch Processing

Real-Time Processing (S4, Storm)

Analytics

Hadoop eco-system

Hadoop Distributed File System (HDFS)

Map Reduce Framework (MapRed)

Elasticsearch - Hadoop

Read/write data to Hadoop transparently

• Hadoop Input/OutputFormat

• Cascading Tap

• Pig Storage

• Hive SerDe

Native Map/Reduce model

Elasticsearch + Hadoop

Writing: chart comparing M/R, Pig and Hive against Raw (scale 0 to 60)

Reading / Querying: chart comparing M/R, Pig and Hive against Raw (scale 0 to 60)

Data Ingestion

DIY

Logstash

Flume

Graylog2

HDFS

Logstash

Tool for managing events and logs

Collect, parse and store

Tons of

– inputs (~40)

– codecs (~11)

– filters (~40)

– outputs (~50)

Kibana

Make sense of logging data

Runs inside your browser

Highly customizable

Leverages Elasticsearch aggregations/facets

Thank you!

@costinl
