Berlin Buzzwords 2013 - Faceting analyzed fields with some sprinkles of probability theory

Post on 16-Jul-2015

Faceting analyzed fields with some sprinkles of probability theory conjures trending topic analysis and other interesting insights

Boaz Leskes, Elasticsearch

@bleskes

Work done for Buzzcapture

Trending?

© Buzzcapture

[Figure: regions labeled "reference", "topic", "reference"]

topic ≠ reference

topic reference

$$P(w \mid T) = \frac{\lVert D_t \mid w \in D_t \rVert}{\lVert D_t \rVert}$$
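A minimal sketch of this probability, reading the formula as "the fraction of documents in the set D_t that contain the term w". The function name `term_probability` and the toy document sets are assumptions made for illustration; they do not come from the talk.

```python
from typing import Iterable, Set


def term_probability(term: str, docs: Iterable[Set[str]]) -> float:
    """Fraction of documents in `docs` that contain `term`."""
    docs = list(docs)
    if not docs:
        return 0.0
    containing = sum(1 for doc in docs if term in doc)
    return containing / len(docs)


# Hypothetical toy data: each document is the set of its analyzed terms.
topic_docs = [{"quick", "fox"}, {"brown", "fox"}, {"fox", "dog"}, {"quick", "dog"}]
reference_docs = [{"quick", "dog"}, {"brown", "dog"}, {"fox"}, {"brown"}]

p_topic = term_probability("fox", topic_docs)          # 3 of 4 documents
p_reference = term_probability("fox", reference_docs)  # 1 of 4 documents
```

Computing the same quantity once over the topic documents and once over the reference documents gives two numbers that can be compared per term. How the talk combines them into a trending score is not recoverable from the transcript.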

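[Figure: terms (brown, dog, fox, quick) linked to lists of document IDs (2 5 10 12 and 5 6 12 13), followed by document IDs (2, 5, 6, 10, 12, 13) linked back to terms (brown, dog, fox, quick)]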
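The two directions of this term and document ID mapping can be sketched as follows. This is an illustration only: the assignment of the two ID lists to "brown" and "fox" is a guess, since the transcript does not say which term owns which list, and `uninvert` is a hypothetical helper name.

```python
from collections import defaultdict
from typing import Dict, List


def uninvert(postings: Dict[str, List[int]]) -> Dict[int, List[str]]:
    """Turn a term -> document IDs mapping into document ID -> terms."""
    doc_terms: Dict[int, List[str]] = defaultdict(list)
    for term in sorted(postings):
        for doc_id in postings[term]:
            doc_terms[doc_id].append(term)
    return dict(doc_terms)


# Hypothetical assignment of the two ID lists to terms.
postings = {
    "brown": [2, 5, 10, 12],
    "fox": [5, 6, 12, 13],
}

doc_terms = uninvert(postings)
# doc_terms[5] is ["brown", "fox"]; document 2 only has ["brown"].
```

Faceting needs the second form, because counting terms over a set of matching documents starts from document IDs rather than from terms.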

In our index.

• Terms = 12GB

• “Arrows” = 41GB

    {
        tweet: {
            type:      "string",
            analyzer:  "whitespace",
            fielddata: {
                filter: {
                    regex:     "^#.*",
                    frequency: {
                        min:   10
                    }
                }
            }
        }
    }

Drop terms that occur too rarely

Drop docs with too many terms
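The effect of these filters can be sketched in plain Python. This is not how Elasticsearch implements field data filtering; the function `filter_field_data`, its parameters, and the order in which the filters are applied are assumptions chosen for illustration.

```python
import re
from collections import Counter
from typing import Dict, List, Optional


def filter_field_data(
    doc_terms: Dict[int, List[str]],
    pattern: str = "^#.*",
    min_freq: int = 10,
    max_terms_per_doc: Optional[int] = None,
) -> Dict[int, List[str]]:
    """Keep only terms that match `pattern` and occur in at least
    `min_freq` documents; drop documents with too many distinct terms."""
    regex = re.compile(pattern)

    # Assumption: oversized documents are dropped before counting.
    docs = {
        doc_id: terms
        for doc_id, terms in doc_terms.items()
        if max_terms_per_doc is None or len(set(terms)) <= max_terms_per_doc
    }

    # Document frequency: number of documents each term appears in.
    doc_freq = Counter(term for terms in docs.values() for term in set(terms))

    keep = {
        term
        for term, freq in doc_freq.items()
        if freq >= min_freq and regex.match(term)
    }
    return {doc_id: [t for t in terms if t in keep] for doc_id, terms in docs.items()}


# Hypothetical toy data with a low threshold so the effect is visible.
sample = {
    1: ["#a", "b"],
    2: ["#a", "#c"],
    3: ["#a", "#x", "#y", "#z"],
}
filtered = filter_field_data(sample, min_freq=2, max_terms_per_doc=3)
```

With these toy inputs, document 3 is dropped for having four terms, and only "#a" survives because it is the one hashtag that appears in at least two of the remaining documents.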

top                  related
iculture   10,122    4.0        7,878
floor       8,998    4.1        4,292
cover       6,874    rtacties   4,078
toy         4,402    jelly      2,905
ground      3,841    bean       2,857