Kyiv.py #16, October 2015
![Page 1: Kyiv.py #16 october 2015](https://reader031.vdocuments.us/reader031/viewer/2022030317/586f7c651a28ab10258b7b53/html5/thumbnails/1.jpg)
Kyiv.py #16
Andrii Soldatenko 24 October 2015 @a_soldatenko
![Page 2: Kyiv.py #16 october 2015](https://reader031.vdocuments.us/reader031/viewer/2022030317/586f7c651a28ab10258b7b53/html5/thumbnails/2.jpg)
ElasticSearch in the Python world
![Page 3: Kyiv.py #16 october 2015](https://reader031.vdocuments.us/reader031/viewer/2022030317/586f7c651a28ab10258b7b53/html5/thumbnails/3.jpg)
About me:
• Software Engineer in Test at
• Speaker at PyCon Russia 2015
• Speaker at PyCon Ukraine 2014
• Speaker at PyCon Belarus 2015
• In the past:
![Page 4: Kyiv.py #16 october 2015](https://reader031.vdocuments.us/reader031/viewer/2022030317/586f7c651a28ab10258b7b53/html5/thumbnails/4.jpg)
Preface
![Page 5: Kyiv.py #16 october 2015](https://reader031.vdocuments.us/reader031/viewer/2022030317/586f7c651a28ab10258b7b53/html5/thumbnails/5.jpg)
Information Explosion
![Page 6: Kyiv.py #16 october 2015](https://reader031.vdocuments.us/reader031/viewer/2022030317/586f7c651a28ab10258b7b53/html5/thumbnails/6.jpg)
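Text Search
grep --ignore-case --recursive foo books/
grep --ignore-case --recursive --file=words.txt books/
Entry.objects.filter(headline__icontains='foo')
import operator
from functools import reduce
from django.db.models import Q
with open('words.txt') as f:
    words = [line.strip() for line in f if line.strip()]
# Django has no "icontains_in" lookup: OR together one icontains filter per word.
Entry.objects.filter(reduce(operator.or_, [Q(headline__icontains=w) for w in words]))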
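For comparison, here is roughly what the second `grep` command does, written in plain Python. It is a sketch, and `grep_words` is a hypothetical helper name: every file under the directory is read and every line is checked against every word, on every single query. That full rescan is the cost a search index is built to avoid.

```python
from pathlib import Path


def grep_words(root, words):
    """Return (path, line_number, line) for every line under `root` that
    contains any of `words`, case-insensitively (like grep -i -r -f)."""
    needles = [w.strip().lower() for w in words if w.strip()]
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        with path.open(encoding="utf-8", errors="replace") as f:
            for number, line in enumerate(f, start=1):
                lowered = line.lower()
                if any(needle in lowered for needle in needles):
                    hits.append((str(path), number, line.rstrip("\n")))
    return hits
```

Mirroring `--file=words.txt`, a call would look like `grep_words("books/", open("words.txt").read().splitlines())`.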
![Page 7: Kyiv.py #16 october 2015](https://reader031.vdocuments.us/reader031/viewer/2022030317/586f7c651a28ab10258b7b53/html5/thumbnails/7.jpg)
Full text search
![Page 8: Kyiv.py #16 october 2015](https://reader031.vdocuments.us/reader031/viewer/2022030317/586f7c651a28ab10258b7b53/html5/thumbnails/8.jpg)
Search index
![Page 9: Kyiv.py #16 october 2015](https://reader031.vdocuments.us/reader031/viewer/2022030317/586f7c651a28ab10258b7b53/html5/thumbnails/9.jpg)
Simple sentences
1. The quick brown fox jumped over the lazy dog
2. Quick brown foxes leap over lazy dogs in summer
![Page 10: Kyiv.py #16 october 2015](https://reader031.vdocuments.us/reader031/viewer/2022030317/586f7c651a28ab10258b7b53/html5/thumbnails/10.jpg)
Inverted index

| Term | Doc_1 | Doc_2 |
|---|---|---|
| Quick | | X |
| The | X | |
| brown | X | X |
| dog | X | |
| dogs | | X |
| fox | X | |
| foxes | | X |
| in | | X |
| jumped | X | |
| lazy | X | X |
| leap | | X |
| over | X | X |
| quick | X | |
| summer | | X |
| the | X | |
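The table above can be rebuilt in a few lines of Python. This is a minimal sketch (the helper name `build_index` is ours): it splits on whitespace and applies no normalization, so `Quick` and `quick` stay separate terms, exactly as in the table.

```python
from collections import defaultdict

# The two example documents from the previous slide.
DOCS = {
    1: "The quick brown fox jumped over the lazy dog",
    2: "Quick brown foxes leap over lazy dogs in summer",
}


def build_index(docs):
    """Map each whitespace-separated term to the set of doc ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.split():
            index[term].add(doc_id)
    return dict(index)


index = build_index(DOCS)
# Print the table: sorted() puts uppercase terms first, as in the slide.
for term in sorted(index):
    marks = ["X" if doc_id in index[term] else "" for doc_id in DOCS]
    print(f"{term:<8}| {marks[0]:^5} | {marks[1]:^5}")
```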
![Page 11: Kyiv.py #16 october 2015](https://reader031.vdocuments.us/reader031/viewer/2022030317/586f7c651a28ab10258b7b53/html5/thumbnails/11.jpg)
Inverted index
| Term | Doc_1 | Doc_2 |
|---|---|---|
| brown | X | X |
| quick | X | |
| Total | 2 | 1 |
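Searching for `quick brown` means looking up each query term in the index and counting, per document, how many terms matched: Doc_1 matches both, Doc_2 only one. A sketch of that counting, still without normalization (the helper names are ours):

```python
from collections import Counter, defaultdict

DOCS = {
    1: "The quick brown fox jumped over the lazy dog",
    2: "Quick brown foxes leap over lazy dogs in summer",
}


def build_index(docs):
    """Map each whitespace-separated term to the set of doc ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.split():
            index[term].add(doc_id)
    return dict(index)


def match_counts(index, query):
    """Count, per document, how many of the query's terms it contains."""
    counts = Counter()
    for term in query.split():
        for doc_id in index.get(term, ()):
            counts[doc_id] += 1
    return counts


index = build_index(DOCS)
print(match_counts(index, "quick brown"))  # Counter({1: 2, 2: 1})
```

Raw term matching like this misses that `Quick` in Doc_2 is the same word as `quick`, which is the problem normalization addresses.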
![Page 12: Kyiv.py #16 october 2015](https://reader031.vdocuments.us/reader031/viewer/2022030317/586f7c651a28ab10258b7b53/html5/thumbnails/12.jpg)
Inverted index: normalization
| Term | Doc_1 | Doc_2 |
|---|---|---|
| brown | X | X |
| dog | X | X |
| fox | X | X |
| in | | X |
| jump | X | X |
| lazy | X | X |
| over | X | X |
| quick | X | X |
| summer | | X |
| the | X | |
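Normalization maps variants to a single indexed term: lowercasing (`Quick` → `quick`), stemming (`foxes` → `fox`, `dogs` → `dog`, `jumped` → `jump`) and synonyms (`leap` → `jump`). In ElasticSearch this is the analyzer's job. The sketch below fakes the stemmer and synonym filter with a small hand-written lookup table, `NORMALIZE`; that table is an assumption for illustration only, not a real stemmer.

```python
from collections import defaultdict

DOCS = {
    1: "The quick brown fox jumped over the lazy dog",
    2: "Quick brown foxes leap over lazy dogs in summer",
}

# Hand-written stand-in for a stemmer plus a synonym filter (illustration only).
NORMALIZE = {"foxes": "fox", "dogs": "dog", "jumped": "jump", "leap": "jump"}


def analyze(text):
    """Lowercase and split the text, then map each token through NORMALIZE."""
    return [NORMALIZE.get(token, token) for token in text.lower().split()]


def build_index(docs):
    """Map each normalized term to the set of doc ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in analyze(text):
            index[term].add(doc_id)
    return dict(index)


index = build_index(DOCS)
# Queries go through the same analyzer, so "Quick Foxes" now matches both docs.
hits = set.intersection(*(index.get(term, set()) for term in analyze("Quick Foxes")))
print(sorted(hits))  # [1, 2]
```

The key design point is that documents and queries pass through the same `analyze` step; otherwise a query for `Foxes` would look up a term that was never indexed.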
![Page 13: Kyiv.py #16 october 2015](https://reader031.vdocuments.us/reader031/viewer/2022030317/586f7c651a28ab10258b7b53/html5/thumbnails/13.jpg)
Search Engines
![Page 14: Kyiv.py #16 october 2015](https://reader031.vdocuments.us/reader031/viewer/2022030317/586f7c651a28ab10258b7b53/html5/thumbnails/14.jpg)
ElasticSearch
![Page 15: Kyiv.py #16 october 2015](https://reader031.vdocuments.us/reader031/viewer/2022030317/586f7c651a28ab10258b7b53/html5/thumbnails/15.jpg)
Who uses ElasticSearch?
![Page 16: Kyiv.py #16 october 2015](https://reader031.vdocuments.us/reader031/viewer/2022030317/586f7c651a28ab10258b7b53/html5/thumbnails/16.jpg)
ElasticSearch: Quick Intro
| Relational DB | ElasticSearch |
|---|---|
| Databases | Indices |
| Tables | Types |
| Rows | Documents |
| Columns | Fields |
![Page 17: Kyiv.py #16 october 2015](https://reader031.vdocuments.us/reader031/viewer/2022030317/586f7c651a28ab10258b7b53/html5/thumbnails/17.jpg)
ElasticSearch: Quick Intro
PUT /haystack/user/1
{
  "first_name" : "Andrii",
  "last_name" : "Soldatenko",
  "age" : 30,
  "about" : "I love to go rock climbing",
  "interests": [ "sports", "music" ],
  "likes": [ "python", "django" ]
}
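The same request can be built from Python with only the standard library. A minimal sketch, assuming a node at `http://localhost:9200` (as in the setup slides) and the 1.x-style `/index/type/id` URL shown above; later ElasticSearch versions drop the type segment. `put_document` is a hypothetical helper name.

```python
import json
from urllib.request import Request

ES_URL = "http://localhost:9200"  # assumption: the local node from the setup slides

user = {
    "first_name": "Andrii",
    "last_name": "Soldatenko",
    "age": 30,
    "about": "I love to go rock climbing",
    "interests": ["sports", "music"],
    "likes": ["python", "django"],
}


def put_document(index, doc_type, doc_id, body, base_url=ES_URL):
    """Build a PUT /<index>/<type>/<id> request carrying `body` as JSON."""
    return Request(
        f"{base_url}/{index}/{doc_type}/{doc_id}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


request = put_document("haystack", "user", 1, user)
```

With a node running, `urllib.request.urlopen(request)` sends it, and ElasticSearch answers with JSON describing the write (index, id, `_version`).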
![Page 18: Kyiv.py #16 october 2015](https://reader031.vdocuments.us/reader031/viewer/2022030317/586f7c651a28ab10258b7b53/html5/thumbnails/18.jpg)
ElasticSearch: Locks
•Pessimistic concurrency control
•Optimistic concurrency control
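ElasticSearch does not lock documents; it uses optimistic concurrency control. Every document carries a `_version`, and a write that states which version it expects (the `version` request parameter in 1.x) is rejected with a version conflict (HTTP 409) if the document changed since it was read. The toy in-memory store below illustrates the idea; the class and method names are ours, not the ElasticSearch API.

```python
class VersionConflict(Exception):
    """Raised when the expected version does not match the stored one."""


class ToyStore:
    """In-memory document store with a version number per document."""

    def __init__(self):
        self._docs = {}  # doc_id -> (version, body)

    def get(self, doc_id):
        return self._docs[doc_id]

    def put(self, doc_id, body, expected_version=None):
        """Write `body`; if `expected_version` is given, it must match."""
        current = self._docs.get(doc_id, (0, None))[0]
        if expected_version is not None and expected_version != current:
            raise VersionConflict(f"{doc_id}: expected {expected_version}, found {current}")
        self._docs[doc_id] = (current + 1, body)
        return current + 1


store = ToyStore()
store.put(1, {"age": 30})                            # first write -> version 1
version, _ = store.get(1)                            # a reader sees version 1
store.put(1, {"age": 31}, expected_version=version)  # still current -> version 2
try:
    store.put(1, {"age": 32}, expected_version=version)  # stale: expects 1
except VersionConflict as err:
    print("conflict:", err)
```

Nobody waits on a lock: the losing writer gets an error and decides whether to re-read and retry.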
![Page 19: Kyiv.py #16 october 2015](https://reader031.vdocuments.us/reader031/viewer/2022030317/586f7c651a28ab10258b7b53/html5/thumbnails/19.jpg)
ElasticSearch: Setup
#!/bin/bash
VERSION=1.7.1
curl -L -O https://download.elastic.co/elasticsearch/elasticsearch/elasticsearch-$VERSION.zip
unzip elasticsearch-$VERSION.zip
cd elasticsearch-$VERSION
# Download the Marvel plugin
./bin/plugin -i elasticsearch/marvel/latest
echo 'marvel.agent.enabled: false' >> ./config/elasticsearch.yml
# Run ElasticSearch in the background
./bin/elasticsearch -d
![Page 20: Kyiv.py #16 october 2015](https://reader031.vdocuments.us/reader031/viewer/2022030317/586f7c651a28ab10258b7b53/html5/thumbnails/20.jpg)
ElasticSearch: Setup
$ curl 'http://localhost:9200/?pretty'
{
  "status" : 200,
  "name" : "Dredmund Druid",
  "cluster_name" : "elasticsearch",
  "version" : {
    "number" : "1.7.1",
    "build_hash" : "b88f43fc40b0bcd7f173a1f9ee2e97816de80b19",
    "build_timestamp" : "2015-07-29T09:54:16Z",
    "build_snapshot" : false,
    "lucene_version" : "4.10.4"
  },
  "tagline" : "You Know, for Search"
}
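Because `./bin/elasticsearch -d` returns before the node is ready, scripts often poll the root endpoint until it answers. A sketch using only the standard library; `wait_for_node` and its injectable `fetch` parameter are hypothetical names, and the version lookup relies on the `version.number` field in the response above.

```python
import json
import time
from urllib.request import urlopen


def _fetch(url):
    """Return the body of a GET request to `url` as text."""
    with urlopen(url, timeout=2) as response:
        return response.read().decode("utf-8")


def wait_for_node(url="http://localhost:9200/", attempts=30, delay=1.0, fetch=_fetch):
    """Poll `url` until it returns JSON, then return the reported version number."""
    for _ in range(attempts):
        try:
            info = json.loads(fetch(url))
            return info["version"]["number"]
        except (OSError, ValueError):
            # Connection refused or timed out, or a non-JSON body: wait and retry.
            time.sleep(delay)
    raise TimeoutError(f"no answer from {url} after {attempts} attempts")
```

Against the node shown above, `wait_for_node()` would return `"1.7.1"`.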
![Page 21: Kyiv.py #16 october 2015](https://reader031.vdocuments.us/reader031/viewer/2022030317/586f7c651a28ab10258b7b53/html5/thumbnails/21.jpg)
ElasticSearch: Settings
curl -X POST 'http://localhost:9200/<index_name>/_close'
curl -X PUT 'http://localhost:9200/<index_name>/_settings' -d '{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "type": "standard",
          "stopwords": [ "and", "the" ]
        }
      }
    }
  }
}'
curl -X POST 'http://localhost:9200/<index_name>/_open'
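The same close / update / reopen sequence can be driven from the official `elasticsearch` Python client (`pip install elasticsearch`, used later in this talk). A sketch that assumes the client's `indices.close`, `indices.put_settings` and `indices.open` methods as in the 1.x/2.x clients; newer client versions may name the body parameter differently, and `set_analyzer` is a hypothetical helper.

```python
ANALYZER_SETTINGS = {
    "settings": {
        "analysis": {
            "analyzer": {
                "my_analyzer": {"type": "standard", "stopwords": ["and", "the"]}
            }
        }
    }
}


def set_analyzer(es, index_name, settings=ANALYZER_SETTINGS):
    """Analysis settings can only change on a closed index: close, update, reopen."""
    es.indices.close(index=index_name)
    try:
        es.indices.put_settings(index=index_name, body=settings)
    finally:
        # Reopen even if the update failed, so the index is not left closed.
        es.indices.open(index=index_name)
```

Usage, assuming a local node: `from elasticsearch import Elasticsearch` and then `set_analyzer(Elasticsearch(["http://localhost:9200"]), "haystack")`.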
![Page 22: Kyiv.py #16 october 2015](https://reader031.vdocuments.us/reader031/viewer/2022030317/586f7c651a28ab10258b7b53/html5/thumbnails/22.jpg)
Haystack
![Page 23: Kyiv.py #16 october 2015](https://reader031.vdocuments.us/reader031/viewer/2022030317/586f7c651a28ab10258b7b53/html5/thumbnails/23.jpg)
Adding search functionality to Simple Model
$ cat myapp/models.py
from django.db import models
from django.contrib.auth.models import User


class Page(models.Model):
    # on_delete is required from Django 2.0 on.
    user = models.ForeignKey(User, on_delete=models.CASCADE)
    name = models.CharField(max_length=200)
    description = models.TextField()

    def __unicode__(self):
        return self.name
![Page 24: Kyiv.py #16 october 2015](https://reader031.vdocuments.us/reader031/viewer/2022030317/586f7c651a28ab10258b7b53/html5/thumbnails/24.jpg)
Haystack: Installation
$ pip install django-haystack
$ cat settings.py
INSTALLED_APPS = [
    'django.contrib.admin',
    'django.contrib.auth',
    'django.contrib.contenttypes',
    'django.contrib.sessions',
    'django.contrib.sites',

    # Added.
    'haystack',

    # Then your usual apps...
    'myapp',
]
![Page 25: Kyiv.py #16 october 2015](https://reader031.vdocuments.us/reader031/viewer/2022030317/586f7c651a28ab10258b7b53/html5/thumbnails/25.jpg)
Haystack: Settings
$ pip install elasticsearch
$ cat settings.py
...
HAYSTACK_CONNECTIONS = {
    'default': {
        'ENGINE': 'haystack.backends.elasticsearch_backend.ElasticsearchSearchEngine',
        'URL': 'http://127.0.0.1:9200/',
        'INDEX_NAME': 'haystack',
    },
}
...
![Page 26: Kyiv.py #16 october 2015](https://reader031.vdocuments.us/reader031/viewer/2022030317/586f7c651a28ab10258b7b53/html5/thumbnails/26.jpg)
Haystack: Creating SearchIndexes
$ cat myapp/search_indexes.py
from haystack import indexes
from myapp.models import Page


class PageIndex(indexes.SearchIndex, indexes.Indexable):
    # Rendered from templates/search/indexes/myapp/page_text.txt
    text = indexes.CharField(document=True, use_template=True)
    author = indexes.CharField(model_attr='user')
    name = indexes.CharField(model_attr='name')

    def get_model(self):
        return Page

    def index_queryset(self, using=None):
        """Used when the entire index for model is updated."""
        return self.get_model().objects.all()
![Page 27: Kyiv.py #16 october 2015](https://reader031.vdocuments.us/reader031/viewer/2022030317/586f7c651a28ab10258b7b53/html5/thumbnails/27.jpg)
Haystack: SearchQuerySet API
from haystack.query import SearchQuerySet
from haystack.inputs import Raw

all_results = SearchQuerySet().all()
hello_results = SearchQuerySet().filter(content='hello')
unfriendly_results = SearchQuerySet().\
    exclude(content='hello').\
    filter(content='world')
# To send unescaped data (trusted_query must come from a trusted source):
sqs = SearchQuerySet().filter(title=Raw(trusted_query))
![Page 28: Kyiv.py #16 october 2015](https://reader031.vdocuments.us/reader031/viewer/2022030317/586f7c651a28ab10258b7b53/html5/thumbnails/28.jpg)
How to configure ElasticSearch?
https://github.com/django-haystack/django-haystack/blob/9d92d4da0a1ec75978fc3949375dda9a1707469f/haystack/backends/elasticsearch_backend.py#L41
![Page 29: Kyiv.py #16 october 2015](https://reader031.vdocuments.us/reader031/viewer/2022030317/586f7c651a28ab10258b7b53/html5/thumbnails/29.jpg)
ElasticSearch settings
![Page 30: Kyiv.py #16 october 2015](https://reader031.vdocuments.us/reader031/viewer/2022030317/586f7c651a28ab10258b7b53/html5/thumbnails/30.jpg)
ElasticStack backend
https://github.com/bennylope/elasticstack
HAYSTACK_CONNECTIONS = {
    'default': {
        'ENGINE': 'elasticstack.backends.ConfigurableElasticSearchEngine',
        'URL': 'http://127.0.0.1:9200/',
        'INDEX_NAME': 'haystack',
    },
}
ELASTICSEARCH_INDEX_SETTINGS = {}
ELASTICSEARCH_DEFAULT_ANALYZER = 'synonym_analyzer'
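`ELASTICSEARCH_DEFAULT_ANALYZER` has to name an analyzer that exists in the index settings, so with `synonym_analyzer` the empty dict above needs filling in. An illustrative `settings.py` fragment, assuming the `{"settings": {"analysis": ...}}` layout that Haystack's own default ElasticSearch settings use; the synonym pairs are made-up examples.

```python
ELASTICSEARCH_INDEX_SETTINGS = {
    "settings": {
        "analysis": {
            "analyzer": {
                "synonym_analyzer": {
                    "type": "custom",
                    "tokenizer": "standard",
                    # Lowercase first, so synonyms match regardless of case.
                    "filter": ["lowercase", "synonym"],
                },
            },
            "filter": {
                "synonym": {
                    "type": "synonym",
                    # Made-up example pairs in comma-separated (Solr) format.
                    "synonyms": ["quick, fast", "leap, jump"],
                },
            },
        },
    },
}

ELASTICSEARCH_DEFAULT_ANALYZER = "synonym_analyzer"
```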
![Page 31: Kyiv.py #16 october 2015](https://reader031.vdocuments.us/reader031/viewer/2022030317/586f7c651a28ab10258b7b53/html5/thumbnails/31.jpg)
Keeping data in sync
# Update everything.
./manage.py update_index --settings=settings.prod
# Update everything with lots of information about what's going on.
./manage.py update_index --settings=settings.prod --verbosity=2
# Update everything, cleaning up after deleted models.
./manage.py update_index --remove --settings=settings.prod
# Update everything changed in the last 2 hours.
./manage.py update_index --age=2 --settings=settings.prod
# Update everything between Dec. 1, 2011 & Dec 31, 2011.
./manage.py update_index --start='2011-12-01T00:00:00' --end='2011-12-31T23:59:59' --settings=settings.prod
![Page 32: Kyiv.py #16 october 2015](https://reader031.vdocuments.us/reader031/viewer/2022030317/586f7c651a28ab10258b7b53/html5/thumbnails/32.jpg)
Signals
from django.db import models
from haystack.signals import BaseSignalProcessor


class RealtimeSignalProcessor(BaseSignalProcessor):
    """
    Allows for observing when saves/deletes fire & automatically updates the
    search engine appropriately.
    """
    def setup(self):
        # Naive (listen to all model saves).
        models.signals.post_save.connect(self.handle_save)
        models.signals.post_delete.connect(self.handle_delete)
        # Efficient would be going through all backends & collecting all models
        # being used, then hooking up signals only for those.

    def teardown(self):
        # Naive (listen to all model saves).
        models.signals.post_save.disconnect(self.handle_save)
        models.signals.post_delete.disconnect(self.handle_delete)
        # Efficient would be going through all backends & collecting all models
        # being used, then disconnecting signals only for those.
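Haystack's default signal processor, `haystack.signals.BaseSignalProcessor`, does not listen to any signals, so a processor like the one above only runs once it is named in `settings.py`:

```python
# settings.py: update the search index whenever a model instance is saved or deleted.
HAYSTACK_SIGNAL_PROCESSOR = "haystack.signals.RealtimeSignalProcessor"
```

Every save or delete then also hits the search engine, which keeps the index current at the cost of extra work on each write; the `update_index` commands remain the batch alternative.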
![Page 33: Kyiv.py #16 october 2015](https://reader031.vdocuments.us/reader031/viewer/2022030317/586f7c651a28ab10258b7b53/html5/thumbnails/33.jpg)
Haystack: Pros and Cons
Pros:
• easy to set up
• looks like the Django ORM, but for searches
• search-engine independent
• supports 4 engines (ElasticSearch, Solr, Xapian, Whoosh)
Cons:
• poor SearchQuerySet API
• difficult to manage stop words
• loses performance because of the extra layer
• model-based
![Page 34: Kyiv.py #16 october 2015](https://reader031.vdocuments.us/reader031/viewer/2022030317/586f7c651a28ab10258b7b53/html5/thumbnails/34.jpg)
Final Thoughts
https://www.elastic.co/guide/en/elasticsearch/guide/master/index.html
![Page 36: Kyiv.py #16 october 2015](https://reader031.vdocuments.us/reader031/viewer/2022030317/586f7c651a28ab10258b7b53/html5/thumbnails/36.jpg)
Questions
?