large scale nlp using python's nltk on azure
Beat Schwegler
head in the cloud, feet on the ground
Twitter: @cloudbeatsch Blog: http://cloudbeatsch.com
I saw Mr. Washington with a saw!
I saw Mr. Washington.
This is your saw… I told you!
Is this really a chainsaw?
fundamentals of nlp
natural language toolkit (nltk)
running python and nltk on Azure
source: http://www.nltk.org/book_1ed/ch01.html
simple pipeline architecture for a spoken dialogue system
dialogue with a chatbot
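The slide only shows a conversation. As a rough illustration of how a simple rule-based chatbot can work, NLTK ships `nltk.chat.util.Chat`, which matches user input against regular-expression patterns and fills `%1` in the reply with the captured text after swapping pronouns via the `reflections` table. This is a minimal sketch, not the chatbot from the talk; the patterns and replies are hypothetical.

```python
from nltk.chat.util import Chat, reflections

# Hypothetical rules: (regex, list of candidate replies). The first matching
# pattern wins; "%1" is replaced by the text captured by group 1, lowercased
# and with pronouns reflected ("my" -> "your", "i" -> "you", ...).
pairs = [
    (r"i love (.*)", ["Why do you love %1?"]),
    (r"is this really a (.*)\?", ["Yes, it really is a %1."]),
    (r"(.*)", ["Tell me more."]),  # catch-all fallback
]

bot = Chat(pairs, reflections)

for line in ["I love my chainsaw", "Is this really a chainsaw?", "I saw Mr. Washington"]:
    print(line, "->", bot.respond(line))
```

Each rule here has a single reply, so the output is deterministic; with several replies per rule, `Chat` picks one at random.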
identify language
tokenize & tag part of speech (pos)
identify named entities
corpora and lexical resources
a corpus is a large body of text
a lexical resource is a collection of words associated with additional information
e.g. the Brown Corpus
the first million-word electronic corpus of English, created in 1961 at Brown University
segmentation
tokenize
tag part of speech (pos)
identify named entities
source: http://www.nltk.org/book_1ed/ch07.html
entity detection using chunking
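A minimal sketch of chunking with `nltk.RegexpParser`, adapted from the noun-phrase chunking example in chapter 7 of the NLTK book. The sentence is hand-tagged so the example needs no downloaded models; `chunks` is a hypothetical helper that pulls out the words of each chunk.

```python
import nltk

# An already tokenized and POS-tagged sentence (hand-tagged here).
sentence = [("the", "DT"), ("little", "JJ"), ("yellow", "JJ"), ("dog", "NN"),
            ("barked", "VBD"), ("at", "IN"), ("the", "DT"), ("cat", "NN")]

# Chunk grammar: a noun phrase (NP) is an optional determiner,
# any number of adjectives, then a noun.
grammar = "NP: {<DT>?<JJ>*<NN>}"
parser = nltk.RegexpParser(grammar)
tree = parser.parse(sentence)  # root labelled "S", NP chunks as subtrees

def chunks(tree, label="NP"):
    """Return the words of every subtree carrying the given chunk label."""
    return [" ".join(word for word, tag in subtree.leaves())
            for subtree in tree.subtrees(filter=lambda t: t.label() == label)]

print(tree)
print(chunks(tree))
```

Named-entity chunkers such as `nltk.ne_chunk` produce the same kind of tree, only with entity labels (e.g. PERSON) instead of NP and a trained model instead of a hand-written grammar.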
fundamentals of nlp
natural language toolkit (nltk)
running python and nltk on Azure
text as a sequence of words and punctuation, represented as a list
sent = ['I', 'love', 'Dublin', '!']
upper_sent = [w.upper() for w in sent]
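Because a text is just a Python list of tokens, ordinary list operations already answer basic questions about it. This short sketch uses only the standard library; the sentence is made up, and `collections.Counter` stands in for NLTK's `FreqDist`, which is a `Counter` subclass.

```python
from collections import Counter

# Text represented as a list of word and punctuation tokens
sent = ["I", "love", "Dublin", "!", "I", "love", "NLP", "!"]

num_tokens = len(sent)                          # every token counts, punctuation too
vocabulary = sorted(set(sent))                  # distinct tokens, sorted
upper_sent = [w.upper() for w in sent]          # same comprehension as on the slide
words_only = [w for w in sent if w.isalpha()]   # drop punctuation tokens
freq = Counter(sent)                            # token -> number of occurrences

print(num_tokens, vocabulary, words_only, freq["love"])
```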
downloading corpora and lexical resources
import nltk
nltk.download('all')    # every corpus, model and lexical resource
nltk.download('brown')  # only the Brown Corpus
segment text into sentences
from nltk.tokenize import sent_tokenize
sent_tokenize_list = sent_tokenize(text)
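`sent_tokenize` relies on a pretrained Punkt model (fetched with `nltk.download`), which already knows many English abbreviations. To show why "I saw Mr. Washington." from the opening slides is a segmentation problem, this sketch builds a Punkt tokenizer by hand with a one-entry abbreviation list, so it runs without downloaded data. The abbreviation set is an illustrative assumption; without it, the period after "Mr" would likely be treated as a sentence end.

```python
from nltk.tokenize.punkt import PunktParameters, PunktSentenceTokenizer

text = "I saw Mr. Washington. Is this really a chainsaw?"

# Abbreviations are stored lowercase and without the trailing period.
params = PunktParameters()
params.abbrev_types = {"mr"}

# Passing parameters instead of training text skips Punkt's training step.
tokenizer = PunktSentenceTokenizer(params)

sentences = tokenizer.tokenize(text)
print(sentences)
```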
tokenize a sentence
from nltk.tokenize import word_tokenize
tokens = word_tokenize(sentence)
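`word_tokenize` first splits text into sentences and then applies a Treebank-style word tokenizer to each one, which is why it also needs the Punkt data. The word-level step can be tried on its own with `TreebankWordTokenizer`, which is purely rule-based. A minimal sketch (the sentence is made up); note how the contraction and the final "!" become separate tokens.

```python
from nltk.tokenize import TreebankWordTokenizer

# Regular-expression rules only: no model download required.
# It expects one sentence at a time.
tokenizer = TreebankWordTokenizer()

tokens = tokenizer.tokenize("I don't own a chainsaw!")
print(tokens)
```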
tag part of speech (pos)
import nltk
tags = nltk.pos_tag(tokens)
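`nltk.pos_tag` uses a pretrained model that looks at context. To see what a tagger does without downloading that model, this sketch swaps in a much simpler technique: a unigram tagger trained on a few hypothetical hand-tagged sentences, with a default tagger as backoff for unknown words. A unigram tagger always gives a word its most frequent training tag, so it cannot resolve the "I saw ... a saw" ambiguity from the opening slides.

```python
import nltk

# Hypothetical hand-tagged training sentences (Penn Treebank tags).
# "saw" is a verb (VBD) three times and a noun (NN) twice.
train_sents = [
    [("I", "PRP"), ("saw", "VBD"), ("Mr.", "NNP"), ("Washington", "NNP")],
    [("I", "PRP"), ("saw", "VBD"), ("a", "DT"), ("saw", "NN")],
    [("This", "DT"), ("is", "VBZ"), ("your", "PRP$"), ("saw", "NN")],
    [("They", "PRP"), ("saw", "VBD"), ("it", "PRP")],
]

# Words never seen in training fall back to the noun tag NN.
backoff = nltk.DefaultTagger("NN")
tagger = nltk.UnigramTagger(train_sents, backoff=backoff)

tags = tagger.tag(["I", "saw", "a", "chainsaw"])
print(tags)
```

Tagging `["I", "saw", "a", "saw"]` with this tagger labels both occurrences of "saw" as VBD, which is exactly the kind of error a context-aware tagger is meant to avoid.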
identify named entities
entities = nltk.ne_chunk(tags)
entities.draw()
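`entities.draw()` opens a Tk window, which is handy in a local demo but not in a headless background job. A batch job would more likely walk the tree and collect the entities. A minimal sketch: `named_entities` is a hypothetical helper, and the tree is hand-built in the shape `ne_chunk` returns (entity subtrees under a root labelled "S"), so no models are needed.

```python
from nltk.tree import Tree

def named_entities(tree):
    """Collect (entity_type, text) pairs from an ne_chunk-style tree.

    Entity chunks are subtrees labelled with an entity type (e.g. PERSON,
    GPE); ordinary tokens are (word, tag) tuples directly under the root "S".
    """
    entities = []
    for subtree in tree.subtrees(filter=lambda t: t.label() != "S"):
        words = " ".join(word for word, tag in subtree.leaves())
        entities.append((subtree.label(), words))
    return entities

# Hand-built example tree (hypothetical labels), not real ne_chunk output.
tree = Tree("S", [
    ("I", "PRP"), ("saw", "VBD"),
    Tree("PERSON", [("Washington", "NNP")]),
    ("in", "IN"),
    Tree("GPE", [("Dublin", "NNP")]),
])

print(named_entities(tree))
```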
demo
language recognition
import langid
lang = langid.classify(text)[0]  # classify returns a (language, score) tuple
fundamentals of nlp
natural language toolkit (nltk)
running python and nltk on Azure
azure cloud services
azure webjobs
azure functions
azure cloud services & python
pip's requirements.txt
PowerShell scripts for setup and launch
azure webjobs & python
upload a zip file (including dependencies)
runs run.py (or the first .py file it finds)
configuration settings
import os
key = os.environ["STORAGE_KEY"]
publish webjob
pip install packages into site-packages
zip the application (including dependent packages)
upload the zip file
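The packaging steps can be scripted. Assuming the dependencies were first installed next to `run.py` with `pip install -r requirements.txt --target site-packages`, this minimal sketch zips the folder so that `run.py` sits at the root of the archive. The function name and folder layout are hypothetical, not part of the WebJobs tooling.

```python
import os
import zipfile

def build_webjob_zip(app_dir, zip_path):
    """Zip app_dir (run.py plus its site-packages folder) for upload.

    Archive paths are relative to app_dir, so run.py lands at the zip root.
    """
    zip_abs = os.path.abspath(zip_path)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for root, _dirs, files in os.walk(app_dir):
            for name in files:
                full_path = os.path.join(root, name)
                if os.path.abspath(full_path) == zip_abs:
                    continue  # never add the archive to itself
                zf.write(full_path, os.path.relpath(full_path, app_dir))
    return zip_path
```

For example, `build_webjob_zip("webjob", "webjob.zip")` produces the file to upload; `run.py` still has to add `site-packages` to `sys.path` at startup.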
add package location to sys.path
import os, sys
p = os.path.join(os.getcwd(), "site-packages")
sys.path.append(p)
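downloading corpus
D:\\local\\AppData\\nltk_data
import os
import nltk
# app settings arrive as strings, so compare text rather than the bool True
if os.getenv("DOWNLOAD", "true").lower() == "true":
    dest = os.environ["NLTK_DATA_DIR"]
    nltk.download('all', dest)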
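Because app settings reach the job as environment-variable strings, a comparison such as `os.getenv("DOWNLOAD", True) == True` only succeeds when the variable is unset; setting `DOWNLOAD=True` would skip the download. A small helper makes such switches explicit. This is a sketch; the accepted spellings are an assumption.

```python
import os

def env_flag(name, default=True):
    """Read an on/off switch from an environment variable (app setting).

    Environment values are always strings, so parse the text instead of
    comparing it to the bool True. Unset variables return `default`.
    """
    value = os.getenv(name)
    if value is None:
        return default
    return value.strip().lower() in ("1", "true", "yes", "on")
```

Usage: `if env_flag("DOWNLOAD"): nltk.download('all', os.environ["NLTK_DATA_DIR"])`.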
using queues for communication
reads text from an input queue
writes processed text into output queues
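The read-process-write loop can be sketched independently of Azure. Here Python's in-process `queue.Queue` stands in for the storage queues a deployed job would reach through the Azure SDK (whose client API is not shown); `run_worker` and the `process` callback are hypothetical names, and `process` could be any of the NLTK steps above.

```python
import queue

def run_worker(input_queue, output_queues, process, timeout=1.0):
    """Read texts from input_queue, process them, fan results out.

    Stops after `timeout` seconds without a message (a long-running job
    would keep polling instead). Returns the number of messages handled.
    """
    count = 0
    while True:
        try:
            text = input_queue.get(timeout=timeout)
        except queue.Empty:
            return count
        result = process(text)
        for out in output_queues:   # every downstream queue gets a copy
            out.put(result)
        input_queue.task_done()     # mark the message as handled
        count += 1
```

With real storage queues, a message is typically deleted only after it has been processed, so a crashed worker's message becomes visible again for another instance.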
auto scale
based on queue length
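Autoscale rules are configured in the Azure platform rather than in the job's code. The arithmetic behind a queue-length rule is still easy to see: aim for a fixed number of queued messages per instance, clamped to a minimum and maximum. This sketch is purely illustrative; the function and its thresholds are hypothetical.

```python
import math

def target_instance_count(queue_length, messages_per_instance,
                          min_instances=1, max_instances=10):
    """Instances needed so each handles about `messages_per_instance` messages."""
    if messages_per_instance <= 0:
        raise ValueError("messages_per_instance must be positive")
    needed = math.ceil(queue_length / messages_per_instance)
    return max(min_instances, min(max_instances, needed))
```

For example, 250 queued texts at a target of 100 per instance gives 3 instances, while an empty queue still keeps the minimum of 1 running.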
debugging python webjobs
local: Visual Studio and webjob simulator
cloud: use Kudu (xyz.scm.azurewebsites.net) and logs
demo
in closing…
nltk is a great toolkit for performing nlp tasks
azure provides an elastic and scalable platform for running python nltk jobs
http://www.nltk.org/
http://www.nltk.org/book_1ed
http://azure.com/