lexical and statistical semantics in professional search · similarity based on uncertainty in word...
TRANSCRIPT
![Page 1: Lexical and Statistical Semantics in Professional Search · Similarity based on Uncertainty in Word Embedding. In Proceedings of the European Conference on Information Retrieval Research](https://reader033.vdocuments.us/reader033/viewer/2022050305/5f6d80d46aaa7708bb13c675/html5/thumbnails/1.jpg)
Lexical and Statistical Semantics in
Professional Search
Allan Hanbury
![Page 2: Lexical and Statistical Semantics in Professional Search · Similarity based on Uncertainty in Word Embedding. In Proceedings of the European Conference on Information Retrieval Research](https://reader033.vdocuments.us/reader033/viewer/2022050305/5f6d80d46aaa7708bb13c675/html5/thumbnails/2.jpg)
![Page 3: Lexical and Statistical Semantics in Professional Search · Similarity based on Uncertainty in Word Embedding. In Proceedings of the European Conference on Information Retrieval Research](https://reader033.vdocuments.us/reader033/viewer/2022050305/5f6d80d46aaa7708bb13c675/html5/thumbnails/3.jpg)
![Page 4: Lexical and Statistical Semantics in Professional Search · Similarity based on Uncertainty in Word Embedding. In Proceedings of the European Conference on Information Retrieval Research](https://reader033.vdocuments.us/reader033/viewer/2022050305/5f6d80d46aaa7708bb13c675/html5/thumbnails/4.jpg)
![Page 5: Lexical and Statistical Semantics in Professional Search · Similarity based on Uncertainty in Word Embedding. In Proceedings of the European Conference on Information Retrieval Research](https://reader033.vdocuments.us/reader033/viewer/2022050305/5f6d80d46aaa7708bb13c675/html5/thumbnails/5.jpg)
1. A method for scrolling through portions of a data set, said method comprising: receiving a number of units associated with a rotational user input; determining an acceleration factor pertaining to the rotational user input; modifying the number of units by the acceleration factor; determining a next portion of the data set based on the modified number of units; and presenting the next portion of the data set.
2. A method as recited in claim 1, wherein the data set pertains to a list of items, and the portions of the data set include one or more of the items.
3. A method as recited in claim 1, wherein the data set pertains to a media file, and the portions of the data set pertain to one or more sections of the media file.
4. A method as recited in claim 3, wherein the media file is an audio file.
5. A method as recited in claim 1, wherein the rotational user input is provided via a rotational input device.
6. …
Claims
![Page 6: Lexical and Statistical Semantics in Professional Search · Similarity based on Uncertainty in Word Embedding. In Proceedings of the European Conference on Information Retrieval Research](https://reader033.vdocuments.us/reader033/viewer/2022050305/5f6d80d46aaa7708bb13c675/html5/thumbnails/6.jpg)
![Page 7: Lexical and Statistical Semantics in Professional Search · Similarity based on Uncertainty in Word Embedding. In Proceedings of the European Conference on Information Retrieval Research](https://reader033.vdocuments.us/reader033/viewer/2022050305/5f6d80d46aaa7708bb13c675/html5/thumbnails/7.jpg)
photo: wocintechchat.com
![Page 8: Lexical and Statistical Semantics in Professional Search · Similarity based on Uncertainty in Word Embedding. In Proceedings of the European Conference on Information Retrieval Research](https://reader033.vdocuments.us/reader033/viewer/2022050305/5f6d80d46aaa7708bb13c675/html5/thumbnails/8.jpg)
![Page 9: Lexical and Statistical Semantics in Professional Search · Similarity based on Uncertainty in Word Embedding. In Proceedings of the European Conference on Information Retrieval Research](https://reader033.vdocuments.us/reader033/viewer/2022050305/5f6d80d46aaa7708bb13c675/html5/thumbnails/9.jpg)
Search Systems
Context Technology
• People• Tasks• Documents• …
• Natural Language Processing
• Data-drivenapproaches
• …
![Page 10: Lexical and Statistical Semantics in Professional Search · Similarity based on Uncertainty in Word Embedding. In Proceedings of the European Conference on Information Retrieval Research](https://reader033.vdocuments.us/reader033/viewer/2022050305/5f6d80d46aaa7708bb13c675/html5/thumbnails/10.jpg)
Natural Language Processing
This is particularly preferable in the case of liquid injection molding (LIM).
This/DT is/VBZ particularly/RB preferable/JJin/IN the/DT case/NN of/IN liquid/JJinjection/NN molding/NN (LIM/NN).
![Page 11: Lexical and Statistical Semantics in Professional Search · Similarity based on Uncertainty in Word Embedding. In Proceedings of the European Conference on Information Retrieval Research](https://reader033.vdocuments.us/reader033/viewer/2022050305/5f6d80d46aaa7708bb13c675/html5/thumbnails/11.jpg)
Population Intervention Comparison
PIC Detection
![Page 12: Lexical and Statistical Semantics in Professional Search · Similarity based on Uncertainty in Word Embedding. In Proceedings of the European Conference on Information Retrieval Research](https://reader033.vdocuments.us/reader033/viewer/2022050305/5f6d80d46aaa7708bb13c675/html5/thumbnails/12.jpg)
Population Intervention Comparison
PIC Detection
![Page 13: Lexical and Statistical Semantics in Professional Search · Similarity based on Uncertainty in Word Embedding. In Proceedings of the European Conference on Information Retrieval Research](https://reader033.vdocuments.us/reader033/viewer/2022050305/5f6d80d46aaa7708bb13c675/html5/thumbnails/13.jpg)
![Page 14: Lexical and Statistical Semantics in Professional Search · Similarity based on Uncertainty in Word Embedding. In Proceedings of the European Conference on Information Retrieval Research](https://reader033.vdocuments.us/reader033/viewer/2022050305/5f6d80d46aaa7708bb13c675/html5/thumbnails/14.jpg)
Less Effective More Effective
Effectiveness of Asthma Therapies
![Page 15: Lexical and Statistical Semantics in Professional Search · Similarity based on Uncertainty in Word Embedding. In Proceedings of the European Conference on Information Retrieval Research](https://reader033.vdocuments.us/reader033/viewer/2022050305/5f6d80d46aaa7708bb13c675/html5/thumbnails/15.jpg)
2009
![Page 16: Lexical and Statistical Semantics in Professional Search · Similarity based on Uncertainty in Word Embedding. In Proceedings of the European Conference on Information Retrieval Research](https://reader033.vdocuments.us/reader033/viewer/2022050305/5f6d80d46aaa7708bb13c675/html5/thumbnails/16.jpg)
Data-Driven
P (Word1 | Word2)
![Page 17: Lexical and Statistical Semantics in Professional Search · Similarity based on Uncertainty in Word Embedding. In Proceedings of the European Conference on Information Retrieval Research](https://reader033.vdocuments.us/reader033/viewer/2022050305/5f6d80d46aaa7708bb13c675/html5/thumbnails/17.jpg)
![Page 18: Lexical and Statistical Semantics in Professional Search · Similarity based on Uncertainty in Word Embedding. In Proceedings of the European Conference on Information Retrieval Research](https://reader033.vdocuments.us/reader033/viewer/2022050305/5f6d80d46aaa7708bb13c675/html5/thumbnails/18.jpg)
“Every time I fire a linguist, the performance of the speech recognizer goes up.”
Frederick Jelinek, IBM
Around 1985
![Page 19: Lexical and Statistical Semantics in Professional Search · Similarity based on Uncertainty in Word Embedding. In Proceedings of the European Conference on Information Retrieval Research](https://reader033.vdocuments.us/reader033/viewer/2022050305/5f6d80d46aaa7708bb13c675/html5/thumbnails/19.jpg)
![Page 20: Lexical and Statistical Semantics in Professional Search · Similarity based on Uncertainty in Word Embedding. In Proceedings of the European Conference on Information Retrieval Research](https://reader033.vdocuments.us/reader033/viewer/2022050305/5f6d80d46aaa7708bb13c675/html5/thumbnails/20.jpg)
Bag of Words
CC image courtesy of Flickr/surrealmuse
![Page 21: Lexical and Statistical Semantics in Professional Search · Similarity based on Uncertainty in Word Embedding. In Proceedings of the European Conference on Information Retrieval Research](https://reader033.vdocuments.us/reader033/viewer/2022050305/5f6d80d46aaa7708bb13c675/html5/thumbnails/21.jpg)
“In most cases, the meaning of a word is its use.”
Ludwig Wittgenstein, Philosophical Investigations (1953)
![Page 22: Lexical and Statistical Semantics in Professional Search · Similarity based on Uncertainty in Word Embedding. In Proceedings of the European Conference on Information Retrieval Research](https://reader033.vdocuments.us/reader033/viewer/2022050305/5f6d80d46aaa7708bb13c675/html5/thumbnails/22.jpg)
A bottle of tesgüino is on the table.
Everybody likes tesgüino.
Tesgüino makes you drunk.
We make tesgüino out of corn.
Example from J. R. Firth
![Page 23: Lexical and Statistical Semantics in Professional Search · Similarity based on Uncertainty in Word Embedding. In Proceedings of the European Conference on Information Retrieval Research](https://reader033.vdocuments.us/reader033/viewer/2022050305/5f6d80d46aaa7708bb13c675/html5/thumbnails/23.jpg)
Statistical semantics extracts lexical information from the statistics of large amounts of text
![Page 24: Lexical and Statistical Semantics in Professional Search · Similarity based on Uncertainty in Word Embedding. In Proceedings of the European Conference on Information Retrieval Research](https://reader033.vdocuments.us/reader033/viewer/2022050305/5f6d80d46aaa7708bb13c675/html5/thumbnails/24.jpg)
Word Embedding
http://mccormickml.com/2016/04/19/word2vec-tutorial-the-skip-gram-model/
![Page 25: Lexical and Statistical Semantics in Professional Search · Similarity based on Uncertainty in Word Embedding. In Proceedings of the European Conference on Information Retrieval Research](https://reader033.vdocuments.us/reader033/viewer/2022050305/5f6d80d46aaa7708bb13c675/html5/thumbnails/25.jpg)
Input Layer
Output Layer
Hidden Layer
~104−105
inputs
~102 nodes
#outputs= # inputs
![Page 26: Lexical and Statistical Semantics in Professional Search · Similarity based on Uncertainty in Word Embedding. In Proceedings of the European Conference on Information Retrieval Research](https://reader033.vdocuments.us/reader033/viewer/2022050305/5f6d80d46aaa7708bb13c675/html5/thumbnails/26.jpg)
![Page 27: Lexical and Statistical Semantics in Professional Search · Similarity based on Uncertainty in Word Embedding. In Proceedings of the European Conference on Information Retrieval Research](https://reader033.vdocuments.us/reader033/viewer/2022050305/5f6d80d46aaa7708bb13c675/html5/thumbnails/27.jpg)
What is more similar to a typewriter:(a) a bird, or (b) a snowball?
![Page 28: Lexical and Statistical Semantics in Professional Search · Similarity based on Uncertainty in Word Embedding. In Proceedings of the European Conference on Information Retrieval Research](https://reader033.vdocuments.us/reader033/viewer/2022050305/5f6d80d46aaa7708bb13c675/html5/thumbnails/28.jpg)
Top k
books 0.82
foreword 0.77
author 0.74
published 0.73
preface 0.69
republished 0.68
reprinted 0.68
afterword 0.67
memoir 0.67
book
corpulent 0.44
hideous 0.43
unintelligent 0.42
wizened 0.42
catoblepas 0.42
creature 0.42
humanoid 0.41
grotesquely 0.41
tomtar 0.41
dwarfish
Similarity Threshold
Threshold
Navid Rekabsaz, Mihai Lupu, Allan Hanbury, Guido Zuccon, Exploration of a Threshold for
Similarity based on Uncertainty in Word Embedding. In Proceedings of the European
Conference on Information Retrieval Research (ECIR 2017)
Navid Rekabsaz, Mihai Lupu, Allan Hanbury, Uncertainty in Neural Network Word
Embedding: Exploration of Threshold for Similarity. Proc. Neu-IR Workshop of the ACM
Conference on Research and Development in Information Retrieval (NeuIR-SIGIR 2016)
Similarity Threshold
![Page 29: Lexical and Statistical Semantics in Professional Search · Similarity based on Uncertainty in Word Embedding. In Proceedings of the European Conference on Information Retrieval Research](https://reader033.vdocuments.us/reader033/viewer/2022050305/5f6d80d46aaa7708bb13c675/html5/thumbnails/29.jpg)
Word Embedding in Search
knowledge
knowledge
knowledge
wisdom
understanding
knowledge
knowledge
0.7 × knowledge
0.8 × knowledge
![Page 30: Lexical and Statistical Semantics in Professional Search · Similarity based on Uncertainty in Word Embedding. In Proceedings of the European Conference on Information Retrieval Research](https://reader033.vdocuments.us/reader033/viewer/2022050305/5f6d80d46aaa7708bb13c675/html5/thumbnails/30.jpg)
Navid Rekabsaz, Mihai Lupu, Allan Hanbury, Guido Zuccon, Generalizing Translation Models in the Probabilistic Relevance Framework,Proceedings of ACM International Conference on Information and Knowledge Management (CIKM 2016)
MA
P Im
pro
vem
ent
![Page 31: Lexical and Statistical Semantics in Professional Search · Similarity based on Uncertainty in Word Embedding. In Proceedings of the European Conference on Information Retrieval Research](https://reader033.vdocuments.us/reader033/viewer/2022050305/5f6d80d46aaa7708bb13c675/html5/thumbnails/31.jpg)
CLEF-IP patent collectionResults for Patent Search
![Page 32: Lexical and Statistical Semantics in Professional Search · Similarity based on Uncertainty in Word Embedding. In Proceedings of the European Conference on Information Retrieval Research](https://reader033.vdocuments.us/reader033/viewer/2022050305/5f6d80d46aaa7708bb13c675/html5/thumbnails/32.jpg)
Multi-Word Terms in Patents
coating method
memory data processorcomplex programmable logic devicerear cross frame memberdocument editing device
Document Length
Sentence Complexity
New Terminology
![Page 33: Lexical and Statistical Semantics in Professional Search · Similarity based on Uncertainty in Word Embedding. In Proceedings of the European Conference on Information Retrieval Research](https://reader033.vdocuments.us/reader033/viewer/2022050305/5f6d80d46aaa7708bb13c675/html5/thumbnails/33.jpg)
Search Systems
Context Technology
Evaluation
![Page 34: Lexical and Statistical Semantics in Professional Search · Similarity based on Uncertainty in Word Embedding. In Proceedings of the European Conference on Information Retrieval Research](https://reader033.vdocuments.us/reader033/viewer/2022050305/5f6d80d46aaa7708bb13c675/html5/thumbnails/34.jpg)
Acknowledgements
Linda Andersson, Alexandros Bampoulidis, Tobias Fink, Aldo Lipani, Mihai Lupu, João Palotti, Florina Piroi, Navid Rekabsaz, Abdel Aziz Taha, Markus Zlabinger
![Page 35: Lexical and Statistical Semantics in Professional Search · Similarity based on Uncertainty in Word Embedding. In Proceedings of the European Conference on Information Retrieval Research](https://reader033.vdocuments.us/reader033/viewer/2022050305/5f6d80d46aaa7708bb13c675/html5/thumbnails/35.jpg)
Univ.-Prof. Dr. Allan Hanbury
Institute for Information Systems Engineering
TU Wien
Favoritenstraße 9-11/194-04
1040 Vienna
Austria
Telephone: +43 1 58801 188310
Mobile: +43 676 978 0991
e-Mail: [email protected]