Indexing and Retrieval Semantic Search
Fatemeh Lashkari, UNB University
May 7th, 2014
Outline
• Indexing
• Semantic Search
• Semantic Search Architecture
• Index Process
• Index Maintenance
Indexing
Inverted Index
• Sort-based inversion
• Single-pass in-memory inversion
HYB Index
• Prefix search
• Autocompletion search
• Query expansion and faceted search
• Fast error-tolerant search
• Support for database-style "select" and "join"
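As a rough illustration of sort-based inversion and of prefix search, the sketch below builds a plain inverted index by sorting (term, docId) pairs and then answers a prefix query by scanning a sorted vocabulary. It is not the HYB index itself, which organizes postings differently. The function names and the sample documents are assumptions made for this example.

```python
from bisect import bisect_left
from itertools import groupby


def sort_based_inversion(docs):
    """Build term -> sorted list of doc ids by sorting (term, doc_id) pairs."""
    pairs = sorted(
        {(term, doc_id) for doc_id, text in docs.items() for term in text.lower().split()}
    )
    # After sorting, all pairs of one term are adjacent, so groupby collects its postings.
    return {
        term: [doc_id for _, doc_id in group]
        for term, group in groupby(pairs, key=lambda pair: pair[0])
    }


def prefix_search(index, vocabulary, prefix):
    """Return ids of documents containing any term that starts with `prefix`.

    `vocabulary` must be the sorted list of index terms, so that all terms
    sharing a prefix form one contiguous range.
    """
    start = bisect_left(vocabulary, prefix)
    doc_ids = set()
    for term in vocabulary[start:]:
        if not term.startswith(prefix):
            break
        doc_ids.update(index[term])
    return sorted(doc_ids)


# Hypothetical sample collection.
docs = {
    1: "astronauts walk on moon",
    2: "moon landing astronomy",
    3: "walking on sunshine",
}
index = sort_based_inversion(docs)
vocabulary = sorted(index)
print(prefix_search(index, vocabulary, "astro"))
```

Here the prefix "astro" matches both "astronauts" and "astronomy", so the query returns the union of their postings.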
Semantic Search
Query: “astronauts walk on moon”
Demo: http://broccoli.cs.uni-freiburg.de/demos/BroccoliFreebase/
Semantic Search Architecture
[Diagram with the components: Ontology, Text Collection, Indexing, Query Process, Answers to the question]
Parsing
Preprocessing
• Stemming
• Lowercasing, e.g. "General Motors" becomes "general motors"
• Removing some stop words, e.g. is, do, a, of, ...
Text annotation
• Annotators
• Machine learning approaches
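The preprocessing steps can be sketched as a small pipeline. The stop-word list below is taken from the slide; the suffix-stripping stemmer is only a toy stand-in for a real stemming algorithm, and the order of the steps (lowercase, stop-word removal, stemming) is an assumption of this example.

```python
STOP_WORDS = {"is", "do", "a", "of"}  # examples from the slide
SUFFIXES = ("ing", "ed", "s")         # toy stemmer rules (assumption, not a real stemmer)


def toy_stem(token):
    """Strip the first matching suffix if at least three characters remain."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Lowercase, drop stop words, then stem the remaining tokens."""
    tokens = text.lower().split()
    return [toy_stem(token) for token in tokens if token not in STOP_WORDS]


print(preprocess("General Motors is a maker of cars"))
```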
Index Structure
A fast and efficient index does not
• need the whole vocabulary of the indexed collection in main memory
• need to sort postings
• need to merge postings
It also uses the cache efficiently.
Building Index (Tasks to Decide)
How many indexes do we need?
• Index for relations
• Index for text
What is the structure of the vocabulary?
What is the structure of a posting?
What statistical information does a posting contain? e.g. <docId, position, score, entity>
apple: <6, 10, 0.3, class: fruit> <4, 2, 0.9, class: company>
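One possible in-memory shape for such postings is sketched below, using the "apple" example from the slide. The `Posting` class, the `postings_for` helper, and the choice to rank by score are assumptions for illustration, not a structure prescribed by the slides.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Posting:
    doc_id: int
    position: int
    score: float
    entity: str


# Vocabulary term -> list of postings (the "apple" example from the slide).
index = defaultdict(list)
index["apple"].append(Posting(6, 10, 0.3, "class: fruit"))
index["apple"].append(Posting(4, 2, 0.9, "class: company"))


def postings_for(term, entity=None):
    """Return the postings of `term`, optionally filtered by entity, best score first."""
    hits = [p for p in index.get(term, []) if entity is None or p.entity == entity]
    return sorted(hits, key=lambda p: p.score, reverse=True)


print(postings_for("apple", entity="class: fruit"))
```

Keeping the entity annotation inside the posting lets a query such as "apple as a fruit" be answered from the text index alone, at the cost of larger postings.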
Building Index (Tasks to Decide)
How should the score be computed to improve the final result?
How should the index be stored?
• Distribute the index
• Process queries in parallel
Which compression methods can be used?
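The slides do not name a particular compression method. As one common option, the sketch below stores the gaps between sorted document ids and encodes each gap with a variable-byte code. The byte convention (high bit set on the last byte of each number) and all function names are choices made for this example.

```python
def vbyte_encode(numbers):
    """Encode non-negative integers, 7 bits per byte, low-order bits first."""
    out = bytearray()
    for n in numbers:
        while n >= 128:
            out.append(n & 0x7F)  # continuation byte: high bit clear
            n >>= 7
        out.append(n | 0x80)      # final byte of this number: high bit set
    return bytes(out)


def vbyte_decode(data):
    """Inverse of vbyte_encode."""
    numbers, n, shift = [], 0, 0
    for byte in data:
        if byte & 0x80:
            numbers.append(n | ((byte & 0x7F) << shift))
            n, shift = 0, 0
        else:
            n |= byte << shift
            shift += 7
    return numbers


def compress_doc_ids(doc_ids):
    """Gap-encode a sorted list of doc ids, then variable-byte encode the gaps."""
    if not doc_ids:
        return b""
    gaps = [doc_ids[0]] + [b - a for a, b in zip(doc_ids, doc_ids[1:])]
    return vbyte_encode(gaps)


def decompress_doc_ids(data):
    """Rebuild the doc ids by summing the decoded gaps."""
    doc_ids, total = [], 0
    for gap in vbyte_decode(data):
        total += gap
        doc_ids.append(total)
    return doc_ids


doc_ids = [3, 7, 150, 100000]
packed = compress_doc_ids(doc_ids)
print(len(packed), decompress_doc_ids(packed) == doc_ids)
```

Small gaps fit in a single byte, so densely occurring terms compress well compared with fixed-width integers.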
Index Maintenance
Strategies for maintaining the index:
• Merge-based (remerge)
• In-place
• Hybrid index update operation
• Geometric partitioning
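To give a feel for partition-based maintenance, the sketch below implements a simplified scheme in the spirit of geometric partitioning with radix 2 (often called logarithmic merging): each flush of the in-memory buffer creates a partition, and partitions of equal level are merged upward like carries in a binary counter. The class and method names, the in-memory dictionaries standing in for on-disk partitions, and the tiny buffer limit are all assumptions of this example.

```python
from collections import defaultdict


def merge(older, newer):
    """Merge two partitions; older postings come first so doc ids stay ordered."""
    merged = {term: list(postings) for term, postings in older.items()}
    for term, postings in newer.items():
        merged.setdefault(term, []).extend(postings)
    return merged


class LogMergeIndex:
    def __init__(self, buffer_limit=1):
        self.buffer_limit = buffer_limit  # documents held in memory before a flush
        self.buffer = defaultdict(list)
        self.buffered_docs = 0
        self.levels = []                  # levels[i] is None or a partition (term -> doc ids)

    def add_document(self, doc_id, terms):
        for term in set(terms):
            self.buffer[term].append(doc_id)
        self.buffered_docs += 1
        if self.buffered_docs >= self.buffer_limit:
            self._flush()

    def _flush(self):
        """Turn the buffer into a partition and cascade merges upward."""
        carry = dict(self.buffer)
        self.buffer = defaultdict(list)
        self.buffered_docs = 0
        level = 0
        while True:
            if level == len(self.levels):
                self.levels.append(None)
            if self.levels[level] is None:
                self.levels[level] = carry
                return
            carry = merge(self.levels[level], carry)
            self.levels[level] = None
            level += 1

    def lookup(self, term):
        """Collect postings from every partition and from the buffer."""
        result = []
        for partition in self.levels:
            if partition:
                result.extend(partition.get(term, []))
        result.extend(self.buffer.get(term, []))
        return sorted(result)


idx = LogMergeIndex(buffer_limit=1)
for doc_id, text in enumerate(["astronauts walk on moon", "moon landing", "walk on", "moon base"], start=1):
    idx.add_document(doc_id, text.split())
print(idx.lookup("moon"))
```

Compared with remerging the whole index on every flush, this keeps the number of partitions logarithmic in the number of flushes while each posting takes part in only a logarithmic number of merges.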
Thank You