Indexing and Retrieval Semantic Search, Fatemeh Lashkari, UNB University, May 7th 2014


Indexing and Retrieval Semantic Search

Fatemeh Lashkari, UNB University

May 7th 2014


Outline

• Indexing
• Semantic Search
• Semantic Search Architecture
• Index process
• Index Maintenance


Indexing

• Inverted Index
  • Sort-based inversion
  • Single-pass in-memory inversion
• HYB Index
  • Prefix search
  • Autocompletion search
  • Query expansion and faceted search
  • Fast error-tolerant search
  • Support for database-style "select" and "join"
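As a rough illustration of the two ideas on this slide, the sketch below builds an inverted index in one pass over a toy collection and answers a prefix query by scanning the matching range of a sorted vocabulary. It is a simplified sketch, not the HYB index: it assumes the whole collection fits in memory (so it omits the flushing of runs to disk that a full single-pass construction would need), and the names `build_inverted_index` and `prefix_search` are hypothetical.

```python
from bisect import bisect_left
from collections import defaultdict


def build_inverted_index(docs):
    """Single pass over the documents, accumulating postings in memory."""
    index = defaultdict(list)  # term -> list of (doc_id, position)
    for doc_id, text in enumerate(docs):
        for position, term in enumerate(text.lower().split()):
            # Documents are read in doc_id order, so each list stays sorted
            # without a separate sorting step.
            index[term].append((doc_id, position))
    return dict(index)


def prefix_search(index, vocabulary, prefix):
    """Return sorted ids of documents containing a word that starts with prefix."""
    start = bisect_left(vocabulary, prefix)
    doc_ids = set()
    for term in vocabulary[start:]:
        if not term.startswith(prefix):
            break  # sorted vocabulary: matching terms form one contiguous range
        doc_ids.update(doc_id for doc_id, _ in index[term])
    return sorted(doc_ids)


docs = [
    "astronauts walk on moon",
    "the moon landing",
    "astronomy for beginners",
]
index = build_inverted_index(docs)
vocabulary = sorted(index)

print(index["moon"])                              # [(0, 3), (1, 1)]
print(prefix_search(index, vocabulary, "astro"))  # [0, 2]
```

Scanning the vocabulary range term by term is the naive way to answer a prefix query; it is shown here only to make the query type concrete.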




Semantic Search

Query: “astronauts walk on moon”

http://broccoli.cs.uni-freiburg.de/demos/BroccoliFreebase/




Semantic Search Architecture

[Diagram components: Ontology, Text Collection, Indexing, Query Process, Answers to the question]


Outline

• Indexing
• Semantic Search
• Semantic Search Architecture
• Index process
  • Parsing
• Index Maintenance


Parsing

• Preprocessing
  • Stemming
  • Lower case: General Motors → general motors
  • Remove some stop words
    • e.g. is, do, a, of, ...
• Text annotation
  • Annotators
  • Machine learning approaches
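The preprocessing steps listed above can be sketched as a small pipeline. This is only an illustration under stated assumptions: the stop-word set contains just the examples from the slide, the order of the steps is a choice made here, and `naive_stem` is a hypothetical toy suffix stripper standing in for a real stemming algorithm.

```python
STOP_WORDS = {"is", "do", "a", "of"}  # the slide's examples; a real list is longer
SUFFIXES = ("ing", "ed", "s")         # hypothetical toy rules, not a real stemmer


def naive_stem(token):
    """Strip one common suffix if at least three characters remain."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Lower-case the text, drop stop words, then stem the remaining tokens."""
    tokens = text.lower().split()
    return [naive_stem(t) for t in tokens if t not in STOP_WORDS]


print(preprocess("General Motors is a maker of cars"))
# ['general', 'motor', 'maker', 'car']
```

Note that stemming turns "Motors" into "motor", which may or may not be wanted for a company name; this is one reason the annotation step matters for entities.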


Outline

• Indexing
• Semantic Search
• Semantic Search Architecture
• Index process
  • Parsing
  • Index Structure
• Index Maintenance


Index Structure

A fast and efficient index does not
• need the whole vocabulary of the indexed collection in main memory
• need to sort postings
• need to merge postings
  • cache efficiently


Outline

• Indexing
• Semantic Search
• Semantic Search Architecture
• Index Process
  • Parsing
  • Index Structure
  • Building Index
• Index Maintenance


Building Index (Tasks to Decide)

• How many indexes do we need?
  • Index for relations
  • Index for text
• What is the structure of the vocabulary?
• What is the structure of a posting?
• What statistical information does a posting contain?
  e.g. <docId, position, score, entity>
  apple: <6, 10, 0.3, class: fruit> <4, 2, 0.9, class: company>
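One way to picture the posting format from the example is a small record type. The sketch below is a minimal illustration, assuming the four fields shown on the slide; the `Posting` class and the `lookup` helper are hypothetical names introduced here, not part of any system referenced in the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Posting:
    doc_id: int
    position: int
    score: float
    entity: str  # class of the entity the word occurrence refers to


# The "apple" example from the slide.
index = {
    "apple": [
        Posting(6, 10, 0.3, "class: fruit"),
        Posting(4, 2, 0.9, "class: company"),
    ]
}


def lookup(index, term, entity=None):
    """Return postings for term, best score first, optionally filtered by entity."""
    postings = index.get(term, [])
    if entity is not None:
        postings = [p for p in postings if p.entity == entity]
    return sorted(postings, key=lambda p: p.score, reverse=True)


print([p.doc_id for p in lookup(index, "apple")])  # [4, 6]
print(lookup(index, "apple", entity="class: fruit"))
```

Keeping the entity class inside the posting lets a query such as "apple, the company" be filtered without consulting a second index, at the cost of larger postings.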


Building Index (Tasks to Decide)

• How to compute the score to improve the final result?
• How to store the index?
  • Distribute the index
  • Process queries in parallel
• Which compression methods can be used?
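The slide leaves the choice of compression method open. As one commonly used possibility (an assumption here, not a method named in the talk), the sketch below stores the gaps between sorted document ids and encodes each gap with variable-byte coding, using 7 payload bits per byte and a set high bit on the last byte of each number.

```python
def vbyte_encode(numbers):
    """Encode non-negative integers; the high bit marks the last byte of each."""
    out = bytearray()
    for n in numbers:
        chunk = [n & 0x7F]
        n >>= 7
        while n:
            chunk.append(n & 0x7F)
            n >>= 7
        chunk[0] |= 0x80  # the low-order group is written last and flagged
        out.extend(reversed(chunk))
    return bytes(out)


def vbyte_decode(data):
    """Inverse of vbyte_encode."""
    numbers, n = [], 0
    for byte in data:
        if byte & 0x80:
            numbers.append((n << 7) | (byte & 0x7F))
            n = 0
        else:
            n = (n << 7) | byte
    return numbers


def compress_doc_ids(doc_ids):
    """Gap-encode a sorted list of doc ids, then variable-byte encode the gaps."""
    gaps, previous = [], 0
    for doc_id in doc_ids:
        gaps.append(doc_id - previous)
        previous = doc_id
    return vbyte_encode(gaps)


def decompress_doc_ids(data):
    """Rebuild the doc ids by summing the decoded gaps."""
    doc_ids, total = [], 0
    for gap in vbyte_decode(data):
        total += gap
        doc_ids.append(total)
    return doc_ids


doc_ids = [824, 829, 215406]
encoded = compress_doc_ids(doc_ids)
print(len(encoded))                 # 6
print(decompress_doc_ids(encoded))  # [824, 829, 215406]
```

Small gaps fit in a single byte, so long postings lists of frequent words compress best.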




Index Maintenance

Strategies for maintaining the index:
• Merge-based (remerge)
• In-place
• Hybrid index update operation
• Geometric partitioning
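To make the first strategy concrete, here is a minimal sketch of a merge-based update: postings for newly arrived documents are collected in a small in-memory index, which is then merged with the existing index to produce a new one. It assumes both indexes map a term to a list of document ids sorted in increasing order; the function name `remerge` and the toy data are hypothetical, and a real system would stream the lists from and to disk.

```python
from heapq import merge


def remerge(main_index, delta_index):
    """Build a new index by merging the postings lists of both inputs."""
    merged = {}
    for term in sorted(set(main_index) | set(delta_index)):
        old = main_index.get(term, [])
        new = delta_index.get(term, [])
        merged[term] = list(merge(old, new))  # both inputs are already sorted
    return merged


main_index = {"moon": [1, 4], "walk": [1]}
delta_index = {"moon": [7], "astronaut": [7, 8]}  # postings of new documents

main_index = remerge(main_index, delta_index)
print(main_index)
# {'astronaut': [7, 8], 'moon': [1, 4, 7], 'walk': [1]}
```

Every update rewrites the whole index, including lists such as "walk" that did not change, which is the cost the other strategies on the slide try to reduce.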


Thank You

