Introduction to Information Retrieval · leeck/IR/PostingList.pdf · 2017-03-02

Introduction to Information Retrieval PostingList Park Cheon Eum




awk - array
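awk arrays are associative: they are indexed by arbitrary strings rather than only by integers. The awk code from these slides was not preserved in this transcript, so the sketch below is only a stand-in. It uses a Python dict (an assumption, not the author's code) to count words the way an awk statement such as `count[$i]++` would.

```python
# A Python dict standing in for an awk associative array (illustrative only).
counts = {}
for word in "to be or not to be".split():
    # Equivalent in spirit to awk's count[word]++ : missing keys start at 0.
    counts[word] = counts.get(word, 0) + 1

for word in sorted(counts):
    print(word, counts[word])
```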

Ch. 1


Algorithm

start

doc1, …, doc10

split(doc1, …, doc10)

doc1, …, doc10 < id

append(docs, doc1, …, doc10)

sort, uniq

posting

posting result

End
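The flow above (split, attach ID, append, sort and uniq, posting) can be sketched end to end as follows. This is a minimal Python illustration, not the author's awk implementation; the function name `build_index`, the lowercasing, and the regular-expression tokenizer are assumptions made for the example.

```python
import re

def build_index(docs):
    """docs maps document ID -> text; returns term -> sorted list of document IDs."""
    pairs = []
    for doc_id, text in docs.items():
        # split: lowercase and pull out word-like tokens (assumed tokenizer)
        tokens = re.findall(r"[a-z']+", text.lower())
        # attach the ID to every token, then append to the shared list
        pairs.extend((tok, doc_id) for tok in tokens)
    index = {}
    # sort, uniq: set() drops duplicate (term, ID) pairs, sorted() orders them
    for term, doc_id in sorted(set(pairs)):
        # posting: collect the IDs of each term into one list
        index.setdefault(term, []).append(doc_id)
    return index

index = build_index({1: "Brutus killed me", 2: "The noble Brutus"})
print(index)
```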


Algorithm - Indexer steps: Token sequence

Split each document's content into tokens and assign the document ID to each token.

Doc 1: I did enact Julius Caesar I was killed i' the Capitol; Brutus killed me.

Doc 2: So let it be with Caesar. The noble Brutus hath told you Caesar was ambitious
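As a rough illustration of this step, the two documents can be turned into a sequence of (token, document ID) pairs. The sketch is in Python rather than awk, and the lowercasing and the `[a-z']+` token pattern are assumptions; the slides do not say how punctuation or case is handled.

```python
import re

docs = {
    1: "I did enact Julius Caesar I was killed i' the Capitol; Brutus killed me.",
    2: "So let it be with Caesar. The noble Brutus hath told you Caesar was ambitious",
}

pairs = []
for doc_id, text in docs.items():
    # Lowercase, then keep runs of letters and apostrophes as tokens (assumed rule).
    for token in re.findall(r"[a-z']+", text.lower()):
        pairs.append((token, doc_id))

for token, doc_id in pairs[:5]:
    print(token, doc_id)
```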


Algorithm - Indexer steps: Sort

Sort by term, then by document ID.
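A small sketch of this ordering, in Python: tuples compare element by element, so sorting (term, ID) pairs orders them by term first and by document ID second. The sample pairs are made up for illustration.

```python
# Unsorted (term, document ID) pairs; the values are illustrative.
pairs = [
    ("killed", 1), ("caesar", 2), ("brutus", 2),
    ("caesar", 1), ("brutus", 1), ("killed", 1),
]

# Tuple comparison sorts by term, and by document ID among equal terms.
sorted_pairs = sorted(pairs)
print(sorted_pairs)
```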


Algorithm - Indexer steps: Dictionary & Postings

Same term && same ID: keep only one entry (= frequency).

Same term && different ID: add the ID to the term's posting list.

Sec. 1.2
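The two rules can be sketched on an already sorted list of pairs. This Python version is an assumption-laden illustration: `build_postings` is a hypothetical name, and "frequency" is read here as the number of times a term occurs in one document, which the slide does not spell out.

```python
from itertools import groupby

def build_postings(sorted_pairs):
    """Collapse sorted (term, doc ID) pairs into posting lists and per-document counts."""
    postings = {}
    counts = {}
    # groupby merges consecutive identical pairs: same term && same ID.
    for (term, doc_id), group in groupby(sorted_pairs):
        counts[(term, doc_id)] = len(list(group))   # how many duplicates were merged
        # Same term && different ID: extend that term's posting list.
        postings.setdefault(term, []).append(doc_id)
    return postings, counts

sorted_pairs = [
    ("brutus", 1), ("brutus", 2),
    ("caesar", 1), ("caesar", 2), ("caesar", 2),
    ("killed", 1), ("killed", 1),
]
postings, counts = build_postings(sorted_pairs)
print(postings)
```

The length of each posting list gives the number of documents that contain the term.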


Processing

doc1, …, doc10

split(doc1, …, doc10)

doc1, …, doc10 < id


append(docs, doc1, …, doc10)


sort


frequency


Easy method

posting


Using an array: posting
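The code shown on this slide did not survive in the transcript. One possible reading of an array-based approach is to keep a list of document IDs per term and append to it directly, without a separate sort and uniq pass. The Python sketch below follows that reading; `add_posting` is a hypothetical helper, and the sketch assumes documents are processed in increasing ID order.

```python
def add_posting(index, term, doc_id):
    """Append doc_id to the term's list unless it is already the last entry."""
    ids = index.setdefault(term, [])
    # Documents arrive in increasing ID order (assumption), so checking the
    # last element is enough to avoid duplicates.
    if not ids or ids[-1] != doc_id:
        ids.append(doc_id)

index = {}
for doc_id, text in [(1, "brutus killed brutus"), (2, "noble brutus")]:
    for term in text.split():
        add_posting(index, term, doc_id)

for term in sorted(index):
    print(term, index[term])
```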
