Timo Honkela: Spaces of Knowledge

Timo Honkela, Modeling Meaning and Knowledge, Spaces of Knowledge, 1.2.2016

Upload: timo-honkela

Post on 13-Apr-2017

Page 1: Timo Honkela: Spaces of Knowledge

Spaces of Knowledge

Timo Honkela

Modeling Meaning and Knowledge, 1 Feb 2016

Page 2: Timo Honkela: Spaces of Knowledge

http://www.cs.cornell.edu/Info/Department/Annual95/Faculty/Salton.html

Advent of vector-based information retrieval

● Gerard Salton: Documents and queries are represented as vectors of term counts

● Similarity between a document and a query is given by the cosine of the angle between the query vector and the document vector

● TF-IDF (term frequency-inverse document frequency) is used for weighting a term in a document

● Inverse document frequency had been introduced by Karen Spärck Jones in 1972
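The weighting idea in the bullets above can be sketched in a few lines. This is a minimal illustration, assuming raw term counts for TF and log(N / df) for IDF; many variants exist, and the function name `tf_idf` and the toy documents are made up for this example.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per document, using tf * log(N / df)."""
    n_docs = len(docs)
    tokenized = [doc.lower().split() for doc in docs]
    # Document frequency: in how many documents does each term occur?
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        tf = Counter(tokens)  # raw term counts within this document
        weights.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return weights

# Toy collection (invented for illustration).
docs = [
    "university research university society",
    "society and culture",
    "university teaching",
]
weights = tf_idf(docs)
```

In this toy collection "research" occurs in only one document, so it receives a higher weight in the first document than "society", which occurs in two.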

https://en.wikipedia.org/wiki/Gerard_Salton

https://en.wikipedia.org/wiki/Karen_Sp%C3%A4rck_Jones

Page 3: Timo Honkela: Spaces of Knowledge

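[Figure: documents (D) and queries (Q) drawn as vectors in a two-dimensional term space with the axes "University" and "Society", each scaled from 1 to 3]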

Document 1: The word “university” appears three times and “society” once, etc.

Query 1: “university”
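As a worked example, Document 1 can be written as the vector (3, 1) on the axes (university, society) and Query 1 as (1, 0). The sketch below computes their cosine similarity; the helper name `cosine` is chosen for this illustration.

```python
import math

def cosine(u, v):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Axes: (university, society)
document_1 = (3, 1)  # "university" three times, "society" once
query_1 = (1, 0)     # the query "university"

similarity = cosine(document_1, query_1)  # 3 / sqrt(10), about 0.949
```

A document that only mentions "society" would be orthogonal to this query and get a similarity of 0.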

https://en.wikipedia.org/wiki/Cosine_similarity

https://en.wikipedia.org/wiki/Sine

Page 4: Timo Honkela: Spaces of Knowledge

Contexts tell about meaning

● John Rupert Firth: “You shall know a word by the company it keeps”

● Ludwig Wittgenstein: “For a large class of cases of the employment of the word ‘meaning’—though not for all—this word can be explained in this way: the meaning of a word is its use in the language” (PI 43)

https://en.wikipedia.org/wiki/John_Rupert_Firth

http://plato.stanford.edu/entries/wittgenstein/#Mea

https://en.wikipedia.org/wiki/Ludwig_Wittgenstein

Page 5: Timo Honkela: Spaces of Knowledge

Analysis of term-document matrices

● The same idea as in information retrieval can also be applied in studying words and expressions

● Statistical analysis of document-term matrices gives rise to models of the relationships between words or documents

● Classical examples include

– Latent Semantic Analysis (Deerwester, Dumais et al. 1988)

– Self-Organizing Semantic Maps (Ritter & Kohonen 1989)
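The starting point of such analyses can be sketched as follows: each word becomes a row of the term-document matrix, and two words are compared through their rows. This is only the raw-matrix step under toy assumptions; Latent Semantic Analysis and self-organizing maps add a dimensionality-reducing model on top of it, which is not shown here. The documents are invented for illustration.

```python
import math

# Toy documents (invented for illustration).
docs = [
    "king queen castle",
    "king queen crown",
    "forest wolf",
    "forest wolf castle",
]

vocab = sorted({w for d in docs for w in d.split()})
# term_doc[term] is that term's row: its count in each document.
term_doc = {t: [d.split().count(t) for d in docs] for t in vocab}

def cosine(u, v):
    """Cosine of the angle between two count vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# Words that occur in the same documents get similar rows.
sim_king_queen = cosine(term_doc["king"], term_doc["queen"])
sim_king_wolf = cosine(term_doc["king"], term_doc["wolf"])
```

Here "king" and "queen" share all their documents, while "king" and "wolf" share none, so the first similarity is (up to rounding) 1 and the second is 0.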

Page 6: Timo Honkela: Spaces of Knowledge

Word spaces, clusters, clouds, ...

● The analysis of the statistical information related to word contexts can be turned into visualizations of the word relations
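A minimal sketch of how such word-context statistics might be collected, assuming a context window of one word on each side (the window size, the toy text, and the `overlap` measure are assumptions for this example). The resulting context vectors are the kind of input that a clustering or mapping method could then turn into a visualization.

```python
from collections import Counter, defaultdict

# Toy text (invented for illustration).
text = ("the king rode home . the queen rode home . "
        "the wolf ran away . the fox ran away .")
tokens = text.split()

# contexts[word] counts the words seen immediately before or after it.
contexts = defaultdict(Counter)
for i, word in enumerate(tokens):
    for j in (i - 1, i + 1):
        if 0 <= j < len(tokens):
            contexts[word][tokens[j]] += 1

def overlap(a, b):
    """Size of the multiset intersection of the two words' contexts."""
    return sum((contexts[a] & contexts[b]).values())

king_queen = overlap("king", "queen")  # both occur between "the" and "rode"
king_wolf = overlap("king", "wolf")    # only "the" is shared
```

Words used in the same way end up with overlapping contexts even though they never occur next to each other.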

Page 7: Timo Honkela: Spaces of Knowledge

Maps of words in Grimm fairy tales

Honkela, Pulkki & Kohonen 1995

Automated learning of word relations using self-organizing map on text context data

Page 8: Timo Honkela: Spaces of Knowledge

Map of Finnish Science

(T. Honkela & M. Klami 2007)

[Figure: map with regions labeled Chemistry; Natural sciences and engineering; Bio- and environmental sciences; Health; Culture and society]

Page 9: Timo Honkela: Spaces of Knowledge

From term weighting to term selection

● TF-IDF is a widely used method for term weighting

● Likey (Language Independent Keyphrase Extraction) was developed to select terms automatically by comparing the corpus at hand with another corpus, called a reference corpus (Paukkeri et al. 2008, Paukkeri & Honkela 2010)
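The comparison with a reference corpus can be illustrated with a rank-ratio score: a word's frequency rank in the corpus at hand divided by its rank in the reference corpus, where a low ratio marks a characteristic term. This is a simplified sketch of the general idea, not the published Likey implementation: it works on single words rather than phrases, and the function names, the tie handling, and the rank given to unseen words are assumptions made for this example.

```python
from collections import Counter

def ranks(tokens):
    """Map each word to its frequency rank (1 = most frequent); ties share a rank."""
    rank_of = {}
    rank = 0
    previous = None
    for word, count in Counter(tokens).most_common():
        if count != previous:
            rank += 1
            previous = count
        rank_of[word] = rank
    return rank_of

def rank_ratio_terms(document, reference, top_n=3):
    """Return the top_n words with the lowest document-rank / reference-rank ratio."""
    doc_rank = ranks(document)
    ref_rank = ranks(reference)
    # Assumption: words unseen in the reference rank just below all seen words.
    unseen = max(ref_rank.values()) + 1
    ratio = {w: doc_rank[w] / ref_rank.get(w, unseen) for w in doc_rank}
    return sorted(ratio, key=lambda w: (ratio[w], w))[:top_n]

# Toy corpora (invented for illustration).
document = "the research of the university and the research".split()
reference = "the of the and the of we the and of".split()
selected = rank_ratio_terms(document, reference, top_n=2)
```

Function words such as "the" rank high in both corpora and get a ratio near 1, while "research" and "university" are frequent only in the corpus at hand and are selected.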

Page 10: Timo Honkela: Spaces of Knowledge

Most frequent word forms (types) in two corpora

Rank  Academy corpus           Europarl corpus
 1.   the          1276847     the    2023617
 2.   of           1067918     of      945622
 3.   and           817852     to      883206
 4.   in            625330     and     717718
 5.   to            357453     in      611421
 6.   for           225307     that    473739
 7.   is            205723     a       445775
 8.   on            162509     is      445119
 9.   research      157251     we      305590
10.   be            151475     for     296092
11.   with          136854     i       290412
12.   will          135992     this    286924
13.   as            122707     on      274614
14.   are           116508     it      251343
15.   by            113878     be      246917
16.   university     98003     are     197082
...

Page 11: Timo Honkela: Spaces of Knowledge

[Diagram with the labels: Academy corpus; Reference corpus (EU parliament); Likey; Term list; Documents; Terms; SOM; Document map]

Page 12: Timo Honkela: Spaces of Knowledge

Extralinguistic contexts

● Human beings learn language in real world contexts that include visual, tactile, etc. perceptions

● In order to model meaning in a human-like manner, these other modalities have to be taken into account

● In a project called “Multimodally Grounded Language Technology” we associated visual patterns of human movements with expressions that had been used to describe these movements

Page 13: Timo Honkela: Spaces of Knowledge

[Figure: human movement patterns labeled RUNNING, WALKING, LIMPING and JOGGING]

Page 14: Timo Honkela: Spaces of Knowledge

Modeling subjectivity of meaning

● In our method Grounded Intersubjective Concept Analysis (GICA), we added a new “dimension” to the term-document matrices

● We did not assume that each person understands and uses every word in a similar manner but wanted to model the personal variation

● This was achieved by using Subject-Object-Context tensors (Honkela et al. 2012)
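A Subject-Object-Context tensor can be pictured as a three-way array in which each entry says how strongly one person (subject) associates one word (object) with one context. The sketch below is a simplified illustration of that data structure, not the published GICA procedure: the ratings are invented, and unfolding the tensor into one row per (subject, object) pair and comparing rows by Euclidean distance are assumptions made for this example.

```python
# Hypothetical labels for the three tensor dimensions.
subjects = ["s1", "s2"]
objects = ["health", "wellbeing"]
contexts = ["exercise", "hospital", "happiness"]

# tensor[s][o][c]: how strongly subject s associates object o with context c
# (made-up ratings for illustration).
tensor = [
    [[5, 2, 3], [4, 1, 5]],  # subject s1
    [[1, 5, 2], [3, 2, 5]],  # subject s2
]

# Unfold the tensor into a matrix with one row per (subject, object) pair.
rows = {
    (s, o): tensor[i][j]
    for i, s in enumerate(subjects)
    for j, o in enumerate(objects)
}

def distance(u, v):
    """Euclidean distance between two context profiles."""
    return sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5

# How differently do the two subjects use each word?
health_gap = distance(rows[("s1", "health")], rows[("s2", "health")])
wellbeing_gap = distance(rows[("s1", "wellbeing")], rows[("s2", "wellbeing")])
```

In this toy data the two subjects agree fairly well on "wellbeing" but differ on "health", which is the kind of personal variation the extra dimension is meant to expose.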

Page 15: Timo Honkela: Spaces of Knowledge

GICA: Grounded Intersubjective Concept Analysis

Honkela, Raitio, Lagus & Nieminen 2012

Page 16: Timo Honkela: Spaces of Knowledge

Analysis of “health” in the State of the Union addresses

Subjects on objects in contexts: Using GICA method to quantify epistemological subjectivity. Timo Honkela, Juha Raitio, Krista Lagus, Ilari T. Nieminen, Nina Honkela, and Mika Pantzar. Proc. of IJCNN 2012.

Page 17: Timo Honkela: Spaces of Knowledge

Thank you for your attention!