The Geometry of Learning

November 17th, 2009, Utrecht, The Netherlands. Fridolin Wild, KMi, The Open University.


DESCRIPTION

Latent Semantic Analysis (LSA) is a mathematical technique for computationally modelling the meaning of words and larger units of text. LSA works by applying Singular Value Decomposition (SVD) to a term-document matrix containing frequency counts for all words in all of the documents or passages of a corpus. After this SVD step, the meaning of a word is represented as a vector in a multidimensional semantic space, which makes it possible to compare word meanings, for instance by computing the cosine between two word vectors.

LSA has been used successfully in a wide variety of language-related applications, from the automatic grading of student essays to predicting click trails in website navigation. In Coh-Metrix (Graesser et al. 2004), a computational tool that produces indices of the linguistic and discourse representations of a text, LSA was used as a measure of text cohesion, on the assumption that cohesion increases as a function of higher cosine scores between adjacent sentences.

Besides being interesting as a technique for building programs that need to deal with semantics, LSA is also interesting as a model of human cognition: it can match human performance on word association tasks and vocabulary tests. In this talk, Fridolin focuses on LSA as a tool for modelling language acquisition. After framing the talk by sketching the key concepts of learning, information, and competence acquisition, and after outlining presuppositions, an introduction to meaningful interaction analysis (MIA) is given. MIA is a means to inspect learning with the support of language analysis that is geometric in nature; it is a fusion of latent semantic analysis (LSA) with network analysis (NA/SNA). LSA, NA/SNA, and MIA are illustrated by several examples.

TRANSCRIPT

Page 1: The Geometry of Learning

November 17th, 2009, Utrecht, The Netherlands

The Geometry of Learning

Fridolin Wild, KMi, The Open University

Page 2: The Geometry of Learning

<2>

(created with http://www.wordle.net)

Page 3: The Geometry of Learning

<3>

Outline

Context & Framing Theories
Latent Semantic Analysis (LSA)
Social Network Analysis (SNA)
Meaningful Interaction Analysis (MIA)
Conclusion & Outlook

Page 4: The Geometry of Learning

<4>

Context & Theories

Page 5: The Geometry of Learning

<5>

Information

Information could be the quality of a certain signal.

Information could be a logical abstractor, the release mechanism. Knowledge could be the delta at the receiver (a paper, a human, a library).


Information & Knowledge

Page 6: The Geometry of Learning

<6>

What is learning about?

Learning is change.
Learning is about competence development.
Competence becomes visible in performance.

Professional competence is mainly about (re-)constructing and processing information and knowledge from cues.
Professional competence development is largely about learning concepts from language.
Professional performance is largely about demonstrating conceptual knowledge with language.

Language!

Page 7: The Geometry of Learning

<7>

Non-textual concepts: things we can’t (easily) learn from language

Tying shoelaces

Douglas Adams’ ‘Meaning of Liff’:

Epping: The futile movements of forefingers and eyebrows used when failing to attract the attention of waiters and barmen.

Shoeburyness: The vague uncomfortable feeling you get when sitting on a seat which is still warm from somebody else's bottom.

I have been convincingly Sapir-Whorfed by this book.

Page 8: The Geometry of Learning

<8>

Latent Semantic Analysis

Page 9: The Geometry of Learning

<9>

Word Choice

An educated adult understands ~100,000 word forms. An average sentence contains 20 tokens.

Thus there are 100,000^20 possible combinations of words in a sentence, i.e. a maximum of log2(100,000^20) ≈ 332 bits in word choice alone.

There are 20! ≈ 2.4 x 10^18 possible orders of 20 words, i.e. a maximum of log2(20!) ≈ 61 bits from the order of the words.

332 / (61 + 332) ≈ 84% word choice

(Landauer, 2007)
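The arithmetic on this slide is easy to verify; a quick check in R (the language used for the LSA examples later in the talk), with the slide's assumptions of a 100,000-word vocabulary and 20-token sentences:

vocab <- 100000                               # word forms known to an educated adult
tokens <- 20                                  # tokens in an average sentence
bits_choice <- tokens * log2(vocab)           # log2(100,000^20) ~ 332 bits
bits_order  <- lfactorial(tokens) / log(2)    # log2(20!)        ~  61 bits
bits_choice / (bits_choice + bits_order)      # ~ 0.84, i.e. about 84% word choice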

Page 10: The Geometry of Learning

<10>

Latent Semantic Analysis

“Humans learn word meanings and how to combine them into passage meaning through experience with ~paragraph unitized verbal environments.”

“They don’t remember all the separate words of a passage; they remember its overall gist or meaning.”

“LSA learns by ‘reading’ ~paragraph unitized texts that represent the environment.”

“It doesn’t remember all the separate words of a text; it remembers its overall gist or meaning.”

(Landauer, 2007)

Page 11: The Geometry of Learning

<11>

Latent Semantics, in other words:

Assumption: language utterances have a semantic structure.
Problem: the structure is obscured by word usage (noise, synonymy, polysemy, …).
Solution: map the doc-term matrix using conceptual indices derived statistically (truncated SVD) and make similarity comparisons using angles.

latent-semantic space

Page 12: The Geometry of Learning

<12>

Input (e.g., documents)

{ M } =

Deerwester, Dumais, Furnas, Landauer, and Harshman (1990): Indexing by Latent Semantic Analysis, In: Journal of the American Society for Information Science, 41(6):391-407

Only the red terms appear in more than one document, so strip the rest.

term = feature

vocabulary = ordered set of features

TEXTMATRIX
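A minimal sketch of building such a textmatrix with the lsa package; the directory and the three one-line documents are made up purely for illustration:

library("lsa")
td <- tempfile(); dir.create(td)                           # throwaway corpus directory
writeLines("human computer interaction", file.path(td, "c1.txt"))
writeLines("graph of paths in trees", file.path(td, "c2.txt"))
writeLines("graph minors and trees survey", file.path(td, "c3.txt"))
tm <- textmatrix(td, minWordLength = 1)                    # rows = terms (features), columns = documents
tm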

Page 13: The Geometry of Learning

<13>

Singular Value Decomposition

M = T S D'  (T: term loadings, S: diagonal matrix of singular values, D': transposed document loadings)
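A sketch of the decomposition with base R's svd(); the small term-document matrix is made up for illustration:

M <- matrix(c(1,0,1, 0,1,1, 1,1,0, 0,0,1), nrow = 4, byrow = TRUE)   # 4 terms x 3 documents (toy values)
s <- svd(M)
s$u                                           # T: term loadings
s$d                                           # the singular values (diagonal of S)
s$v                                           # D: document loadings
all.equal(M, s$u %*% diag(s$d) %*% t(s$v))    # multiplying back reproduces M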

Page 14: The Geometry of Learning

<14>

Truncated SVD

If we truncate the SVD to the first k singular values and multiply the partial matrices back together, we will get a different matrix (different values, but still of the same format as M).

latent-semantic space
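A sketch of the truncation on the same toy matrix as above: keeping only the k largest singular values and multiplying back gives the reduced matrix the slide describes:

s <- svd(matrix(c(1,0,1, 0,1,1, 1,1,0, 0,0,1), nrow = 4, byrow = TRUE))
k  <- 2                                                 # keep only the 2 largest singular values
Mk <- s$u[, 1:k] %*% diag(s$d[1:k]) %*% t(s$v[, 1:k])
Mk                                                      # same format as M, but different (smoothed) values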

Page 15: The Geometry of Learning

<15>

The meaning of "life" =

0.0465 -0.0453 -0.0275 -0.0428 0.0166 -0.0142 -0.0094 0.0685 0.0297 -0.0377 … (remaining loadings omitted; one value per latent dimension of the space)

(Landauer, 2007)
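In the lsa package, such a word vector is simply the row of term loadings for that word in the space object; a runnable toy sketch (documents and the two-dimensional space are made up, the real example above uses far more dimensions):

library("lsa")
td <- tempfile(); dir.create(td)
writeLines("the meaning of life and learning", file.path(td, "d1.txt"))
writeLines("life in a latent semantic space", file.path(td, "d2.txt"))
writeLines("learning word meaning from text", file.path(td, "d3.txt"))
space <- lsa(textmatrix(td), dims = 2)
space$tk["life", ]                       # the loadings of 'life' on the latent dimensions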

Page 16: The Geometry of Learning

<16>

Reconstructed, Reduced Matrix

m4: Graph minors: A survey

Page 17: The Geometry of Learning

<17>

Similarity in a Latent-Semantic Space

(Landauer, 2007)

cos(a, b) = Σ aᵢ·bᵢ / ( √(Σ aᵢ²) · √(Σ bᵢ²) ),  summing i = 1 … m over the m dimensions of the space

[Figure: a query vector and two target vectors (Target 1, Target 2) plotted against the X dimension and Y dimension; similarity is the cosine of the angle between the query and each target (Angle 1, Angle 2).]
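The lsa package ships a cosine() helper that implements exactly this comparison; a sketch with two made-up vectors:

library("lsa")
a <- c(1, 2, 0, 1)
b <- c(0, 2, 1, 1)
cosine(a, b)                                      # cosine of the angle between a and b
sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))    # the same value, computed by hand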

Page 18: The Geometry of Learning

<18>

doc2doc - similarities

Unreduced = pure vector space model
- based on M = T S D'
- Pearson correlation over document vectors

Reduced
- based on M2 = T S2 D'
- Pearson correlation over document vectors
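A sketch of both variants on a toy matrix; cor() computes Pearson correlations between the document columns:

M  <- matrix(c(1,0,1, 0,1,1, 1,1,0, 0,0,1), nrow = 4, byrow = TRUE)  # 4 terms x 3 documents (toy values)
s  <- svd(M); k <- 2
Mk <- s$u[, 1:k] %*% diag(s$d[1:k]) %*% t(s$v[, 1:k])                # reduced reconstruction
cor(M)     # doc2doc similarities in the unreduced (pure vector space) model
cor(Mk)    # doc2doc similarities in the reduced (latent-semantic) space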

Page 19: The Geometry of Learning

<19>

Typical, simple workflow

tm = textmatrix("dir/")                    # read all documents in a directory into a term-document matrix

tm = lw_logtf(tm) * gw_idf(tm)             # local log-tf weighting times global idf weighting

space = lsa(tm, dims=dimcalc_share())      # create the latent-semantic space (truncated SVD)

tm3 = fold_in(tm, space)                   # project documents into the existing space

as.textmatrix(space)                       # view the space as a (reconstructed, reduced) textmatrix

Page 20: The Geometry of Learning

<20>

Processing Pipeline (with Options)

4 x 12 x 7 x 2 x 3 = 2016 Combinations

Page 21: The Geometry of Learning

<21>

Projecting by Folding-In

a) SVD factor stability
SVD calculates factors over a given text base; different texts, different factors.
Problem: avoid unwanted factor changes.
Solution: folding-in instead of recalculating.

b) SVD is computationally expensive
From seconds (lower hundreds of documents, optimised linear algebra libraries, truncated SVD), to minutes (hundreds to thousands of documents), to hours (tens and hundreds of thousands).

Page 22: The Geometry of Learning

<22>

Folding-In in Detail

(1) convert the original document vector v to "Dk" format:  d_hat = v^T Tk Sk^-1

(2) convert the "Dk"-format vector to "Mk" format:  m_hat = Tk Sk d_hat^T

(Tk, Sk, Dk: the truncated term, singular value, and document matrices of Mk; cf. Berry et al., 1995)
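A sketch of the two conversion steps with base R matrix algebra; the matrix, the truncation to k = 2, and the new document vector v are illustrative stand-ins for a real space:

M  <- matrix(c(1,0,1, 0,1,1, 1,1,0, 0,0,1), nrow = 4, byrow = TRUE)  # toy term-document matrix
s  <- svd(M); k <- 2
Tk <- s$u[, 1:k]; Sk <- diag(s$d[1:k])
v  <- c(1, 0, 0, 1)                          # a new document as a raw term-frequency vector
d_hat <- t(v) %*% Tk %*% solve(Sk)           # (1) original vector -> "Dk" format
m_hat <- Tk %*% Sk %*% t(d_hat)              # (2) "Dk" format -> "Mk" format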

Page 23: The Geometry of Learning

<23>

The Value of Singular Values

Pearson(eu, österreich) Pearson(jahr, wien)

Page 24: The Geometry of Learning

<24>

Simple LSA application

Page 25: The Geometry of Learning

<25>

Summary Writing: Working Principle

(Landauer, 2007)

Page 26: The Geometry of Learning

<26>

Summary Writing

[Figure: essays and gold standards as vectors in a two-dimensional plot (X dimension, Y dimension): Gold Standard 1, Gold Standard 2, Gold Standard 3, Essay 1, Essay 2.]

Page 27: The Geometry of Learning

<27>

‘Dumb’ Summary Writing (Code)

library( "lsa" )  # load package

# load training texts
trm = textmatrix( "trainingtexts/" )
trm = lw_bintf( trm ) * gw_idf( trm )  # weighting
space = lsa( trm )  # create an LSA space

# fold in the summaries to be tested (including the gold standard text)
tem = textmatrix( "testessays/", vocabulary=rownames(trm) )
tem_red = fold_in( tem, space )

# score a summary by comparing it with the
# gold standard text (very simple method!)
cor( tem_red[,"goldstandard.txt"], tem_red[,"E1.txt"] )
# => 0.7

Page 28: The Geometry of Learning

<28>

Evaluating Effectiveness

Compare Machine Scores with Human Scores

Human-to-human correlation: usually around .6; increased by familiarity between assessors, tighter assessment schemes, … Scores vary even more strongly with decreasing subject familiarity (.8 at high familiarity, worst test -.07).

• Test collection: 43 German essays, scored from 0 to 5 points (ratio scaled), average length: 56.4 words
• Training collection: 3 ‘golden essays’, plus 302 documents from a marketing glossary, average length: 56.1 words

Page 29: The Geometry of Learning

<29>

(Positive) Evaluation Results

LSA machine scores: Spearman's rank correlation rho
data: humanscores[names(machinescores), ] and machinescores
S = 914.5772, p-value = 0.0001049
alternative hypothesis: true rho is not equal to 0
sample estimates: rho = 0.687324

Pure vector space model: Spearman's rank correlation rho
data: humanscores[names(machinescores), ] and machinescores
S = 1616.007, p-value = 0.02188
alternative hypothesis: true rho is not equal to 0
sample estimates: rho = 0.4475188
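The blocks above are the output R prints for a Spearman rank correlation test; a sketch of the call that produces such output, with made-up scores for five essays instead of the real 43:

humanscores   <- c(E1 = 3, E2 = 5, E3 = 1, E4 = 4, E5 = 2)                 # human points (illustrative)
machinescores <- c(E1 = 0.62, E2 = 0.81, E3 = 0.35, E4 = 0.70, E5 = 0.44)  # LSA machine scores (illustrative)
cor.test(humanscores[names(machinescores)], machinescores, method = "spearman")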

Page 30: The Geometry of Learning

<30>

(S)NA

Page 31: The Geometry of Learning

<31>

Social Network Analysis

Existing for a long time (the term was coined in 1954).

Basic idea: actors and the relationships between them (e.g. interactions).

Actors can be people (groups, media, tags, …).
Actors and ties form a graph (nodes and edges).
Within that graph, certain structures can be investigated:
• Betweenness, degree centrality, density, cohesion
• Structural patterns can be identified (e.g. the troll)

Page 32: The Geometry of Learning

<32>

Forum Messages

  (row) message_id forum_id parent_id author

130 2853483 2853445 \N 2043

131 1440740 785876 \N 1669

132 2515257 2515256 \N 5814

133 4704949 4699874 \N 5810

134 2597170 2558273 \N 2054

135 2316951 2230821 \N 5095

136 3407573 3407568 \N 36

137 2277393 2277387 \N 359

138 3394136 3382201 \N 1050

139 4603931 4167338 \N 453

140 6234819 6189254 6231352 5400

141 806699 785877 804668 2177

142 4430290 3371246 3380313 48

143 3395686 3391024 3391129 35

144 6270213 6024351 6265378 5780

145 2496015 2491522 2491536 2774

146 4707562 4699873 4707502 5810

147 2574199 2440094 2443801 5801

148 4501993 4424215 4491650 5232

  (row) message_id forum_id parent_id author

60 734569 31117 \N 2491

221 762702 31117   1

317 762717 31117 762702 1927

1528 819660 31117 793408 1197

1950 840406 31117 839998 1348

1047 841810 31117 767386 1879

2239 862709 31117 \N 1982

2420 869839 31117 862709 2038

2694 884824 31117 \N 5439

2503 896399 31117 862709 1982

2846 901691 31117 895022 992

3321 951376 31117 \N 5174

3384 952895 31117 951376 1597

1186 955595 31117 767386 5724

3604 958065 31117 \N 716

2551 960734 31117 862709 1939

4072 975816 31117 \N 584

2574 986038 31117 862709 2043

2590 987842 31117 862709 1982

Page 33: The Geometry of Learning

<33>

Incidence Matrix

msg_id = incident, authors appear in incidents

Page 34: The Geometry of Learning

<34>

Derive Adjacency Matrix

= t(im) %*% im
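A sketch with a tiny made-up incidence matrix (rows = messages, columns = authors); the cross product yields the author-by-author adjacency matrix:

im <- matrix(c(1,1,0,
               0,1,1,
               1,0,1), nrow = 3, byrow = TRUE,
             dimnames = list(paste0("msg", 1:3), c("A", "B", "C")))
am <- t(im) %*% im        # how often each pair of authors appears in the same message
diag(am) <- 0             # drop self-ties
am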

Page 35: The Geometry of Learning

<35>

Visualization: Sociogramme

Page 36: The Geometry of Learning

<36>

Measuring Techniques (Sample)

Degree centrality: number of (in/out) connections to others
Closeness: how close to all others
Betweenness: how often intermediary
Components: e.g. k-means cluster (k=3)
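A sketch of these measures with the igraph package, on a small made-up author-by-author adjacency matrix (in practice this would be the matrix derived from the forum messages above):

library("igraph")
am <- matrix(c(0,2,1,0,
               2,0,1,0,
               1,1,0,1,
               0,0,1,0), nrow = 4, byrow = TRUE,
             dimnames = list(c("A","B","C","D"), c("A","B","C","D")))
g <- graph_from_adjacency_matrix(am, mode = "undirected", weighted = TRUE)
degree(g)                    # degree centrality: number of connections
closeness(g)                 # closeness: how close to all others
betweenness(g)               # betweenness: how often intermediary on shortest paths
components(g)$membership     # components: connected groups of actors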

Page 37: The Geometry of Learning

<37>

SNA applications

Page 38: The Geometry of Learning

<38>

Co-Authorship Network WI (2005)

Page 39: The Geometry of Learning

<39>

Paper Collaboration Prolearn

e.g. co-authorships of ~30 deliverables of three work packages (ProLearn NoE)

Roles: reviewer (red), editor (green), contributor
Size: prestige

But: type of interaction? Content of interaction? => not possible!

Page 40: The Geometry of Learning

<40>

TEL Project Cooperation (2004-2007)

Page 41: The Geometry of Learning

<41>

iCamp Collaboration (Y1)

Shades of yellow: WP leadership. Red: coordinator.

Page 42: The Geometry of Learning

<42>

MIA

Page 43: The Geometry of Learning

<43>

Meaningful Interaction Analysis (MIA)

Fusion: combining LSA with SNA.

Terms and documents (or anything else represented with column vectors or row vectors) are mapped into the same space by LSA.

Semantic proximity can be measured between them: how close is a term to a document?

(S)NA allows us to analyse the resulting graph structures,
e.g. by cluster or component analysis,
e.g. by identifying central descriptors for them.

Page 44: The Geometry of Learning

<44>

The mathemagics behind Meaningful Interaction Analysis

Page 45: The Geometry of Learning

<45>

Truncated SVD

If we truncate the SVD to the first k singular values and multiply the partial matrices back together, we will get a different matrix (different values, but still of the same format as M).

latent-semantic space

Page 46: The Geometry of Learning

<46>

Knowledge Proxy: LSA Part

Tk = left-hand matrix = ‚term loadings‘ on the singular values
Dk = right-hand matrix = ‚document loadings‘ on the singular values

Multiply them into the same (latent-semantic) space:
VT = Tk Sk
VD = Dk Sk

Cosine distance matrix over {VT, VD} = a graph

Extension: add author vectors VA through cluster centroids or through vector addition of their publication vectors, giving a cosine distance matrix over {VT, VD, VA}

Of course: use the existing space and fold in the whole sets of vectors.
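A sketch of this construction with the lsa package; the term-document matrix is a toy one, and scaling tk and dk by the singular values follows the slide's VT = Tk Sk and VD = Dk Sk:

library("lsa")
M <- matrix(c(1,0,0,1,0,
              1,1,0,0,0,
              0,1,1,0,0,
              0,0,1,1,1,
              0,0,0,1,1,
              1,0,1,0,1), nrow = 6, byrow = TRUE,
            dimnames = list(paste0("term", 1:6), paste0("doc", 1:5)))
space <- lsa(M, dims = 2)
VT <- space$tk %*% diag(space$sk)   # term vectors in the shared space
VD <- space$dk %*% diag(space$sk)   # document vectors in the shared space
V  <- rbind(VT, VD)                 # terms and documents together
cm <- cosine(t(V))                  # cosine similarity matrix over all of them = a graph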

Page 47: The Geometry of Learning

<47>

Knowledge Proxy: SNA Part: Filter the Network

Every vector has a cosine distance to every other (which may be negative)!

So: filter for the desired similarity strength
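A sketch of this filtering step, reusing the cosine matrix cm from the previous sketch; the 0.5 threshold is arbitrary and only serves as an example of "the desired similarity strength":

library("igraph")
cm[cm < 0.5] <- 0                  # keep only ties above the chosen similarity strength
diag(cm) <- 0                      # drop self-loops
g <- graph_from_adjacency_matrix(cm, mode = "undirected", weighted = TRUE)
g                                  # the filtered term/document graph, ready for (S)NA measures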

Page 48: The Geometry of Learning

<48>

ConSpect: monitoring conceptual development

Page 49: The Geometry of Learning

<49>

Page 50: The Geometry of Learning

<50>

TopicProxy (30 people, 2005)

Page 51: The Geometry of Learning

<51>

Bringing together what belongs together

Spot unwanted fragmentation, e.g. two authors work on the same topic, but with different collaborator groups and with different literature.

Intervention instrument: automatically recommend holding a flashmeeting.

Wild, Ochoa, Heinze, Crespo, Quick (2009, to appear)

Page 52: The Geometry of Learning

<52>

//eof.