analysis of similarity measures between short text for the...

1
Analysis of Similarity Measures between Short Text for the NTCIR-12 Short Text Conversation Task Kozo Chikai and Yuki Arase Graduate school of information science and technology, Osaka University Introduction Goal Given an input post, search appropriate replies from the pool. Approach we compare the state-of-the-art methods for estimating text similarities to investigate their performance in handling short text, specially, under the scenario of short text conversation. System Design We implemented these five methods as Similarity Calculation. TF-IDF Weighted Text Matrix Factorization Word2vec → TF-IDF Random Forest → TF-IDF Random Forest TF-IDF Results 0 0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4 ndcg-1 nerr-5 accuracy2-1 accuracy2-5 accuracy12-1 accuracy12-5 Mean of labels Evaluation Method Result of experiments Oni-J-R3 (TF-IDF) Oni-J-R5 (WTMF) Oni-J-R4 (Word2Vec → TF-IDF) Oni-J-R1 (RandomForest → TF-IDF) Oni-J-R2 (RandomForest + TF-IDF) 0.3 0.5 0.7 0.9 rank1 rank2 rank3 rank4 rank5 Mean of labels Order Mean of labels for each rank Oni-J-R3 (TF-IDF) Oni-J-R5 (WTMF) Oni-J-R4 (Word2Vec → TF-IDF) Oni-J-R1 (RandomForest → TF-IDF) Oni-J-R2 (RandomForest + TF-IDF)

Upload: others

Post on 18-Jan-2021

11 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Analysis of Similarity Measures between Short Text for the ...research.nii.ac.jp/ntcir/workshop/OnlineProceedings12/...Analysis of Similarity Measures between Short Text for the NTCIR-12

AnalysisofSimilarityMeasuresbetweenShortTextforthe NTCIR-12ShortTextConversationTask

KozoChikai andYukiAraseGraduateschoolofinformationscienceandtechnology,OsakaUniversity

IntroductionGoalGivenaninputpost,searchappropriaterepliesfromthepool.

Approachwecomparethestate-of-the-artmethodsforestimatingtextsimilaritiestoinvestigatetheirperformanceinhandlingshorttext,specially,underthescenarioofshorttextconversation.

SystemDesignWeimplementedthesefivemethodsasSimilarityCalculation.

① TF-IDF② WeightedTextMatrixFactorization③ Word2vec→ TF-IDF④ RandomForest→ TF-IDF⑤ RandomForest+ TF-IDF

Results

00.050.10.150.20.250.30.350.4

ndcg-1 nerr-5 accuracy2-1 accuracy2-5 accuracy12-1 accuracy12-5

Meanoflabels

EvaluationMethod

Resultofexperiments

Oni-J-R3(①TF-IDF) Oni-J-R5(②WTMF)Oni-J-R4(③Word2Vec→TF-IDF) Oni-J-R1(④RandomForest→TF-IDF)Oni-J-R2(⑤RandomForest+TF-IDF)

0.3

0.5

0.7

0.9

rank1 rank2 rank3 rank4 rank5

Meanoflabe

ls

Order

Meanoflabelsforeachrank

Oni-J-R3(①TF-IDF) Oni-J-R5(②WTMF)Oni-J-R4(③Word2Vec→TF-IDF) Oni-J-R1(④RandomForest→TF-IDF)Oni-J-R2(⑤RandomForest+TF-IDF)