a potential approach to overcome data limitation in scientific publication recommendation
TRANSCRIPT
A Potential Approach to Overcome Data Limitation in Scientific Publication Recommendation
Hung Nghiep Tran, Tin Huynh, Kiem Hoang
University of Information Technology
Vietnam
Original paper: Hung Nghiep Tran, Tin Huynh, Kiem Hoang. A Potential Approach to Overcome Data Limitation in Scientific Publication Recommendation. KSE 2015. Resource: See the last slide (SlideShare convention).
Introduction
Relevant paper recommendation: What papers are relevant to a researcher’s interests.
§ Note: Different from citation recommendation.
→ Problems in experiments and evaluations.
Image source: Sugiyama and Kan, Exploiting potential citation papers in scholarly paper recommendation, JCDL ’13.
Introduction
Offline evaluation: The most popular approach.
§ Based on ground truth data.
• Recommended papers are compared to the ones known to be relevant to each researcher.
→ Building ground truth data is usually difficult and expensive.
Introduction
• An approach to build ground truth:
§ Natural thinking: references are relevant to researchers’ interests.
→ Intuitively, we can build ground truth data based on reference data.
→ But this approach has not been explored in the literature.
Our Goal
To systematically study the approach that builds ground truth data based on reference data for evaluation of relevant paper recommendation.
Related Work
• Relevant paper recommendation is an emerging research area:
§ [Sugiyama & Kan, JCDL ’10].
§ [Le et al., ICCCI ’14].
§ [Ohta et al., ICADIWT ’11].
• Lack of ground truth data is a recognized problem:
§ [Beel et al., RepSys Workshop ’13].
• Some approaches have been tried to build data:
§ Manually built:
• [Sugiyama & Kan, JCDL ’10].
§ Adapted from reference management software:
• Mendeley.
• Docear.
Our Approach
• Theoretical analysis:
§ Propose and analyze the hypotheses supporting the approach.
• Empirical analysis:
§ Evaluate the approach’s capability of evaluating recommendation methods.
Theoretical Analysis
Hypotheses:
• Hypothesis 1. In the context of doing research and writing scientific publications, there are many levels of exposing a researcher’s information needs, in which citing is the highest level.
• Hypothesis 2. References made by researchers are their relevant publications. Moreover, they are the most important ones.
• Hypothesis 3. Future references could be used as ground truth data in evaluation of recommending relevant publications for researchers.
→ The approach is reasonable.
Empirical Analysis
How to evaluate the approach’s capability of evaluating recommendation methods?
→ The trick: Two-layer evaluation.
1. Evaluate different recommendation methods and get the recommendation evaluation results.
2. Evaluate the consistency of the above evaluation results on two datasets: the one built based on the approach and the one built manually.
Experiments Plan
1. Build a dataset D with ground truth data based on the approach.
2. Get a manually built dataset D’ for comparison.
3. Recommend by different methods on D and D’.
4. Evaluate the recommendation methods’ results on D and D’.
5. Evaluate the consistency of evaluation results on D and D’.
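The five steps above can be read as a small harness. The sketch below is only an illustration of that reading: the function name `two_layer_evaluation` and its parameters (`methods`, `evaluate`, `consistency`) are hypothetical, and the recommenders, the layer-1 metric and the layer-2 measure are passed in as plain callables rather than taken from the paper's code.

```python
def two_layer_evaluation(methods, dataset_d, dataset_d_prime, evaluate, consistency):
    """Run steps 3 to 5 of the plan for already-built datasets D and D'.

    methods:     dict mapping a method name to a function dataset -> recommendations
    evaluate:    layer-1 metric, (recommendations, dataset) -> float
    consistency: layer-2 measure, (scores_on_d, scores_on_d_prime) -> float
    All three are hypothetical callables supplied by the caller.
    """
    scores_d, scores_d_prime = [], []
    # Iterate in a fixed order so both score lists are aligned by method.
    for name in sorted(methods):
        recommend = methods[name]
        scores_d.append(evaluate(recommend(dataset_d), dataset_d))
        scores_d_prime.append(evaluate(recommend(dataset_d_prime), dataset_d_prime))
    # Layer 2: compare how the same methods score on the two datasets.
    return scores_d, scores_d_prime, consistency(scores_d, scores_d_prime)
```

Keeping the metric and the consistency measure as parameters means the same harness can be rerun for each combination, for example NDCG@5 with Spearman's coefficient.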
Different content-based filtering (CBF) recommendation methods are formed by feature vector combination.
Image source: Sugiyama and Kan, Exploiting potential citation papers in scholarly paper recommendation, JCDL ’13.
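As a rough illustration of what a CBF method formed by feature vector combination could look like, the sketch below builds a researcher profile as a weighted sum of sparse term vectors and ranks candidate papers by cosine similarity. The slides do not specify the features, weights or similarity function, so the names `combine`, `cosine` and `recommend`, the weighting scheme, and the use of cosine similarity are all assumptions.

```python
import math
from collections import Counter

def combine(vectors, weights=None):
    """Weighted sum of sparse term vectors (dicts mapping term -> weight)."""
    weights = weights or [1.0] * len(vectors)
    combined = Counter()
    for vec, w in zip(vectors, weights):
        for term, value in vec.items():
            combined[term] += w * value
    return dict(combined)

def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(value * v.get(term, 0.0) for term, value in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def recommend(profile, candidates, top_n=10):
    """Rank candidate paper ids (dict id -> vector) by similarity to the profile."""
    ranked = sorted(candidates,
                    key=lambda pid: cosine(profile, candidates[pid]),
                    reverse=True)
    return ranked[:top_n]
```

Under this reading, different CBF methods would correspond to different choices of which vectors enter `combine` and with what weights.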
Evaluation
Layer 1: To evaluate recommendation results:
§ Order-aware: NDCG@5, NDCG@10.
§ First relevant item-aware: MRR.
Layer 2: To measure the consistency of evaluation results:
§ Pearson’s coefficient.
§ Spearman’s coefficient.
§ Kendall’s coefficient.
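For readers unfamiliar with the layer-1 metrics, here is a minimal sketch of NDCG@k and MRR. It assumes binary relevance (a recommended paper is either in the ground truth or not) and the common log2 rank discount; the slides do not state which NDCG variant was used, and the function names are illustrative.

```python
import math

def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the first k relevance scores."""
    return sum(rel / math.log2(rank + 1)
               for rank, rel in enumerate(relevances[:k], start=1))

def ndcg_at_k(ranked_ids, relevant_ids, k):
    """NDCG@k with binary gains: 1.0 if the paper is in the ground truth."""
    gains = [1.0 if pid in relevant_ids else 0.0 for pid in ranked_ids]
    # The ideal ranking places all relevant papers first.
    ideal_dcg = dcg_at_k([1.0] * min(len(relevant_ids), k), k)
    return dcg_at_k(gains, k) / ideal_dcg if ideal_dcg > 0 else 0.0

def reciprocal_rank(ranked_ids, relevant_ids):
    """1 / rank of the first relevant paper, or 0.0 if none is recommended."""
    for rank, pid in enumerate(ranked_ids, start=1):
        if pid in relevant_ids:
            return 1.0 / rank
    return 0.0

def mean_reciprocal_rank(rankings, ground_truths):
    """Average reciprocal rank over all target researchers."""
    values = [reciprocal_rank(r, g) for r, g in zip(rankings, ground_truths)]
    return sum(values) / len(values) if values else 0.0
```

NDCG rewards every relevant paper near the top of the list, while MRR looks only at the position of the first relevant one.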
The Process to Build Dataset
• Timeline: Past | Present (defined) | Future, running from the first published year to the last published year.
• Target researchers: those for whom recommendations are generated.
• Future references: those papers cited in the Future but not in the Past.
• Ground truth data: future references cited by each target researcher.
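The definitions above can be sketched in a few lines. This is an illustrative reading, not the released code: the record layout (`year`, `authors`, `references`), the function name `build_ground_truth`, and the choice to count the Present year itself as part of the Past are all assumptions.

```python
from collections import defaultdict

def build_ground_truth(papers, present_year):
    """Map each researcher to their future references.

    papers: iterable of dicts with keys 'year', 'authors', 'references'
            (a hypothetical layout, not the released dataset's schema).
    A future reference is cited in a paper published after present_year
    and in none of the same researcher's papers up to present_year.
    """
    past = defaultdict(set)
    future = defaultdict(set)
    for paper in papers:
        bucket = future if paper["year"] > present_year else past
        for author in paper["authors"]:
            bucket[author].update(paper["references"])
    # Only researchers with at least one paper in the Future get an entry.
    return {author: refs - past[author] for author, refs in future.items()}
```

In a real pipeline the target researchers would probably also be filtered, for example to those with enough papers in the Past to build a profile; that step is left out here.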
Experimental Data
• D: Automatically built dataset.
§ To be released from https://sites.google.com/site/tranhungnghiep
• D’: Manually built dataset.
§ From Sugiyama and Kan, JCDL ’10.
Layer 1: Recommendation Evaluation Results
[Table: evaluation results of CBF methods on the two datasets D and D’]
Layer 2: Evaluation Results Consistency
[Table: correlations between evaluation results on D and D’]
• In general: statistically significant, strong positive correlations.
→ Applicable for roughly comparing methods before online evaluation.
• For specific metrics:
§ For NDCG@10: less correlation.
§ For MRR: no correlation.
→ Not suitable for measuring the first relevant recommended item.
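To make the layer-2 comparison concrete, the sketch below computes the three coefficients named earlier (Pearson, Spearman, Kendall) in plain Python for two aligned lists of scores, one per dataset. The function names are illustrative, Kendall's coefficient is implemented in its tau-b form as an assumption, and significance tests are omitted.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson's linear correlation; 0.0 if either list is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0

def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result

def spearman(x, y):
    """Spearman's rank correlation: Pearson's coefficient on the ranks."""
    return pearson(ranks(x), ranks(y))

def kendall(x, y):
    """Kendall's tau-b, computed over all pairs of observations."""
    concordant = discordant = ties_x = ties_y = 0
    for i, j in combinations(range(len(x)), 2):
        dx, dy = x[i] - x[j], y[i] - y[j]
        ties_x += dx == 0
        ties_y += dy == 0
        if dx * dy > 0:
            concordant += 1
        elif dx * dy < 0:
            discordant += 1
    pairs = len(x) * (len(x) - 1) // 2
    denom = math.sqrt((pairs - ties_x) * (pairs - ties_y))
    return (concordant - discordant) / denom if denom else 0.0
```

Each position in the two lists should hold the score of the same recommendation method on D and on D’. If SciPy is available, `scipy.stats.pearsonr`, `spearmanr` and `kendalltau` return the same coefficients together with p-values.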
Conclusion
This study is the first one to:
• Assess the approach of building ground truth data based on reference data for evaluating relevant paper recommendation.
→ We showed that this approach is promising.
• Propose a process to build ground truth data from bibliographic data.
→ We built and published a dataset to help advance other research (*).
(*) To be released from https://sites.google.com/site/tranhungnghiep
Future Work
• Should focus on extensive assessment of this approach.
→ Especially, by comparison with online evaluation.
Thank you very much!
Original paper: Hung Nghiep Tran, Tin Huynh, Kiem Hoang. A Potential Approach to Overcome Data Limitation in Scientific Publication Recommendation. KSE 2015.
Code & Data: https://github.com/tranhungnghiep/PaperRecommender
Other resource: https://sites.google.com/site/tranhungnghiep/code-data/paper-recommender-systems