A Potential Approach to Overcome Data Limitation in Scientific Publication Recommendation Hung Nghiep Tran, Tin Huynh, Kiem Hoang University of Information Technology Vietnam Original paper: Hung Nghiep Tran, Tin Huynh, Kiem Hoang. A Potential Approach to Overcome Data Limitation in Scientific Publication Recommendation. KSE 2015. Resource: See the last slide (SlideShare convention.)

Upload: hung-nghiep-tran

Post on 08-Feb-2017


Page 1: A Potential Approach to Overcome Data Limitation in Scientific Publication Recommendation

A Potential Approach to Overcome Data Limitation in Scientific Publication Recommendation

Hung Nghiep Tran, Tin Huynh, Kiem Hoang

University of Information Technology

Vietnam

Original paper: Hung Nghiep Tran, Tin Huynh, Kiem Hoang. A Potential Approach to Overcome Data Limitation in Scientific Publication Recommendation. KSE 2015. Resource: See the last slide (SlideShare convention).

Page 2: A Potential Approach to Overcome Data Limitation in Scientific Publication Recommendation

Introduction

Relevant paper recommendation: What papers are relevant to a researcher’s interests?

▪ Note: Different from citation recommendation.

→ Problems in experiments and evaluations.

Image source: Sugiyama and Kan, Exploiting potential citation papers in scholarly paper recommendation, JCDL ’13.

Page 5: A Potential Approach to Overcome Data Limitation in Scientific Publication Recommendation

Introduction

Offline evaluation: The most popular approach.
▪ Based on ground truth data.
  • Recommended papers are compared to the ones known to be relevant to each researcher.

→ Building ground truth data is usually difficult and expensive.

Page 7: A Potential Approach to Overcome Data Limitation in Scientific Publication Recommendation

Introduction

• An approach to building ground truth:
  ▪ Natural thinking: references are relevant to researchers’ interests.

→ Intuitively, we can build ground truth data based on reference data.

→ But this approach has not been explored in the literature.

Page 10: A Potential Approach to Overcome Data Limitation in Scientific Publication Recommendation

Our Goal

To systematically study the approach that builds ground truth data based on reference data for evaluation of relevant paper recommendation.

Page 11: A Potential Approach to Overcome Data Limitation in Scientific Publication Recommendation

Related Work

• Relevant paper recommendation is an emerging research area:
  ▪ [Sugiyama & Kan, JCDL ’10].
  ▪ [Le et al., ICCCI ’14].
  ▪ [Ohta et al., ICADIWT ’11].

• Lack of ground truth data is a recognized problem:
  ▪ [Beel et al., RepSys Workshop ’13].

• Some approaches have been tried to build data:
  ▪ Manually built:
    • [Sugiyama & Kan, JCDL ’10].
  ▪ Adapted from reference management software:
    • Mendeley.
    • Docear.

Page 12: A Potential Approach to Overcome Data Limitation in Scientific Publication Recommendation

Our Approach

• Theoretical analysis:
  ▪ Propose and analyze the hypotheses supporting the approach.

• Empirical analysis:
  ▪ Evaluate the approach’s capability of evaluating recommendation methods.

Page 13: A Potential Approach to Overcome Data Limitation in Scientific Publication Recommendation

Theoretical Analysis

Hypotheses:
• Hypothesis 1. In the context of doing research and writing scientific publications, there are many levels of exposing a researcher’s information needs in which citing is the highest level.

• Hypothesis 2. References made by researchers are their relevant publications. Moreover, they are the most important ones.

• Hypothesis 3. Future references could be used as ground truth data in evaluation of recommending relevant publications for researchers.

→ The approach is reasonable.

Page 15: A Potential Approach to Overcome Data Limitation in Scientific Publication Recommendation

Empirical Analysis

How to evaluate the approach’s capability of evaluating recommendation methods?
→ The trick: Two-layer evaluation.

1. Evaluate different recommendation methods, get the recommendation evaluation results.

2. Evaluate the consistency of the above evaluation results on two datasets, the one built based on the approach and the one built manually.
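
The two layers above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' code: the method names, the score values, and the helpers `layer1`, `rank_methods` and `layer2_same_order` are all hypothetical, and layer 2 is reduced here to checking whether both datasets put the methods in the same order (the deck itself measures consistency with correlation coefficients).

```python
from typing import Callable, Dict, List

Scores = Dict[str, float]  # method name -> evaluation score


def layer1(methods: Dict[str, Callable[[str], float]], dataset: str) -> Scores:
    """Layer 1: evaluate every recommendation method on one dataset."""
    return {name: evaluate(dataset) for name, evaluate in methods.items()}


def rank_methods(scores: Scores) -> List[str]:
    """Order method names from best to worst score (ties broken by name)."""
    return sorted(scores, key=lambda name: (-scores[name], name))


def layer2_same_order(scores_d: Scores, scores_d_prime: Scores) -> bool:
    """Layer 2 (simplified): do both datasets rank the methods identically?"""
    return rank_methods(scores_d) == rank_methods(scores_d_prime)


# Made-up scores standing in for real evaluation runs (illustration only).
fake_results = {
    "method_a": {"D": 0.42, "D_prime": 0.51},
    "method_b": {"D": 0.35, "D_prime": 0.47},
    "method_c": {"D": 0.20, "D_prime": 0.30},
}
# Each "method" here is just a lookup into the made-up table above.
methods = {name: (lambda ds, r=res: r[ds]) for name, res in fake_results.items()}

scores_d = layer1(methods, "D")
scores_d_prime = layer1(methods, "D_prime")
consistent = layer2_same_order(scores_d, scores_d_prime)
```

If the automatically built dataset is a usable substitute for the manual one, methods that score well on one should also score well on the other, which is what the second layer checks.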

Page 20: A Potential Approach to Overcome Data Limitation in Scientific Publication Recommendation

Experiments Plan

1. Build a dataset D with ground truth data based on the approach.

2. Get a manually built dataset D’ for comparison.

3. Recommend by different methods on D and D’.

4. Evaluate recommendation methods’ results on D and D’.

5. Evaluate the consistency of evaluation results on D and D’.

Different Content-based Filtering recommendation methods formed by feature vector combination.

Image source: Sugiyama and Kan, Exploiting potential citation papers in scholarly paper recommendation, JCDL ’13.
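
One common way to realise content-based filtering is to represent the researcher and each candidate paper as term-weight vectors and rank candidates by cosine similarity. The sketch below assumes that setup. The slides do not state the similarity measure or how feature vectors are combined, so the weighted sum in `combine`, the weights, and all paper identifiers and terms are illustrative assumptions.

```python
import math
from typing import Dict, List, Tuple

Vector = Dict[str, float]  # sparse vector: term -> weight


def combine(vectors: List[Vector], weights: List[float]) -> Vector:
    """Weighted sum of sparse feature vectors (an assumed combination scheme)."""
    out: Vector = {}
    for vec, w in zip(vectors, weights):
        for term, value in vec.items():
            out[term] = out.get(term, 0.0) + w * value
    return out


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(v * b.get(t, 0.0) for t, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend(profile: Vector, candidates: Dict[str, Vector], k: int) -> List[Tuple[str, float]]:
    """Return the k candidates most similar to the researcher profile."""
    scored = [(pid, cosine(profile, vec)) for pid, vec in candidates.items()]
    scored.sort(key=lambda item: (-item[1], item[0]))  # best first, ties by id
    return scored[:k]


# Toy data: a profile built from one authored paper and one cited paper.
own_paper = {"recommender": 1.0, "evaluation": 1.0}
cited_paper = {"recommender": 1.0, "citation": 1.0}
profile = combine([own_paper, cited_paper], [1.0, 0.5])

candidates = {
    "p1": {"recommender": 1.0, "evaluation": 1.0},
    "p2": {"citation": 1.0},
    "p3": {"database": 1.0},
}
top = recommend(profile, candidates, k=2)
```

Changing which vectors go into `combine`, or their weights, yields a different recommendation method, which is one plausible reading of "methods formed by feature vector combination".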

Page 21: A Potential Approach to Overcome Data Limitation in Scientific Publication Recommendation

Evaluation

Layer 1: To evaluate recommendation results:
▪ Order-aware: NDCG@5, NDCG@10.
▪ First relevant item-aware: MRR.
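
For reference, the layer-1 metrics can be computed as in the following sketch. It assumes binary relevance (a recommended paper either is or is not in the ground truth) and the common log2 discount for NDCG; the slides do not say which NDCG variant was used, and the function names and toy data are illustrative.

```python
import math
from typing import List, Sequence, Set


def ndcg_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """NDCG@k with binary gains and a log2(position + 1) discount."""
    dcg = sum(
        1.0 / math.log2(pos + 2)  # pos is 0-based, so rank = pos + 1
        for pos, item in enumerate(ranked[:k])
        if item in relevant
    )
    # Ideal ranking: all relevant items (up to k) placed at the top.
    ideal_hits = min(k, len(relevant))
    idcg = sum(1.0 / math.log2(pos + 2) for pos in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


def reciprocal_rank(ranked: Sequence[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant item, or 0.0 if none is recommended."""
    for pos, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / pos
    return 0.0


def mean_reciprocal_rank(rankings: List[Sequence[str]], relevants: List[Set[str]]) -> float:
    """MRR: average reciprocal rank over all target researchers."""
    rr = [reciprocal_rank(r, rel) for r, rel in zip(rankings, relevants)]
    return sum(rr) / len(rr) if rr else 0.0


# Toy example: one researcher, relevant papers "b" and "d".
ranked = ["a", "b", "c", "d"]
relevant = {"b", "d"}
ndcg5 = ndcg_at_k(ranked, relevant, k=5)
rr = reciprocal_rank(ranked, relevant)
```

NDCG rewards placing all relevant papers high in the list, while MRR depends only on where the first relevant paper appears.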

Layer 2: To measure the consistency of evaluation results:
▪ Pearson’s coefficient.
▪ Spearman’s coefficient.
▪ Kendall’s coefficient.
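
The three layer-2 coefficients can be sketched in plain Python as below. The score lists are made up for illustration, the Kendall variant shown is tau-b (the slides do not say which variant was used), and the code assumes neither input list is constant. In practice a statistics library would normally be used instead.

```python
import math
from itertools import combinations
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson's linear correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for idx in order[i:j + 1]:
            ranks[idx] = avg
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


def kendall_tau_b(x: Sequence[float], y: Sequence[float]) -> float:
    """Kendall's tau-b, which corrects for ties in either list."""
    concordant = discordant = ties_x = ties_y = 0
    for i, j in combinations(range(len(x)), 2):
        dx, dy = x[i] - x[j], y[i] - y[j]
        if dx == 0:
            ties_x += 1
        if dy == 0:
            ties_y += 1
        if dx * dy > 0:
            concordant += 1
        elif dx * dy < 0:
            discordant += 1
    n0 = len(x) * (len(x) - 1) // 2  # total number of pairs
    return (concordant - discordant) / math.sqrt((n0 - ties_x) * (n0 - ties_y))


# Made-up evaluation scores of four methods on D and D' (illustration only).
scores_d = [0.42, 0.35, 0.20, 0.28]
scores_d_prime = [0.51, 0.47, 0.33, 0.30]
rho = spearman(scores_d, scores_d_prime)
tau = kendall_tau_b(scores_d, scores_d_prime)
```

Pearson compares the score values themselves, while Spearman and Kendall compare only the ordering of the methods, so the rank-based coefficients answer more directly whether both datasets would pick the same winner.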

Page 23: A Potential Approach to Overcome Data Limitation in Scientific Publication Recommendation

The Process to Build Dataset

• Timeline:

[Timeline figure: Past | Present (defined) | Future, spanning from the first published year to the last published year]

• Target Researcher: those for whom recommendations are generated.

• Future Reference: those papers cited in the Future but not in the Past.

• Ground Truth Data: future references cited by each target researcher.
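
The definitions above translate into a simple set difference per researcher. The sketch below is an assumption-laden illustration rather than the authors' released code: the record layout, the function name `build_ground_truth`, and the choice to count citations made in the present year as Past are all hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple

# (researcher, cited_paper, year of the citing publication): assumed layout.
Citation = Tuple[str, str, int]


def build_ground_truth(citations: Iterable[Citation], present_year: int) -> Dict[str, Set[str]]:
    """Map each researcher to papers cited in the Future but not in the Past.

    Assumption: citations made in `present_year` or earlier belong to the Past.
    Researchers with no citing activity in the Future are omitted.
    """
    past: Dict[str, Set[str]] = {}
    future: Dict[str, Set[str]] = {}
    for researcher, cited, year in citations:
        bucket = past if year <= present_year else future
        bucket.setdefault(researcher, set()).add(cited)
    # Future references = future citations minus anything already cited before.
    return {r: refs - past.get(r, set()) for r, refs in future.items()}


# Toy citation records (made up for illustration).
citations = [
    ("alice", "p1", 2008),
    ("alice", "p2", 2009),
    ("alice", "p2", 2012),  # already cited in the Past, so not a future reference
    ("alice", "p3", 2012),
    ("bob", "p1", 2013),
]
ground_truth = build_ground_truth(citations, present_year=2010)
```

Only information from the Past would be available to a recommender at the Present, so the future references act as held-out relevant papers.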

Page 25: A Potential Approach to Overcome Data Limitation in Scientific Publication Recommendation

Experimental Data

• D: Automatically built dataset.
  ▪ To be released from https://sites.google.com/site/tranhungnghiep

• D’: Manually built dataset.
  ▪ From Sugiyama and Kan, JCDL ’10.

Page 26: A Potential Approach to Overcome Data Limitation in Scientific Publication Recommendation

Layer 1: Recommendation Evaluation Results

[Table: evaluation results of CBF methods on the two datasets D and D’]

Page 27: A Potential Approach to Overcome Data Limitation in Scientific Publication Recommendation

Layer 2: Evaluation Results Consistency

[Table: correlations between evaluation results on D and D’]

• In general: statistically significant, strong positive correlations between the results.
→ Applicable to roughly comparing methods before online evaluation.

• For specific metrics:
  ▪ For NDCG@10: less correlation.
  ▪ For MRR: no correlation.
  → Not suitable for measuring the first relevant recommended item.

Page 32: A Potential Approach to Overcome Data Limitation in Scientific Publication Recommendation

Conclusion

This study is the first one to:
• Assess the approach of building ground truth data based on reference data for evaluating relevant paper recommendation.
→ We showed that this approach is promising.

• Propose a process to build ground truth data from bibliographic data.
→ We built and published a dataset to help advance other research (*).

(*) To be released from https://sites.google.com/site/tranhungnghiep

Page 34: A Potential Approach to Overcome Data Limitation in Scientific Publication Recommendation

Future work

• Should focus on extensive assessment of this approach.
→ Especially by comparison with online evaluation.

Thank you very much!

Original paper: Hung Nghiep Tran, Tin Huynh, Kiem Hoang. A Potential Approach to Overcome Data Limitation in Scientific Publication Recommendation. KSE 2015.
Code & Data: https://github.com/tranhungnghiep/PaperRecommender
Other resource: https://sites.google.com/site/tranhungnghiep/code-data/paper-recommender-systems