A Potential Approach to Overcome Data Limitation in Scientific Publication Recommendation

Hung Nghiep Tran, Tin Huynh, Kiem Hoang

University of Information Technology

Vietnam

Original paper: Hung Nghiep Tran, Tin Huynh, Kiem Hoang. A Potential Approach to Overcome Data Limitation in Scientific Publication Recommendation. KSE 2015. Resource: see the last slide (SlideShare convention).

Introduction

Relevant paper recommendation: what papers are relevant to a researcher's interests?

▪ Note: different from citation recommendation.

Image source: Sugiyama and Kan, Exploiting potential citation papers in scholarly paper recommendation, JCDL '13.

→ Problems in experiments and evaluations.

Introduction

Offline evaluation: the most popular approach.
▪ Based on ground truth data.
  • Recommended papers are compared to the ones known to be relevant to each researcher.

→ Building ground truth data is usually difficult and expensive.


Introduction

• An approach to build ground truth:
  ▪ A natural line of thinking: references are relevant to researchers' interests.

→ Intuitively, we can build ground truth data based on reference data.

→ But this approach has not been explored in the literature.


Our Goal

To systematically study the approach that builds ground truth data based on reference data for evaluation of relevant paper recommendation.

Related Work

• Relevant paper recommendation is an emerging research area:
  ▪ [Sugiyama & Kan, JCDL '10].
  ▪ [Le et al., ICCCI '14].
  ▪ [Ohta et al., ICADIWT '11].

• Lack of ground truth data is a recognized problem:
  ▪ [Beel et al., RepSys Workshop '13].

• Some approaches have been tried to build data:
  ▪ Manually built:
    • [Sugiyama & Kan, JCDL '10].
  ▪ Adapted from reference management software:
    • Mendeley.
    • Docear.

Our Approach

• Theoretical analysis:
  ▪ Propose and analyze the hypotheses supporting the approach.

• Empirical analysis:
  ▪ Evaluate the approach's capability of evaluating recommendation methods.

Theoretical Analysis

Hypotheses:
• Hypothesis 1. In the context of doing research and writing scientific publications, there are many levels at which a researcher's information needs are exposed, of which citing is the highest.

• Hypothesis 2. References made by researchers are publications relevant to them. Moreover, they are the most important ones.

• Hypothesis 3. Future references could be used as ground truth data in evaluating the recommendation of relevant publications for researchers.

→ The approach is reasonable.


Empirical Analysis

How to evaluate the approach's capability of evaluating recommendation methods?
→ The trick: two-layer evaluation.

1. Evaluate different recommendation methods and get the recommendation evaluation results.

2. Evaluate the consistency of the above evaluation results on two datasets: the one built based on the approach and the one built manually.





Experiments Plan

1. Build a dataset D with ground truth data based on the approach.

2. Get a manually built dataset D’ for comparison.

3. Recommend by different methods on D and D’.

4. Evaluate the recommendation methods' results on D and D’.

5. Evaluate the consistency of the evaluation results on D and D’.


Different content-based filtering (CBF) recommendation methods, formed by feature vector combination.
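As an illustration only, a content-based filtering method of this kind could be sketched as below. The slides do not say which features are combined or how, so everything here is an assumption: the bag-of-words `tf_vector` extractor, the weighted-sum `combine` function, the weights, and cosine similarity as the ranking score are all hypothetical choices, not the authors' implementation.

```python
import math
from collections import Counter


def tf_vector(text):
    """Bag-of-words term-frequency vector (hypothetical minimal feature extractor)."""
    return Counter(text.lower().split())


def combine(vectors, weights=None):
    """Weighted sum of sparse feature vectors (one assumed way to 'combine')."""
    weights = weights or [1.0] * len(vectors)
    combined = Counter()
    for vec, w in zip(vectors, weights):
        for term, value in vec.items():
            combined[term] += w * value
    return combined


def cosine(u, v):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def recommend(profile, candidates, k=10):
    """Rank candidate papers (id -> vector) by similarity to the researcher profile."""
    ranked = sorted(candidates,
                    key=lambda pid: cosine(profile, candidates[pid]),
                    reverse=True)
    return ranked[:k]
```

Under these assumptions, each different choice of vectors and weights passed to `combine` yields a different recommendation method to be compared on D and D’.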

Evaluation

Layer 1: to evaluate recommendation results:
▪ Order-aware: NDCG@5, NDCG@10.
▪ First-relevant-item-aware: MRR.
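The Layer 1 metrics can be sketched as follows. This is a minimal version assuming binary relevance (a recommended paper is either in the ground truth or not) and the common log2 rank discount; the slides do not state which gain formulation was used, and the function names are illustrative.

```python
import math


def dcg_at_k(gains, k):
    """Discounted cumulative gain over the top-k positions (log2 discount)."""
    return sum(g / math.log2(rank + 1)
               for rank, g in enumerate(gains[:k], start=1))


def ndcg_at_k(ranked_ids, relevant_ids, k):
    """Binary-relevance NDCG@k: DCG of the ranking divided by the ideal DCG."""
    gains = [1.0 if pid in relevant_ids else 0.0 for pid in ranked_ids]
    ideal = [1.0] * min(len(relevant_ids), k)
    ideal_dcg = dcg_at_k(ideal, k)
    return dcg_at_k(gains, k) / ideal_dcg if ideal_dcg > 0 else 0.0


def reciprocal_rank(ranked_ids, relevant_ids):
    """1 / rank of the first relevant item, or 0.0 if none is recommended."""
    for rank, pid in enumerate(ranked_ids, start=1):
        if pid in relevant_ids:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(rankings, ground_truth):
    """Average reciprocal rank over researchers (both dicts keyed by researcher id)."""
    total = sum(reciprocal_rank(rankings[r], ground_truth[r]) for r in rankings)
    return total / len(rankings)
```

NDCG rewards placing relevant papers near the top of the whole list, while MRR looks only at the position of the first relevant paper.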

Layer 2: to measure the consistency of evaluation results:

▪ Pearson's coefficient.
▪ Spearman's coefficient.
▪ Kendall's coefficient.
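For Layer 2, each recommendation method contributes one score on D and one on D’, and the two score lists are then correlated. The sketch below implements the three coefficients in plain Python, assuming non-constant inputs; the Kendall variant shown is tau-a (no tie correction), which is an assumption, since the slides do not say which variant was used.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson's linear correlation coefficient (inputs must not be constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for idx in order[i:j + 1]:
            result[idx] = avg
        i = j + 1
    return result


def spearman(x, y):
    """Spearman's rank correlation: Pearson's coefficient computed on ranks."""
    return pearson(ranks(x), ranks(y))


def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) / number of pairs."""
    pairs = list(combinations(range(len(x)), 2))
    score = 0
    for i, j in pairs:
        s = (x[i] - x[j]) * (y[i] - y[j])
        score += (s > 0) - (s < 0)
    return score / len(pairs)
```

A coefficient close to 1 would mean that the two datasets rank the recommendation methods in nearly the same way.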


The Process to Build Dataset

• Timeline

• Target Researcher: those for whom recommendations are generated.

• Future Reference: those papers cited in the Future but not in the Past.

• Ground Truth Data: future references cited by each target researcher.

[Timeline figure. Labels: Past, Present (defined), Future; first published year, last published year.]
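The definitions above can be sketched in code as follows. The record layout (dicts with `year`, `authors`, and `references` keys), the function name, and the choice to count the Present year itself as part of the Past are all assumptions made for illustration; the slides do not specify them.

```python
from collections import defaultdict


def build_ground_truth(papers, target_researchers, present_year):
    """Map each target researcher to their future references.

    A future reference is a paper cited by the researcher after
    present_year but never cited by them up to present_year
    (assumption: the Present year itself belongs to the Past).
    """
    past_refs = defaultdict(set)
    future_refs = defaultdict(set)
    for paper in papers:
        bucket = past_refs if paper["year"] <= present_year else future_refs
        for author in paper["authors"]:
            if author in target_researchers:
                bucket[author].update(paper["references"])
    return {r: future_refs[r] - past_refs[r] for r in target_researchers}
```

For example, with made-up records:

```python
papers = [
    {"year": 2008, "authors": ["alice"], "references": ["a", "b"]},
    {"year": 2012, "authors": ["alice", "bob"], "references": ["b", "c"]},
]
truth = build_ground_truth(papers, {"alice", "bob"}, present_year=2010)
# alice cited "b" in the Past, so only "c" counts as her future reference.
```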


Experimental Data

• D: automatically built dataset.
  ▪ To be released from https://sites.google.com/site/tranhungnghiep

• D’: manually built dataset.
  ▪ From Sugiyama and Kan, JCDL '10.

Layer 1: Recommendation Evaluation Results

Evaluation results of CBF methods on the two datasets D and D’:

Layer 2: Evaluation Results Consistency

• In general: statistically significant, strongly positively correlated results.
→ Applicable for roughly comparing methods before online evaluation.
• For specific metrics:
  ▪ For NDCG@10: less correlation.
  ▪ For MRR: no correlation.

→ Not suitable for measuring the first relevant recommended item.

Correlations between evaluation results on D and D’:







Conclusion

This study is the first one to:
• Assess the approach of building ground truth data based on reference data for evaluating relevant paper recommendation.
→ We showed that this approach is promising.

• Propose a process to build ground truth data from bibliographic data.
→ We built and published a dataset to help advance other research (*).




(*) To be released from https://sites.google.com/site/tranhungnghiep

Future Work

• Should focus on extensive assessment of this approach.
→ Especially by comparison with online evaluation.

Thank you very much!


Original paper: Hung Nghiep Tran, Tin Huynh, Kiem Hoang. A Potential Approach to Overcome Data Limitation in Scientific Publication Recommendation. KSE 2015.
Code & Data: https://github.com/tranhungnghiep/PaperRecommender
Other resource: https://sites.google.com/site/tranhungnghiep/code-data/paper-recommender-systems
