© EEXCESS consortium: all rights reserved
EEXCESS
Enhancing Europe’s eXchange in Cultural Educational
and Scientific reSources
Deliverable D3.4
Final Federated Recommender Prototype
Identifier: EEXCESS-D3.4-Final-Federated-Recommender-Prototype
Deliverable number: D3.4
Author(s) and company: Hermann Ziak (Know-Center)
Heimo Gursch (Know-Center)
Roman Kern (Know-Center)
Davide Magatti (Mendeley)
Internal reviewers: Christin Seifert (Uni-Passau)
Michael Granitzer (Uni-Passau)
Work package / task: WP3, D3.4
Document status: Final
Confidentiality: Public
Version 2016-05-31
History
Version Date Reason of change
1 2016-05-15 First draft of document created
2 2016-05-18 Initial proof reading
3 2016-05-20 Internal Review
4 2016-05-30 Incorporate changes from internal review
5 2016-05-31 Finalised Document
Impressum
Full project title: Enhancing Europe’s eXchange in Cultural Educational and Scientific reSources
Grant Agreement No: 600601
Workpackage Leader: Know-Center
Project Co-ordinator: Silvia Russegger, Jr-DIG
Scientific Project Leader: Michael Granitzer, Uni-Passau
Acknowledgement: The research leading to these results has received funding from the European Union's
Seventh Framework Programme (FP7/2007-2013) under grant agreement n° 600601.
Disclaimer: This document does not represent the opinion of the European Community, and the European
Community is not responsible for any use that might be made of its content.
This document contains material, which is the copyright of certain EEXCESS consortium parties, and may not be
reproduced or copied without permission. All EEXCESS consortium parties have agreed to full publication of
this document. The commercial use of any information contained in this document may require a license from
the proprietor of that information.
Neither the EEXCESS consortium as a whole, nor a certain party of the EEXCESS consortium warrant that the
information contained in this document is capable of use, nor that use of the information is free from risk, and
does not accept any liability for loss or damage suffered by any person using this information.
Table of Contents
1 Executive Summary
2 Introduction
2.1 Purpose of this Document
2.2 Scope of this Document
2.3 Status of this Document
2.4 Related Documents
2.5 Relation to the Research Roadmap
2.6 Structure of this Document
3 Federated Recommender
3.1 Final System Architecture
3.2 Modules of the Federated Recommender
3.2.1 Source Selection
3.2.2 Query Processing
3.2.3 Partner Recommender
3.2.4 Result Processing
3.2.5 Result Ranking
3.3 Final Federated Recommender API
3.3.1 API Service Calls
3.3.2 Recent Changes in the API Format
3.3.3 Secure User Profile
4 Published Evaluation Results
4.1 International Conference on Theory and Practice of Digital Libraries (TPDL 2016) – Accepted as Poster
4.2 International Conference on Information and Knowledge Management (CIKM 2016) – Under Submission
4.3 Conference and Labs of the Evaluation Forum (CLEF 2016) – Under Submission
4.4 Social Book Search Lab – Accepted
4.5 International Workshop on Text-based Information Retrieval (TIR 2016) – Under Submission
4.6 European Conference on Knowledge Management (ECKM 2016) – Accepted
5 PartnerWizard
5.1 Query Configuration
5.2 Query Generator Testing
5.3 Deployment
6 System testing and performance evaluation
6.1 Scenario 1
6.2 Scenario 2
6.3 Scenario 3
6.4 Scenario 4
7 Conclusions on the Federated Recommender
8 Narrative Path
8.1 Experiment 1: Mining narrative paths from survey papers
8.2 Ground truth dataset construction and testing
8.2.1 Noise filtering
8.2.2 What can be done next on the evaluation dataset?
8.3 Experiment 2: Mining Narrative Paths from Mendeley reading logs
8.3.1 Generating the Markov chain
8.3.2 Developing the client application
8.3.3 Limitations
8.3.4 Evaluation
8.3.5 How is this different from the previous narrative paths bookmarklet?
8.3.6 Possible improvements & Future work
8.4 Conclusions
9 References
10 Glossary
1 Executive Summary
This document describes the technical details and evaluation results of the final prototype of the EEXCESS
Federated Recommender. In particular, it gives an overview of the recent changes within the Federated
Recommender implementation, hosted on GitHub1, and the ongoing research within work package 3 to go
beyond the state of the art. This deliverable covers the Federated Recommender as well as the work on
Narrative Paths.
The main task of the Federated Recommender is to provide an infrastructure to distribute requests from the
front ends to the partners and return the aggregated results. The initial request is incorporated in the Secure
User Profile, together with additional contextual information about the users and their preferences. This
contextual information and these preferences can be extracted and provided automatically by the front end or
changed and entered by the users themselves (e.g. spoken language, age, media type preferences). Since every
partner has its own way of processing queries and returning results, the architecture is designed so that the
main tasks of processing and distributing the requests happen within the Federated Recommender, while the
partner-specific processing takes place within the so-called Partner Recommender.
These Partner Recommenders are individually tailored to each partner to i) translate the Secure User Profile
into a query the partner is able to process, additionally supporting individual features (e.g. filter queries),
ii) work around individual constraints and finally iii) translate the results into the internal format of the
system with the corresponding metadata, including enrichment (e.g. for partners returning not a list of
documents but a list of document IDs, or imposing access restrictions). The Partner Recommenders are
responsible for registering themselves with the Federated Recommender component to become part of the
EEXCESS system. For this, they also need to provide additional information about their partner system (e.g.
location preferences, the appropriate age of the target user group for the partner's content). To ensure that
the Federated Recommender knows about their presence even after restarts or system failures, these
registration calls are repeated periodically. As a consequence, this architecture allows the system to be
distributed over several servers while at the same time enabling the content providers to remain in full control
of their content.
In recent months the Federated Recommender has proven its stability, reliability and universal applicability in
three different deployments: the demo and testing server of KNOW, the development server of JR-DIG for
unified testing of all components, and the stable server of JR-DIG for the release candidates. While KNOW uses
Linux as the server operating system, JR-DIG uses Windows. The robustness and versatility of the EEXCESS
architecture was demonstrated by assigning different quality-of-service levels to the partners. These levels
indicate whether a partner is an additional resource in the EEXCESS eco-system or a specialised partner system
for testing purposes. Furthermore, tests showed that the remaining performance issues of the system are
mainly caused by the response latency of the partner systems.
1 https://github.com/EEXCESS/
In the following the tasks of the Federated Recommender are summarised:
● Parsing and analysing of the received Secure User Profile, e.g. loading of aggregation algorithms,
language detection
● Source selection based on several parameters and textual features to restrict the set of partners to a
well-matching subset, e.g. via the language either provided or extracted from the query, the user's age,
special fields, or the covered domains
● Query pre-processing, e.g. grouping of the query, introduction of diversity and serendipity
● Partner Recommender management and query distribution, e.g. partner registration, partner access
restriction
● Result filtering, e.g. de-duplication, language detection
● Result aggregation, e.g. preference-dependent aggregation, textual-feature-based aggregation
The tasks of the Partner Recommender component are:
● Analyse the Secure User Profile as generated by the Federated Recommender
● Selection of the best matching query reformulation strategy for the partner system
● Creating and issuing the call to the partner system, handling all necessary networking operations
● Parsing and processing the response as sent by the partner system, involving the components
developed by WP4 to translate the response and the contained meta-data into the EEXCESS format
● Reporting back the partners result to the Federated Recommender component
The tasks associated with the Narrative Path component are:
● Given a single resource, find matching resources which, together with the original resource, represent
a sequence for consumption
● Allow different functions for the sequence building, to enable research on which parts of the
associated information can be exploited to form narrative paths
● Conduct studies on the usefulness of narrative paths for users
2 Introduction
2.1 Purpose of this Document
This document is the final deliverable of work package 3 (WP3) of the EEXCESS project. In this document all
generated results and achievements are described. Many aspects mentioned in this document have already
been reported in previous deliverables. To give a comprehensive overview of all WP3 proceedings,
achievements already reported earlier are briefly mentioned.
The EEXCESS framework is an open and extensible Federated Recommender system focused on cultural and
scientific resources from the Web. Although the initial set of content sources is pre-set, additional partner
systems can easily be added to the EEXCESS framework as new content sources. EEXCESS offers a unified entry
point for all connected partner systems to support the automatic recommendation of resources to users. The
recommendations are based on the short- and long-term user profile as established by the EEXCESS
recommender system. Figure 1 gives an overview of the main components of the final prototype. To achieve a
high degree of scalability, the EEXCESS architecture is designed in a distributed manner. Hence, the different
parts of the EEXCESS system shown in Figure 1 are designed to be distributed across multiple machines and
are connected via the Internet.
Figure 1: A typical configuration of the framework includes a client, a Federated Recommender, and several partner
systems.
2.2 Scope of this Document
The core part of this document is the description of the final prototype developed in the EEXCESS project,
focusing on the so-called Federated Recommender. Additionally, parts of the partner system connectors, called
Partner Recommenders, have also been part of WP3, for example the query formulation, which uses specialised
partner API features (e.g. a detail search call), and the automated registration of a Partner Recommender at
the Federated Recommender. Other aspects of the partner connections were developed by JR-DIG and reported
in the corresponding deliverable. The PartnerWizard has been developed jointly by KNOW and JR-DIG,
representing the two work packages. With the PartnerWizard, potential partners can join the EEXCESS
ecosystem without the need to develop a Partner Recommender on their own, i.e. no programming skills are
necessary. Instead, they are guided through the configuration of a Partner Recommender by a Web GUI.
2.3 Status of this Document
This is the final version of D3.4.
2.4 Related Documents
This document represents the final deliverable of Work Package 3. Some of the covered content is related to
the preceding documents (D3.1, D3.2 and D3.3). Some parts of the PartnerWizard are documented in
deliverable D4.3. The demonstrator of the Federated Recommender is also described in deliverable D7.6.
2.5 Relation to the Research Roadmap
In the preceding deliverables a number of development issues and a research roadmap were presented. The
following steps were taken:
● The integration of the user's interests into the query processing has been implemented and evaluated.
● A block ranking approach has been implemented and made available via a dedicated demonstrator; an
evaluation strategy via crowdsourcing has been designed and internally validated; the actual
evaluation was conducted on a crowdsourcing platform and the results are under submission.
● The separate components of the PartnerWizard (joint work of KNOW and JR-DIG within WP3 and
WP4) were merged and finalised. In the course of this work, further modifications of the
recommender were necessary.
● The final prototype of the PartnerWizard was presented to the public at the International Science 2.0
Conference and the EEXCESS Final Conference.
● In regard to query splitting, we developed the planned approach and evaluated its performance. The
results of this evaluation are currently under submission.
● Work on the deeper integration and analysis of the partner systems was continued and the methods
for query formulation were adapted accordingly.
● In terms of personalisation, a new result aggregation approach was implemented, making it possible
to boost results of certain kinds (e.g. media types, nature of the license) towards the top of the
recommendation list. The according parameters can either be learned from the user's behaviour by
the front end or defined by the users themselves. To support this approach, format changes needed to
be undertaken.
● In regard to source selection, apart from the already implemented methods like selection based on
age, language or special features, the existing algorithms for domain-based source selection were
further refined. Numerous aspects of our source selection approach were thoroughly evaluated. The
results of this evaluation are currently under submission.
● We introduced a Wikipedia-for-Schools-based domain detection approach for content-based source
selection, complementing the WordNet Domains approach.
● To support the newly integrated features in the Federated Recommender framework, the dedicated
demonstrator of KNOW was updated.
● The FedWeb dataset of the TREC conference was introduced into the EEXCESS system for evaluation
purposes.
● The general system performance was evaluated and further improvements to the Federated
Recommender were undertaken.
2.6 Structure of this Document
In the first chapter of the document the developed Federated Recommender, its architecture, and its interfaces
are described. The organisation of this section closely follows the papers (conference and workshop
contributions) presented in the subsequent chapter. Next, the PartnerWizard is described, followed by a
detailed analysis of the performance evaluations profiling the Federated Recommender. In the final section the
project results are
recapped. This section elaborates on lessons learned, current limitations and future use and exploitation
strategies. The second chapter summarises the work done on Narrative Paths.
3 Federated Recommender
3.1 Final System Architecture
In this section the complete system architecture and API of the Federated Recommender are presented. The
Federated Recommender consists of a number of individual modules that closely interact with each other.
Figure 2 shows the modules of the Federated Recommender and the information flow between them. The user
context, encoded into a Secure User Profile, serves as input for each recommendation process. It can therefore
be interpreted as a query modelling the information need of a (human) user. Each recommendation process
starts with the source selection and the query processing. While the source selection decides on the ideal
content providers for answering a query, the query processing transforms the incoming query into a form ideal
for generating recommendations. The processed query is then sent to the subset of sources determined by the
source selection process. This step is conducted by the individual Partner Recommender components. Each
queried source responds with a result list. Due to possible downtimes or high traffic, the Partner
Recommenders must account for the fact that a source might not respond immediately. Therefore, the Partner
Recommenders only wait for a configurable timeout (default: five seconds) for the partner systems to
respond. All non-empty responses arriving within this time frame are then included in the final result list. The
result processing component extracts information from the result lists for further processing. This information
is then exploited to steer the result re-ranking. The result list undergoes a de-duplication process to avoid
situations where a user is confronted with a list of seemingly identical results. The outcome of this processing
is a personalised result list, which is the answer to the initial query and can be presented to the user. Each of
the mentioned modules will be explained in further detail in dedicated sections. Moreover, the API to
communicate with the Federated Recommender will be described in the remainder of this chapter.
Figure 2: Architecture of the EEXCESS Federated Recommender. The arrows denote the information flow starting with
the incoming query at the top and ending with the results at the bottom of the diagram. The user context encoded into a
Secure User Profile serves as an input for each recommendation process.
3.2 Modules of the Federated Recommender
3.2.1 Source Selection
Source selection refers to a method to restrict the sources being used within a federated setting to a certain
subset. Different criteria may steer the decision which sources are kept. Depending on the query and the
available sources, one, several or all sources might prove to be eligible. The benefit of selecting only a subset of
all available sources is twofold. Firstly, the perceived quality of the returned results may be better since
non-fitting material is excluded. Secondly, the fewer sources are queried, the more responsive the system will be,
helping to achieve an overall higher usability and acceptance of the system. All source selection strategies rely
on the Secure User Profile and the description of the sources as provided by the Partner Recommenders.
Different aspects of the Secure User Profile are evaluated consecutively, resulting in a filtering pipeline applying
the source selection filters one after another. Our implementation also allows filters to be applied multiple
times based on different aspects. The implemented source selection strategies can be grouped into
the following categories:
● Language: Each source might contain content in a different language; some sources might even
provide content for multiple languages. Users cannot be expected to understand all languages present
in the sources. If the preferred language is contained in the Secure User Profile, this information will
be used to select content in the specified language. If the user has not set any preferred language, an
automatic language detection of the query is done. The detected query language is then used to select
content in the same language. The language of the content is part of its metadata. Depending on the
source, this is specified for all documents in the source or for each document individually.
● Age: Content from various partners might be suitable for different age groups. For instance, scientific
papers and reports might not be suitable for children and teenagers. Each partner can describe the
target age group of the provided content. Users who also provide their age as part of the Secure User
Profile will then get recommendations according to matching age groups. Users are not required to
specify their age, disclosing this information remains completely voluntary.
● Category: For each query the topical categories it covers are detected. Similarly, the categories
provided by a content source are detected. For each query, the content sources with the best match in
categories are selected. The categorisation of sources based on the domains they cover is done
automatically: a large number of queries is sent to each source, the response of the source is
analysed for each query, and based on this analysis the categories are assigned. This is a complex and
computationally intensive task. Therefore, it is only repeated in adjustable intervals, when the sources
might have changed. The same algorithm used to judge the responses of the partners is
also used to categorise each query when it is issued. The categories of a query are then matched to
the categories of the connected and categorised content partners to decide to which partners the query
is sent.
● Time Range: Some partners allow a filtering based on a creation or modification date of content. A
query can also include a desired time range. For all sources supporting a time-based filtering this
information will be passed on from the query to the source to select the appropriate content.
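The consecutive evaluation of profile aspects described above can be illustrated as a filter pipeline. The following Python sketch is purely illustrative; the filter and field names (e.g. `language_filter`, `min_age`) are assumptions for this example and do not mirror the project's actual code:

```python
# Illustrative sketch of the source-selection filter pipeline:
# each filter inspects one aspect of the Secure User Profile and
# narrows down the set of eligible partner systems.

def language_filter(profile, partners):
    lang = profile.get("language")
    if lang is None:
        return partners  # no preference: keep all partners
    return [p for p in partners if lang in p["languages"]]

def age_filter(profile, partners):
    age = profile.get("age")
    if age is None:
        return partners  # disclosing the age remains voluntary
    return [p for p in partners if p["min_age"] <= age]

def select_sources(profile, partners, filters=(language_filter, age_filter)):
    """Apply the filters one after another; each filter works on the
    output of the previous one (a filtering pipeline)."""
    for f in filters:
        partners = f(profile, partners)
    return partners

partners = [
    {"name": "A", "languages": {"en", "de"}, "min_age": 0},
    {"name": "B", "languages": {"fr"}, "min_age": 0},
    {"name": "C", "languages": {"en"}, "min_age": 16},
]
selected = select_sources({"language": "en", "age": 12}, partners)
# keeps only partner "A": "B" fails the language filter, "C" the age filter
```

A filter can also appear more than once in the `filters` tuple, matching the ability mentioned above to apply a filter multiple times based on different aspects.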
3.2.2 Query Processing
Partners in the EEXCESS ecosystem can have very different implementations and search algorithms running to
look through their content. Some of these algorithms work well with long queries (i.e. many keywords), some
work well with short queries (i.e. few keywords), and some can work with queries of any length. Depending on
the partner system characteristics and the issued query, the query might need to be extended or split up into
sub queries.
● Query expansion: To expand and diversify queries, a pseudo-relevance feedback approach is
employed. The initial query is used to select results from an index filled with individual paragraphs from
Wikipedia articles. In the next step, keywords are extracted from the retrieved Wikipedia
paragraphs. These keywords are appended to the original query by a disjunction, i.e. a logical OR.
● Query splitting: Long queries are split up into sub-queries for partners not handling long queries well.
Each keyword is represented as a vector based on the Word2Vec database [Mikolov, 2013]. To split up
the query, the dot product of the vectors is used as similarity measure to re-group the keywords into
shorter queries.
● Query diversification: Additional keywords representing the user’s interest are added to the query as
a conjunction, i.e. a logical AND. The user interests are taken from the interests specified in the Secure
User Profile.
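The query splitting step can be illustrated with toy vectors. The sketch below replaces the Word2Vec vectors [Mikolov, 2013] with hand-made two-dimensional ones and uses a simple greedy grouping; both are assumptions made for this illustration, not the project's actual algorithm:

```python
# Toy sketch of query splitting: keywords whose vectors are similar
# (dot product above a threshold) are grouped into the same sub-query.
# The 2-d vectors below stand in for real Word2Vec embeddings.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def split_query(keywords, vectors, threshold=0.5):
    groups = []  # each group becomes one shorter sub-query
    for kw in keywords:
        for group in groups:
            # join the first group containing a sufficiently similar keyword
            if any(dot(vectors[kw], vectors[g]) >= threshold for g in group):
                group.append(kw)
                break
        else:
            groups.append([kw])  # no similar group found: start a new one
    return groups

vectors = {
    "piano": (1.0, 0.0), "violin": (0.9, 0.1),    # music-related
    "newton": (0.0, 1.0), "gravity": (0.1, 0.9),  # physics-related
}
subqueries = split_query(["piano", "newton", "violin", "gravity"], vectors)
# → [['piano', 'violin'], ['newton', 'gravity']]
```

Each resulting group is then issued as a separate short query against partners that do not handle long queries well.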
3.2.3 Partner Recommender
The Partner Recommender handles the communication with each partner system. For each partner a dedicated
Partner Recommender exists, representing the partner's content in the EEXCESS ecosystem. The Partner
Recommender can be seen as a translator between the EEXCESS query and result list formats and the formats
used by the partner system. The effort to develop a new Partner Recommender mainly depends on the API
provided by the partner system. For partners providing an HTTP GET call to issue a query and returning their
results in JSON or XML format, a new Partner Recommender can be created with the PartnerWizard without
the need for dedicated programming. If the partner system offers an API not compliant with the
aforementioned query and response formats, a new Partner Recommender can be created by re-implementing
selected parts of the Partner Recommender reference implementation. This might be needed if a partner
system does not deliver the results in a self-contained result list but in a result list only containing result
identifiers. To get the individual results from such an API, a separate call for each result is needed.
The Partner Recommenders are designed to run on any Apache Tomcat server. This allows them to be
run on the same machine as the Federated Recommender or on other dedicated machines. The decision for
one of these possible setups is usually influenced by the expected number of users and therefore the number
of concurrent user requests the system should be able to process. When a query from a user is processed, all
Partner Recommenders are queried in parallel. Hence, the waiting period for the user is determined by the
slowest response time of all sources and not by the accumulated response times.
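The parallel querying with a timeout, as described in Section 3.1, can be sketched with Python's `concurrent.futures`. The partner names, latencies and the shortened timeout below are simulated values for this illustration; the real system issues HTTP calls and uses a five-second default timeout:

```python
# Sketch of parallel partner querying with a timeout. All Partner
# Recommenders are contacted at once; responses that do not arrive
# within the timeout are left out of the result list.
import time
from concurrent.futures import ThreadPoolExecutor, as_completed
from concurrent.futures import TimeoutError as FuturesTimeout

TIMEOUT_S = 0.5  # shortened here; the real default is five seconds

def query_partner(name, latency):
    time.sleep(latency)  # stands in for the network round trip
    return [f"{name}-result-1", f"{name}-result-2"]

def query_all(partners):
    results = []
    pool = ThreadPoolExecutor(max_workers=len(partners))
    futures = [pool.submit(query_partner, n, lat) for n, lat in partners]
    try:
        for fut in as_completed(futures, timeout=TIMEOUT_S):
            results.extend(fut.result())
    except FuturesTimeout:
        pass  # slow partners are simply dropped
    pool.shutdown(wait=False)  # do not wait for stragglers
    return results

results = query_all([("Europeana", 0.05), ("Mendeley", 0.1), ("SlowPartner", 2.0)])
# the waiting period equals the slowest answering partner, capped by the timeout
```

Because all partners are contacted concurrently, the total waiting time is bounded by the timeout rather than by the sum of the individual response times.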
3.2.4 Result Processing
The result processing serves as a pre-processing step for the following result ranking and result list
combination. The result processing extracts various features from the result lists. Which features are used
for the result ranking depends on the result ranking implementation used to answer the query. One of these
features is the query-result distance, calculated for the result list of every partner source. This distance is
calculated as the number of keywords common to the query and the results divided by the number of all
keywords in the query and the results.
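The described ratio can be sketched as follows (a minimal illustration; the function name is ours and, strictly speaking, the formula yields a similarity rather than a distance):

```python
def query_result_distance(query_terms, result_terms):
    """Number of keywords common to query and result divided by the
    number of all distinct keywords in query and result."""
    q, r = set(query_terms), set(result_terms)
    if not q | r:
        return 0.0  # no keywords at all on either side
    return len(q & r) / len(q | r)
```

With this reading, identical keyword sets score 1.0 and disjoint sets score 0.0.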
3.2.5 Result Ranking
The result ranking module finally combines all separate result lists into a consolidated result list sent to the
user. Different implementations are available for the result recombination. Each of the implementations first
executes a de-duplication of all the entries in the result lists. A fuzzy hash is calculated over the title and the
textual description of each result. If the fuzzy hashes of two results are similar, the two results are treated as
identical. After the de-duplication, different result aggregation strategies can be selected via settings in the
Secure User Profile. The simplest aggregation strategy is round robin, where one result is taken from each list
one after another. This can also be done in a weighted manner, to favour results from one source over the
others. Another aggregation strategy operates in the vector space, where the query and each result are
represented as vectors. The entries in the vectors represent the keywords and metadata of the query and the
results, respectively. The similarity of the results with the query is calculated as the dot product of the vector
representing the query and the vector of each result. The final result list contains the individual results
ordered by descending similarity with the query.
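The two simplest strategies can be sketched as follows (an illustrative reading of the description above, not the project's actual implementation; results and vectors are represented by plain Python lists):

```python
from itertools import zip_longest

def round_robin_merge(result_lists):
    """Take one result from each partner list in turn (unweighted)."""
    merged = []
    for group in zip_longest(*result_lists):
        merged.extend(item for item in group if item is not None)
    return merged

def rank_by_dot_product(query_vec, results):
    """results: (result_id, vector) pairs whose vector dimensions are
    the keywords/metadata shared with the query vector."""
    def score(vec):
        return sum(q * v for q, v in zip(query_vec, vec))
    return sorted(results, key=lambda r: score(r[1]), reverse=True)
```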
Apart from the simple result list format, a blocked result list format is also available. In this case, the result list is
divided into blocks, each one being optimised according to a different optimisation criterion. Three types of
optimisation criteria are available: the first one is an optimisation for diverse results, the second one is an
optimisation for high serendipity in the results, and the third option marks the standard setting represented by
the unaltered query. For each criterion an optimised query is sent to the sources. This means that the effort to
generate a blocked result list is three times higher compared to a normal result list. The final result list is
divided into blocks: the first block holds results from the un-optimised query, the second block results from the
query optimised for diversity, and the third block results from the query optimised for serendipity.
3.3 Final Federated Recommender API
Within this section the final API of the Federated Recommender is outlined. Readers interested in more
detailed information are invited to read the documentation on GitHub2.
3.3.1 API Service Calls
Within this section we describe the Federated Recommender API calls. These calls provide all the services
available to the frontends or needed by the Partner Recommenders to communicate with the Federated
Recommender. Detailed information about the individual calls can also be found on GitHub3. All these calls are
accessible via the general recommender service path and use JSON as the exchange format:
http://{SERVER}/eexcess-federated-recommender-web-service-1.0-SNAPSHOT/recommender
Recommend
The recommend-call accepts a Secure User Profile as input and returns the according result list. The documents
within the list are collected from the registered or selected partners and are aggregated by the default or by
the selected aggregation algorithm.
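A recommend request could be assembled as in the following sketch; the profile field names (numResults, contextKeywords) are assumptions based on the wiki documentation and may differ in detail:

```python
import json
from urllib import request

# Illustrative Secure User Profile fragment; see the wiki for the
# authoritative field list.
profile = {
    "numResults": 10,
    "contextKeywords": [{"text": "dinosaur"}, {"text": "t-rex"}],
}

def build_recommend_request(server):
    """Build the HTTP request for the recommend call."""
    url = ("http://%s/eexcess-federated-recommender-web-service-1.0-"
           "SNAPSHOT/recommender/recommend" % server)
    body = json.dumps(profile).encode("utf-8")
    return request.Request(url, data=body,
                           headers={"Content-Type": "application/json"})
```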
Get Details
The experience of the last year regarding system performance led us to the decision to split the recommend
call into two sub-calls. The recommend call now returns only the most important fields for the initial query.
More detailed information about the recommended objects can then be retrieved by sending a list of document
IDs.
Get Partner Favicon
The get partner favicon call was introduced to give the frontends the possibility to show the user visual
information about which partner a result was returned from.
2 https://github.com/EEXCESS/eexcess/wiki
3 https://github.com/EEXCESS/eexcess/wiki/Federated-Recommender-Service
The link to this favicon is sent to the Federated Recommender within the partner badge when the partner is
registered. The image is then retrieved by the Federated Recommender and stored within the system. The
frontend has to send the according partner ID as parameter to the Federated Recommender to get the correct
favicon.
Get Preview Image
The get preview image call serves a similar purpose to the get partner favicon call.
On request it serves a preview image of the media type specified by the frontend. The idea behind this call was
to have a uniform presentation of the preview images over all frontends.
Get Recommender Stats
The get recommender stats call returns brief information about the current status of the Federated
Recommender. It shows the average total time the system took to distribute the request and gather the
results over the last ten calls, and also the average time the selected result aggregation algorithm needed to
generate the final result list.
Register
The register call is the interface for the Partner Recommender to register itself at the Federated
Recommender. To register itself, the partner has to provide the so-called Partner Badge with all the needed
information. This call also functions as a heartbeat: in a fixed interval, the partner's registration thread resends
its registration information to assure that the Federated Recommender knows about its status.
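The register-as-heartbeat behaviour could look roughly like the following sketch; the badge field names are illustrative, not the authoritative Partner Badge schema:

```python
import threading

def make_partner_badge(partner_id, endpoint, favicon_url):
    """Minimal illustrative partner badge."""
    return {"systemId": partner_id,
            "endpoint": endpoint,
            "favIconURI": favicon_url}

class RegistrationHeartbeat:
    """Sends the registration once, then resends it in a fixed interval
    so the Federated Recommender can treat a missing heartbeat as an
    unavailable partner."""

    def __init__(self, send_registration, interval_seconds=60.0):
        self._send = send_registration
        self._interval = interval_seconds
        self._timer = None

    def start(self):
        self._send()  # initial (re-)registration
        self._timer = threading.Timer(self._interval, self.start)
        self._timer.daemon = True
        self._timer.start()

    def stop(self):  # call before sending the unregister request
        if self._timer is not None:
            self._timer.cancel()
```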
Unregister
The unregister call is used by the Partner Recommender when it is shut down to tell the Federated
Recommender that it is not available anymore.
Get Registered Partners
The get registered partners call has two purposes. First, it returns all the partners currently registered in the
system together with the according information. Second, it also provides information about the status of each
partner, including response time and the number of failed requests since the initial registration of the partner.
Here the system also distinguishes between timeouts and actual system failures.
3.3.2 Recent Changes in the API Format
The protocols between the Federated Recommender and the frontends were already optimised as part of the
work undertaken on the second Federated Recommender Prototype. Since then, only slight adaptations had to
be made to support new features.
The extensive description of each field within the Secure User Profile, the Result List, the Details Query and
Details Response is documented on Github4.
In the following, the adaptations that had to be applied to the Secure User Profile are described.
4 https://github.com/EEXCESS/eexcess/wiki/Request-and-Response-format
3.3.3 Secure User Profile
One of the key changes regarding the Secure User Profile is the introduction of the user content type
preferences. This feature should further support contextualisation and personalisation. Within this preferences
field it is possible to state weightings for certain types of content. It can be utilised either by the users
themselves, via a context menu within the frontend, to state which type of content (e.g. image, text, openly
licenced) they prefer, or it can be learned automatically by the frontend from the items the user clicked.
Depending on the underlying aggregation algorithm, this information can either be used to boost results that
match the given content type or even to filter out results that do not match the preferences.
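Both modes of using the preferences can be sketched as follows (the field names and threshold semantics are our assumptions for illustration):

```python
def apply_content_preferences(results, weights, filter_threshold=None):
    """Boost each result's score by the weight the user assigned to its
    content type; optionally drop results whose weight falls below a
    threshold (filtering mode)."""
    processed = []
    for result in results:
        weight = weights.get(result["type"], 1.0)  # neutral default
        if filter_threshold is not None and weight < filter_threshold:
            continue  # disfavoured content type is filtered out
        processed.append(dict(result, score=result["score"] * weight))
    return sorted(processed, key=lambda r: r["score"], reverse=True)
```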
In the current implementation the defined fields cover pictures, text, videos, items with an unrestricted licence
and expert-level items. The latter corresponds to a field that can be set within the configuration of the partner.
This field is designated for partners that have highly specialised content, like Mendeley or ZBW, which is only
partially suitable for a broad audience.
The results of these changes can be tested on the dedicated EEXCESS demo website5 of KNOW and are
demonstrated in the following figures.
5 http://eexcess-demo.know-center.tugraz.at/v3/#/system-demo
Figure 3: No user preferences are given; the algorithm uses its natural weighting scheme, based on the occurrence of
the query terms in the textual content and on the position in the original ranking.
Figure 4: The user indicated, or the frontend learned from the user's history, that a high share of pictures in the final
result list is preferable.
Figure 5: The user is an expert in the field and should therefore be recommended items from more specialised sources
like ZBW or Mendeley, if available.
4 Published Evaluation Results
Within this section WP3 presents the latest dissemination activities with regard to the Federated Recommender.
The presented work covers the traditional topics of federated retrieval systems, namely i) resource
representation, ii) resource selection and iii) result aggregation, as well as topics that arise in the real-world
application of such systems [Lu, 2005].
Therefore, the following research questions were examined:
- Can traditional methods of resource representation be improved by preferring ambiguous terms for probing?
- Does the WordNet Domains based resource selection approach yield meaningful results?
- To what extent does a partner have to be probed so that the knowledge-based domain mapping representation approach yields stable results?
- To what degree can small niche sources be integrated into a market-share-based resource selection approach without affecting typical evaluation measures?
- Which kind of approach suits the purpose of integrating diverse and serendipitous results into an aggregated result list, according to the opinion of users?
- Can contextual data of users be helpful in the process of re-ranking through the usage of the documents' metadata?
- Can knowledge-based methods be applied for topic separation of query terms?
- What are the major factors with regard to response time for federated retrieval systems?
In the following sections we present our findings on these questions, organised according to our publications.
4.1 International Conference on Theory and Practice of Digital Libraries (TPDL
2016) – Accepted as Poster
Search engines typically keep their data in an index which is continuously updated by a crawler. This crawler
gathers, analyses and finally saves the results in the index.
From the resulting statistical information, the ranking scheme for potential result documents can be
calculated.
In comparison, in a federated search or recommender setting, where the query is forwarded to a
number of attached search engines, the content of the collections is unknown. This also concerns key
statistics like term frequencies, which are usually used for ranking the documents in algorithms like
Okapi BM25. Literature in the field of federated search refers to this problem as the resource
representation problem in an uncooperative environment. The most prominent proposed solution to this
problem is based on the idea of approximating the statistics of the dataset by retrieving sample documents
through querying the source, hence the name “query-based sampling” (QBS) [Callan, 2001]. Here the stopping
criterion is often defined by the number of queries sent or the number of documents sampled. Although QBS
has been analysed and refined in a wide variety of settings, the question whether ambiguous samples help to
establish good coverage of the sources in question has remained unexplored.
Literature suggests that random words from an English dictionary are not able to create a distinctive enough
representation. Our hypothesis is that ambiguous nouns should require fewer requests to create a
representation which can be exploited later on for source selection.
To conduct this evaluation, the FedWeb Greatest Hits dataset [Demeester, 2015] was used. This dataset is
composed of results of 150 specialized and unspecialized search engines. From these search engines between
20,000 and 80,000 documents were retrieved, giving a total number of over 5 million documents. Our
implemented query selection method used WordNet as a dictionary to select four different kinds of potential
query terms.
- Factotum queries: Within WordNet synsets, terms that do not belong to a specific domain but can appear in almost all domains are called factotum; they can therefore be considered highly ambiguous.
- Ambiguous queries: Terms drawn from WordNet that are included in at least three different domains.
- Random queries: For comparison, random terms out of WordNet were chosen. This is considered the baseline for the evaluation.
- Random queries without retainment: Here the queries were not retained between the consecutive probing attempts.
Additionally, we wanted to evaluate to what extent query-based probing has to be performed to create an
adequate representation of the partner with regard to the WordNet domain mapping. The approach itself takes
an input and maps it onto a set of domains with attached weights. The input could be either the
query or a document retrieved by the QBS. The final result of this approach is a list of domains covering the
query itself and lists of domains representing the sources, which can then be matched.
In Table 1 one can see an example of such a mapping of queries and domains:
Input Terms Output Categories
battle of trafalgar { military, history }
women wage gap { economy }
world cup football { sport }
java 8 features { computer science, food }
dinosaur t-rex { animals }
sentimental tears emotions { psychological features }
kittens for sale { animals, commerce }
department of justice { administration }
Table 1: Example mappings from input query terms to WordNet domain categories.
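A mapping like the one in Table 1 can be sketched as a weighted aggregation over per-term domain assignments; the tiny lexicon below is a stand-in for the actual WordNet Domains resource:

```python
from collections import Counter

# Stand-in lexicon; the real assignments come from WordNet Domains.
TERM_DOMAINS = {
    "battle": ["military", "history"],
    "trafalgar": ["history"],
    "football": ["sport"],
}

def map_to_domains(terms):
    """Map input terms to a weighted list of domains, strongest first."""
    counts = Counter(d for t in terms for d in TERM_DOMAINS.get(t, []))
    if not counts:
        return []
    total = sum(counts.values())
    return [(domain, n / total) for domain, n in counts.most_common()]
```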
Figure 6 shows the results of a first validation attempt. Here, three sets of twenty documents each were
retrieved. Each set consisted of results belonging to one of three domains, namely mathematics, religion and
health. The used queries were chosen in a way where even non expert users could classify them correctly with
ease. Two of the three domains, mathematics and religion, can take the first two places, the last one, health, is
switched with domain medicine on place three. Although this is not totally correct it should not interfere with
the main goal of the system as long as this behaviour is consistent.
Figure 6 The plot shows the domains that are detected by our implementation in relation to their corresponding weights.
All of the three original domains are recovered except that health and medicine are switched due to their similarity.
In Figure 7 we present the results of the probing with 100 up to 2000 queries. With each iteration, the
agreement in the list of domains was measured between the current amount of queries and its predecessor.
The results seem to indicate that, depending on the partner, about 300 to 700 queries are necessary to reach a
stable state. Further, we found that it seems to be possible to distinguish between encyclopaedic partners and
niche-source partners. We define niche-source partners as those where the domain is easily identifiable from
the textual content of the documents, whereas encyclopaedic partners cover all kinds of topics (e.g. CERN
(niche), WordPress (encyclopaedic)).
Figure 7: The x-axis represents the number of probed documents, in steps of hundred. The y-axis represents the results
of the “rank biased overlap”-measure, where numbers close to 1 represent a stable distribution of topics. The upper plot
demonstrates an example graph on a niche source called “CERN Documents”, as indicated by a visible convergence. The
lower one shows a graph for an encyclopaedic source in comparison. All approaches yield more stable results on the
niche source. As expected the highest fluctuation is produced by the “random without replacement” approach.
Finally, in Figure 8 we can answer the question whether ambiguity yields more stable results at an earlier stage.
According to our results, there is only a slight benefit from the factotum and ambiguous approaches
in comparison to the random approach; only the random approach without replacement stands out, with lower
results in general and bigger deviations from the mean.
Figure 8: Comparison of the 4 different query generation methods with each other. Each diagram shows the mean
performance on niche and encyclopaedic sources with corresponding standard deviation. All 4 methods show a similar
mean performance, except the last method exhibits extreme deviations from the mean.
4.2 International Conference on Information and Knowledge Management
(CIKM 2016) – Under Submission
Although the approaches presented by Jin et al. [Jin, 2014] show that a market-share-based source selection
approach yields considerably better results than most state-of-the-art approaches, there is one major
drawback to it. In the setting of EEXCESS, where a mixture of smaller and larger databases can be
expected, a market-share-based approach could lead to the effect that the algorithm always selects the same
partners regardless of the initial query. Thus, small partners covering specialised long-tail content, previously
referred to as niche sources, might be left behind. Here our source selection approaches could help to resolve
such issues.
On the other hand, according to the gold standard included in the FedWeb Greatest Hits dataset, introducing
these niche sources seems to lower measures like precision and NDCG. Hence the ideal solution would be a
system that introduces just as many niche sources into the final result lists as possible while causing only a
minor measurable negative effect. Therefore, we combined two ranking functions which give a weight to each
of the sources. The first ranking function is a reimplementation of the current state of the art, while the second
one is biased towards niche sources. The influence of each of the ranking functions is controlled by a parameter alpha,
ranging between 0 and 1. To gain insight into the optimum value for alpha, at which the performance is not
severely affected, we conducted an evaluation based on the already described FedWeb Greatest Hits dataset.
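Assuming a plain linear mixture (the exact combination in the submitted paper may differ), the per-source score would look like:

```python
def combined_source_score(baseline_score, niche_score, alpha):
    """Mixture of the market-share baseline and the niche-biased ranking
    function; alpha in [0, 1] controls the niche influence."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return (1.0 - alpha) * baseline_score + alpha * niche_score
```

With alpha = 0 the baseline ranking is reproduced, while larger values trade retrieval performance for niche-source coverage.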
Aside from the WordNet Domains based source selection approach, we had a look into the “Wikipedia for
Schools” corpus. Wikipedia for Schools is a subset of the original Wikipedia corpus matching the UK National
Curriculum, aimed at educational use by children. This corpus has already been used by us to
demonstrate source selection based on age, as it is intended to be consumed by pupils. While
the original Wikipedia category graph has a vastly heterogeneous structure, including cycles, the graph of
Wikipedia for Schools has only about 120 categories. The assignment of categories to documents or queries
is done by measuring the overlap coefficient of their terms with the terms of the documents assigned to a
certain category. Table 2 shows sample results of domains being assigned to queries by both approaches:
Input Terms WordNet Domains Categories Wikipedia for Schools Categories
battle of trafalgar { military, history } { pre 1900 military, military people }
women wage gap { economy } { animal and human rights }
world cup football { sport } { sports teams, religion }
java 8 features { computer science, food } { computer and video games, cartoons }
dinosaur t-rex { animals } { dinosaurs }
sentimental tears emotions { psychological features } { }
kittens for sale { animals, commerce } { mammals }
department of justice { administration } { law }
Table 2: Example category assignments produced by the WordNet Domains and the Wikipedia for Schools approach.
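The overlap-coefficient based assignment can be sketched as follows (the threshold and the toy data are illustrative):

```python
def overlap_coefficient(a, b):
    """|A ∩ B| / min(|A|, |B|) between two term sets."""
    a, b = set(a), set(b)
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))

def assign_categories(query_terms, category_terms, threshold=0.5):
    """category_terms: category name -> terms of the documents assigned
    to it; returns matching categories, best first."""
    scores = {c: overlap_coefficient(query_terms, terms)
              for c, terms in category_terms.items()}
    return sorted((c for c, s in scores.items() if s >= threshold),
                  key=lambda c: -scores[c])
```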
Table 3 shows the results of our evaluation with regard to the niche sources introduced by the optimisation.
nDCG@10 nDCG@20 nDCG@100 P@1 P@5
Baseline (200P) 0.27 0.32 0.49 0.2 0.26
Baseline+LT (α = 0.10) 0.26 0.31 0.47 0.2 0.26
Baseline+LT (α = 0.25) 0.24 0.29 0.46 0.2 0.21
Baseline+LT (α = 0.50) 0.19 0.24 0.42 0.11 0.15
Baseline+LT (α = 0.75) 0.13 0.17 0.39 0.03 0.12
Table 3: Retrieval performance of the baseline and of its combination with the long-tail (LT) biased ranking for different values of α.
4.3 Conference and Labs of the Evaluation Forum (CLEF 2016) – Under
Submission
In the work submitted to the CLEF 2016 conference we described our approach to conduct a
crowdsourcing-based evaluation of our block ranking approach and, alternatively, of an interleaving-based
approach to integrate diverse and serendipitous results into the final result list. Furthermore, we tried to assess
the impact of our algorithms for diversity and serendipity in result lists.
The actual evaluation was conducted on the crowdsourcing platform CrowdFlower6. Over 300 workers
conducted the evaluation, producing a total of over 1500 judgements. This is follow-up work to the work
submitted to last year's CLEF conference [Ziak, 2015], in which we described the design of our dedicated result
list evaluation framework and conducted a small user study of our query reformulation based diversification
approaches. The findings of that work lay the foundations for the current evaluation.
To gain all the needed information we created a dedicated dataset with the help of query logs of the EEXCESS
system. The finally selected query set contained 52 queries. The evaluation dataset contained the actual user
query, contextual information of the user (e.g. history) and the results created by our blocking algorithms.
Our evaluation setup contained a total of four scenarios:
- In Scenario 1 we wanted to compare our blocking approach against the basic list, where no diverse or serendipitous results were introduced, to gain a basic understanding of the acceptance level of potential users for such a setup. The workers were instructed to get into the mind-set of a potential user and were given additional information about the query and the according context. Each worker had to decide which of the two lists suited this information need best.
- Within Scenario 2 the workers were presented with a shortened result list containing either the diverse or the serendipitous block, compared against the equally shortened basic list. The goal of this evaluation was to rule out that one of the approaches has an adverse effect on the other, since the algorithm to generate serendipitous results had not been evaluated yet.
- Scenario 3 covers the direct comparison of the block ranking and the interleaved approach.
Each task was assessed by six different workers; to reduce the potential risk of a bias towards the list
presented first, the lists were interchanged for 50% of the workers. Figure 9 shows an example of a task a
worker had to perform.
6 http://www.crowdflower.com/
Figure 9: Example of a task a worker had to perform. One of the lists shows diversified results at the bottom, the other
one shows the unmodified list. There is no indicator telling the user which list contains which content.
We report the agreement on item level as the arithmetic mean of the percentage of the largest agreement.
The second reported figure is the percentage of times each algorithm was selected per approach, according
to the preferences of the workers.
Table 4 shows the results for the first evaluation scenario. The workers' agreement is about 0.7 for both the
interleaved and the blocked approach. We further analysed the queries that got the majority of votes for either
the blocked or the interleaved approach.
Item Agreement Decision Percentage
Interleaved 0.692 0.358
Blocked 0.721 0.355
Table 4: Item agreement and decision percentage for the first evaluation scenario.
Here only one query was present in both sets which indicates that for most queries either the blocking or the
interleaving approach is beneficial.
Table 5 represents the second evaluation scenario, with the goal of assuring that the diversity and serendipity
approaches do not have adverse effects upon each other. Here all measures produce similar figures for both
approaches. Therefore, we assume that both approaches work on a similar level.
Item Agreement Decision Percentage
Diverse 0.769 0.307
Serendipitous 0.746 0.31
Table 5: Item agreement and decision percentage for the second evaluation scenario.
The result of the direct comparison of the interleaved and blocking approaches is shown in Table 6. Here both
approaches obtain similar results with a slight tendency towards the block ranking approach.
Item Agreement Decision Percentage
Blocked vs Interleaved 0.647 0.532
Table 6: Item agreement and decision percentage for the direct comparison of the blocked and the interleaved approach.
4.4 Social Book Search Lab - Accepted
The Social Book Search Lab at the CLEF 2016 conference consisted of three tracks: the suggestion, mining and
interactive track. KNOW took part in the suggestion and mining tracks. These challenges were of
interest since both are in the domain of recommender engines. Therefore, the achieved results
could potentially be utilised within the EEXCESS Federated Recommender. Furthermore, similarities can be
found in the usage of data considered to be in the cultural heritage domain, to be more specific, literature
enriched with matching metadata.
The task of the suggestion track was to recommend books matching the context of users within a forum about
books. The supplied dataset used in both tracks was a book catalogue of 2.7 million crawled records from
Amazon.com7 enriched with metadata from LibraryThing8. This dataset with all supplied metadata (e.g.
authors, tags, browse nodes, binding) was indexed using the Lucene framework9. The task specification
supplied a feature-rich dataset of the postings from the LibraryThing platform, matching
metadata of the user's reading catalogue, and in some cases examples of mentioned book titles.
In this lab, an approach was presented which relied on performing a similarity search on the metadata, the
tags, and the books initially given from the user's catalogue and examples. These initially given books carry
so-called browse nodes, which contain categorisation information from the Amazon dataset. The
initially created lists of browse nodes and tags from the catalogue, combined with the examples, were used to
create two queries. These two queries were later employed to generate lists of similar books. For re-ranking
and merging the resulting lists, a Latent Semantic Indexing (LSI) algorithm was implemented. For the LSI
algorithm, query vectors were again created out of the user's postings. An example of the results of the
LSI approach can be seen in Table 7. The insights gained in this challenge were applied in a simplified manner in
the new result aggregation algorithm of the EEXCESS Federated Recommender framework.
Book Titles
Catalogue Entries Data Mining: Practical Machine Learning Tools and Techniques
Statistics, Data Analysis, and Decision Modelling
Software Architecture in Practice
Results Introduction to Algorithms
Software Engineering: A Practitioner’s Approach
Artificial Intelligence: A Modern Approach
Artificial Intelligence (Handbook Of Perception And Cognition)
Machine Learning (Mcgraw-Hill International Edit)
Prolog Programming for Artificial Intelligence
An Introduction to Support Vector Machines and Other Kernel-based Learning
Methods
Table 7: The table shows the inputs and the generated recommendations. The book titles in the "Catalogue Entries" row
are used as inputs; the row "Results" holds the generated recommendations.
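The exact LSI formulation used is not spelled out here; the following numpy sketch only illustrates the general idea of ranking documents against a folded-in query in a k-dimensional latent space (the fold-in convention and the choice of k are illustrative):

```python
import numpy as np

def lsi_rank(term_doc, query_vec, k):
    """Rank documents against a query in a k-dimensional latent space.
    term_doc: terms-by-documents matrix; query_vec: raw term vector."""
    u, s, vt = np.linalg.svd(term_doc, full_matrices=False)
    docs_k = vt[:k, :].T                       # document coordinates
    query_k = (query_vec @ u[:, :k]) / s[:k]   # fold the query in
    sims = docs_k @ query_k / (
        np.linalg.norm(docs_k, axis=1) * np.linalg.norm(query_k) + 1e-12)
    return np.argsort(-sims)                   # best match first
```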
The mining track was divided into two tasks, the linking and the classification task. Within the linking task, the
goal was to identify entities within sentences; this means that book titles mentioned in the users' postings
should be recognised. The objective of the classification task was the identification of a user's request for a
book recommendation, which can be interpreted as an implicit request for suggestions.
For the linking task, a dataset of LibraryThing postings with the associated ground truth data was provided by
the organisers. A gazetteer-like system based on the book titles out of the previously mentioned Amazon index
was implemented to fulfil the task. Furthermore, experiments were conducted to refine the resulting book title
candidate lists by removing false positives, hence improving precision (e.g. sentence classification or
7 https://www.amazon.com/
8 https://www.librarything.com/
9 https://lucene.apache.org/core/
identification of sentences containing books, author co-occurrence). Although these final post-processing steps
could not be applied to the finally submitted runs, our results are still among the best of this track.
For the classification task about 2000 threads from LibraryThing and also 250 threads from Reddit10 were given.
In particular, content from the subreddits “suggestmeabook” and “books” was included. For this task, our
approach relied on typical features from the area of Natural Language Processing (NLP). These features are
n-grams, the number of terms within the text, and also less common features like the average number of
spelling errors in the text or the associated tags and browse nodes from the Amazon dataset. Three different
classification algorithms were trained with this feature set: a Random Forest classifier, a Naive Bayes classifier,
and a Decision Tree.
The presented approach achieved third place for the LibraryThing testing set with an accuracy of 91 percent.
We were beaten by two baseline runs from the organisers, achieving first and second place. For the
Reddit-based dataset we came second, achieving the same accuracy of 82 percent as the top team in first
place. These two tracks resulted in two publications accepted at the SBS Lab of CLEF 201611.
4.5 International Workshop on Text-based Information Retrieval (TIR 2016) –
Under Submission
One of the main challenges in the field of context-driven query extraction is the identification of the relevant
context. Here the goal is to identify the actual information need of the user, i.e. the topics the user is
interested in. Since this task of identifying the user's main topic of interest is not easy to solve, the
resulting query might contain several unrelated topics. Literature even suggests that it can be beneficial to
cover several of these topics at once in a recommender-related context [Rhodes, 2000]. While such a
procedure will certainly work in many cases, there is a downside to it in a federated setting. Different sources
have the tendency to respond to certain queries inconsistently.
While one source might yield good results with multi-topic queries, another one might not return results at all.
For that reason, it might be beneficial to provide a possibility to topically partition queries. Most approaches for
query splitting appear to rely on two sources of information: either the usage of query logs or initial
probing of the query against the source. Both approaches are not easy to apply in a federated setting. Query
logs of federated recommendation systems are difficult to obtain and additionally might be biased by the
algorithm creating the query. The probing approach bears the problem of introducing additional latency, which
is already a challenge to cope with in a distributed setting.
Instead of making use of directly accessible information, one can resort to external knowledge sources. One such
resource is Word2Vec, which has gained a lot of attention recently. The work submitted to the TIR 2016 workshop
contains the evaluation of two approaches to topically separate queries. The first clusters the query terms of
unrelated, concatenated queries with the well-known K-Means algorithm operating on the Google News Word2Vec
model. It was compared to a very simple baseline where the query is just split into N groups of equal length.
Within the evaluation, N ranges from two to four.
As query dataset, the well-studied Webis-QseC-10 dataset consisting of 5,000 user queries was used. Although
this dataset does not ideally match the setting of auto-generated queries, it should still be sufficient to show
the general validity of the approach. Rand Index and V-Measure were used as measures.
10 https://www.reddit.com/
11 http://social-book-search.humanities.uva.nl/#/mining16
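To make the baseline and the evaluation measure concrete, the following pure-Python sketch implements the equal-length split baseline and the Rand Index; the Word2Vec/K-Means variant additionally requires a trained embedding model and a K-Means implementation, which are omitted here. All names are ours, for illustration only:

```python
from itertools import combinations

def split_baseline(terms, n):
    """Split a list of query terms into n contiguous groups of (near) equal length."""
    size, rem = divmod(len(terms), n)
    groups, start = [], 0
    for i in range(n):
        end = start + size + (1 if i < rem else 0)
        groups.append(terms[start:end])
        start = end
    return groups

def rand_index(labels_true, labels_pred):
    """Fraction of term pairs on which two clusterings agree (same/different cluster)."""
    agree = total = 0
    for i, j in combinations(range(len(labels_true)), 2):
        same_true = labels_true[i] == labels_true[j]
        same_pred = labels_pred[i] == labels_pred[j]
        agree += same_true == same_pred
        total += 1
    return agree / total
```

A perfect split of a two-topic query yields a Rand Index of 1.0, while disagreement on every pair pushes the score towards 0, mirroring the scores reported in Tables 8 and 9.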
In the first evaluation setup, N unrelated queries of the dataset are joined. Here we assume that the query
terms sent to the system are in topical sequence. Such a setting could, for example, result from a context-
driven query extraction approach based on paragraphs where two paragraphs are falsely unified. Here, the
vectors created by Word2Vec were extended with the positional information of the term.
The results of this evaluation are presented in Table 8. In this setting the most important information seems to
be the position of the query terms. This might be due to the fact that the largest share of user queries tends
to be in the range of three to five terms.
                               Two Queries   Three Queries   Four Queries
Word2Vec K-Means  Rand Index      0.71           0.64            0.59
Word2Vec K-Means  V-Measure       0.77           0.77            0.76
Split Approach    Rand Index      0.71           0.66            0.63
Split Approach    V-Measure       0.77           0.78            0.78
Table 8: Rand Index and V-Measure for the first evaluation setup (queries joined in topical order).
Within the second evaluation setup we assume a situation where the query terms are not in topically related
order. This might happen in a system that extracts keywords out of several paragraphs and returns a list of
queries weighted by the importance of the topic. Therefore, the already joined queries were randomised.
The results of this setup are shown in Table 9. As one would expect, the simple split approach yields results
that amount to totally random behaviour, whereas the Word2Vec-based approach works measurably better.
                               Two Queries   Three Queries   Four Queries
Word2Vec K-Means  Rand Index     0.088          0.071           0.056
Word2Vec K-Means  V-Measure      0.373          0.341           0.373
Split Approach    Rand Index     0.008          0.003           0
Split Approach    V-Measure      0.278          0.267           0.236
Table 9: Rand Index and V-Measure for the second evaluation setup (randomised query term order).
4.6 European Conference on Knowledge Management (ECKM 2016) -
Accepted
A publication analysing the overall responsiveness of the EEXCESS system was submitted to and accepted by the
Web 2.0 Models, Methods and Tools in Knowledge Management minitrack at the 17th European Conference on
Knowledge Management (ECKM 2016). The paper is due to be published in the third quarter of 2016. It is titled
“Context-Driven Federated Recommendations for Knowledge Workers” and highlights the capabilities of EEXCESS in
supporting knowledge workers by automatically providing suggestions for useful material. For environments where
it is not possible to integrate EEXCESS into the applications the knowledge workers interact with, the usage of
the generic Web interface is suggested. In such a scenario, EEXCESS cannot automatically suggest material, but
serves as a unified access point for all connected partner systems.
Even though some of the EEXCESS functionality cannot be used when working with the Web interface alone, it is
still beneficial for knowledge workers compared to querying all partner systems individually. The paper goes on
to describe the general architecture of the EEXCESS system and the Federated Recommender core in particular. In
three test scenarios the parallel processing capabilities of the EEXCESS system are evaluated. While one of the
test scenarios reconstructs a system as it is likely to be deployed in a real-world setting, the other two are
corner cases achieving the best and worst system performance. The evaluation shows how many requests can be
processed in parallel on a specified processing hardware. To judge the service quality, the recommender's
response times and the number of failed queries are used. A query is counted as failed if the partner system
does not respond within ten seconds.
The evaluation shows that on a machine with four CPU cores and four gigabytes of main memory the EEXCESS
Federated Recommender can process up to 100 requests in parallel with ten different partner sources connected
via the Internet. The overall response time in the test was dominated by the response times of the partner
systems. This highlights the efficiency of the request transformation, result transformation and aggregation,
and re-ranking implemented in the Federated Recommender.
5 PartnerWizard
The PartnerWizard is a software tool to create new Partner Recommenders without requiring any programming.
New partners without programming resources are thus enabled to join the EEXCESS ecosystem. The idea of the
PartnerWizard was first formalised in the half-year management report February-July 2015 for work package 3.
The design and development of the PartnerWizard were already presented in the Deliverables D9.5 and D3.3. KNOW
and JR-DIG have both been working on different parts of the PartnerWizard. Figure 10 depicts the individual
parts of the PartnerWizard, as well as the tasks and the required order in which they need to be completed when
a new Partner Recommender is generated. All interaction with the PartnerWizard is done via a Web-based graphical
user interface (Web GUI). KNOW worked on the query configuration and query generator testing (depicted in
green). JR-DIG worked on the parts providing the initial configuration, the result list mapping, and the final
deployment of the generated Partner Recommender. In this deliverable only the parts developed by KNOW are
described in detail; the parts developed by JR-DIG are reported in deliverable D4.3. Many aspects have already
been reported in deliverable D3.3 and are updated and consolidated for this final deliverable.
Figure 10: The PartnerWizard guides users through the necessary configuration steps. Parts depicted in blue have been
developed by JR-DIG; parts depicted in green by KNOW. In this deliverable only the parts developed by KNOW are
reported.
5.1 Query Configuration
Example queries are needed for the test of the query generation. Therefore, a list of queries covering large
areas of different knowledge domains is pre-loaded by the PartnerWizard for the query generator testing. It is
possible to modify this list of queries just before the query generator testing is started. Keywords can be
added to or deleted from a query, new queries can be added, existing queries can be deleted, and the main topic
of a query can be set. All of this might be desired due to special material covered by the partner, which might
otherwise not be correctly reflected in the test. Each query can consist of one or more keywords. A keyword is
not restricted to a single term, but can consist of one or multiple terms. Similarly, each query can have a
main topic, meaning that one keyword describes the central concept of the complete multi-keyword query.
These query options are available for the entire EEXCESS system, not only for the PartnerWizard. Figure 11
shows the Web GUI to configure the queries.
Figure 11: Web GUI to configure the queries used to test the query generation.
5.2 Query Generator Testing
After the queries are selected, the main test run can start. In this test run different query generator
configurations compete against each other. A query generator configuration consists of a query generator
implementation together with the settings for query splitting and query expansion. For each query generator,
either query splitting, or query expansion, or neither of them can be enabled, so each query generator
implementation has to be tested with three different settings. Each of these configurations is then tested with
each of the queries configured earlier. In total,
numberOfTests = 3 ∙ numberOfQueryGeneratorImplementations ∙ numberOfQueries
test runs need to be executed.
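The enumeration behind this formula can be sketched in a few lines (function and setting names are ours, for illustration only):

```python
def enumerate_test_runs(generator_impls, queries):
    """Every query generator implementation is tested with three settings
    (no modification, query splitting, query expansion) against every query."""
    settings = ("none", "splitting", "expansion")
    return [(gen, setting, query)
            for gen in generator_impls
            for setting in settings
            for query in queries]
```

For two generator implementations and three queries this yields 3 · 2 · 3 = 18 test runs.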
The testing procedure is depicted in Figure 12. At the beginning of the tests, all query generator
implementations are checked as to whether they produce a non-empty result set for at least one of the queries.
Only if this is the case are they used further. In the next step all possible configurations compete against
each other. If two configurations produce result lists which are not equal, the user has to decide which one
fits the query best.
The Web GUI showing the two result lists to the user is depicted in Figure 13. When all configurations have
been tested, the configuration which was chosen most often is stored as the winning configuration. If there is a
draw, the simplest configuration is used. To judge the simplicity of a configuration, the list of query
generator classes internally stored in the PartnerWizard is ordered by ascending complexity of the
implementation. Furthermore, no query modification is considered simpler than query splitting, and query
splitting is considered less complex than query expansion. With this set of rules, a winning configuration can
always be found.
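This selection rule can be sketched as follows (names and data structures are ours, hypothetical):

```python
def pick_winning_configuration(votes, simplicity_rank):
    """votes: {configuration: number of times the user chose it}.
    simplicity_rank: {configuration: int}, where a lower value means a simpler
    configuration (generator class complexity first, then none < splitting < expansion).
    The most-voted configuration wins; draws go to the simplest configuration."""
    top = max(votes.values())
    tied = [config for config, count in votes.items() if count == top]
    return min(tied, key=lambda config: simplicity_rank[config])
```

Because the simplicity ranking is a total order, the minimum over the tied configurations is unique, so a winner always exists.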
Figure 12: Activity Diagram showing all necessary steps to select the query generator configuration. This diagram is a
modified version of Figure 3 from deliverable D3.3.
Figure 13: Web GUI showing two non-identical result lists to the user. The user has to decide which result list matches
the query best. The query is shown above the two result lists, where the main topic of the query is set in a bold typeface.
All interactions between the user, the PartnerWizard and the partner systems are depicted in Figure 14. The
user triggers the process via the Web GUI by completing the query configuration. Then the PartnerWizard
automatically tests all query generator implementations known to it. All query generator implementations
returning a non-empty result list for at least one query are potential candidates for the desired configuration.
For each of these candidates all possible configurations are iterated. Next, two configurations compete against
each other. If they provide the same result list, the PartnerWizard automatically counts this as a draw. If the
result lists are not equal, the user has to vote which result list fits the query best. After all pairs have
been voted on, the winning configuration is determined and stored by the PartnerWizard. The final Partner
Recommender can then be deployed with the winning configuration.
Figure 14: Sequence Diagram depicting the interactions done during the query generator testing between the user, the
PartnerWizard and the partner system. This diagram is a modified version of Figure 4 from deliverable D3.3.
5.3 Deployment
After the query generator testing, all parameters to configure and generate a new Partner Recommender are
complete. To create the new Partner Recommender, the initial configuration, the result list configuration, and
the query generator testing results are combined. With all this information a Java Web Archive holding the
configured Partner Recommender is created. The Partner Recommender is a Java Servlet that can be run on any
Apache Tomcat 8.
6 System testing and performance evaluation
A series of tests was carried out to objectively judge the performance of the Federated Recommender
framework. For all tests two different system environments were used. Test environment #1 consists of a
Linux virtual machine with four CPU cores and four gigabytes of main memory; test environment #2 consists of
a Linux virtual machine with eight CPU cores and eight gigabytes of memory. The tests covered up to ten of the
available Partner Recommenders and the Federated Recommender with its query transformation, result
aggregation and re-ranking.
In four scenarios the overall system performance was evaluated. All scenarios were tested with the same set
of 1,649 queries. One third of the queries originated from the EEXCESS query logs, while the other two thirds
were selected from the AOL query dataset [Pass, 2006]. The set of queries was split into subsets containing 10,
30, 50, 100, 150, and 500 queries. Each query in a subset was sent to the system by a dedicated thread to
simulate parallel requests sent to the Federated Recommender. A ten-second intermission was made between two
subsets of queries to let the partner systems recover to a state of normal operation. This implies that the
bigger the subsets of queries are, the faster a test run completes, since fewer subsets and therefore also
fewer intermissions are needed.
Three measurements are used to judge the performance, namely the number of queries sent in parallel, the
average response time, and the number of failed queries. A query is counted as failed if the Partner
Recommender does not provide a result list within five seconds. All scenarios are considered in vivo tests and
the partners are connected via the Internet. Hence, variations between the individual runs due to network
latencies and load from requests by other users are expected. The Internet uplink of all test systems
offered a bandwidth of 200 Mbps.
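The dispatch-and-measure loop described above can be sketched as follows; the function names are ours, and the real harness sends each query to the running Federated Recommender rather than to a stand-in callable:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def run_batch(send_query, queries, timeout_s=5.0):
    """Send every query of a batch in its own thread; a query counts as
    failed when its response takes longer than timeout_s seconds.
    Returns the average response time and the number of failed queries."""
    def timed(query):
        start = time.monotonic()
        send_query(query)
        return time.monotonic() - start

    with ThreadPoolExecutor(max_workers=len(queries)) as pool:
        elapsed = list(pool.map(timed, queries))
    failed = sum(t > timeout_s for t in elapsed)
    return sum(elapsed) / len(elapsed), failed
```

Between batches, the harness additionally sleeps for the ten-second intermission so the partner systems can recover.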
6.1 Scenario 1
In this scenario the Federated Recommender and the Partner Recommenders run on the same virtual machine. The
virtual machine is equipped with four CPU cores and four gigabytes of main memory. To test the setup under
different system loads, two runs were conducted. In the first run three partners (Europeana, KIMPortal, and
Mendeley) were queried, and in the second run ten (Deutsche Digitale Bibliothek, Deutsche Nationalbibliothek,
Deutsche Zentralbibliothek für Wirtschaftswissenschaften, Digital Public Library of America, Europeana,
KIMPortal, Mendeley, RijksMuseum, Swissbib, Wissen Media). All partners were connected via the Internet.
In Table 10 and Figure 15 the results for the test runs in scenario 1 are shown. As expected, the total response
time decreases with increasing batch size. This is due to the fact that with increasing batch size fewer batches
are needed to send all of the 1,649 queries. Hence, fewer interval pauses between the batches are needed and
the total time consumed decreases. This shows that the total response times are heavily dominated by the
intervals between the batches. When comparing the runs with three and ten sources, a performance deterioration
for ten sources can only be identified in the case of 500 parallel requests. This counts towards the argument
that 100 parallel requests can be processed for ten partners with the Federated Recommender and the Partner
Recommenders running on the same machine.
The situation is different when looking at the average number of timeouts, as they show a clear increase
between the runs with three and ten sources even for the smallest batch. This supports the claim that the
Partner Recommenders are not capable of handling all requests within the required window of five seconds. To
investigate this aspect further, scenario 2 was designed.
                    Total response time incl.         Average number of
                    intermissions (seconds)           timeouts
Parallel Requests   Three Sources   Ten Sources       Three Sources   Ten Sources
10                  1,641           1,643             6               24
30                  543             547               3               338
50                  324             328               0               827
100                 164             168               268             1,225
150                 111             112               700             1,463
500                 107             131               1,406           1,521
Table 10: Accumulated response times and average number of timeouts for three and ten knowledge sources connected
over the Internet.
Figure 15: Accumulated response times and average number of timeouts for three and ten knowledge sources connected
over the Internet.
6.2 Scenario 2
In this test setting the run with the ten partners (Deutsche Digitale Bibliothek, Deutsche Nationalbibliothek,
Deutsche Zentralbibliothek für Wirtschaftswissenschaften, Digital Public Library of America, Europeana,
KIMPortal, Mendeley, RijksMuseum, Swissbib, Wissen Media) from scenario 1, connected via the Internet, was
repeated. To judge the influence of the available computing resources on the average number of timeouts, in
this scenario a virtual machine with eight CPU cores and eight gigabytes of memory was used. Table 11 and
Figure 16 show the results from this run and, for comparison, also the results of scenario 1 with ten sources.
When looking at the total response time, an improvement can only be seen in the case of 500 parallel requests.
For the average number of timeouts, the improvement is biggest for small batches of parallel requests and
declines with increasing batch size.
                    Total response time incl.          Average number of
                    intermissions (seconds)            timeouts
Parallel Requests   4 CPUs,        8 CPUs,            4 CPUs,        8 CPUs,
                    4 GB RAM       8 GB RAM           4 GB RAM       8 GB RAM
10                  1,643          1,643              24             1
30                  547            546                338            32
50                  328            327                827            179
100                 168            167                1,225          790
150                 112            111                1,463          1,098
500                 131            86                 1,521          1,473
Table 11: Accumulated response times and average number of timeouts for ten knowledge sources connected over the
Internet with the EEXCESS system run on two different hardware configurations.
Figure 16: Accumulated response times and average number of timeouts for ten knowledge sources connected over
the Internet with the EEXCESS system run on two different hardware configurations.
6.3 Scenario 3
The setup used in this scenario consists of two virtual machines. Both have four CPU cores and four gigabytes of
main memory. One of the virtual machines hosts the Federated Recommender and the other one hosts three
Partner Recommenders. Europeana, KIMPortal and Mendeley are used as partners and are connected via the
Internet. Table 12 and Figure 17 show the results of the test runs. The results for the single machine run are
taken from scenario 1 with the same three partners. For the setup with two separate machines running the
Federated Recommender and Partner Recommenders, the maximum accumulated CPU load never exceeded
20%. Concerning the total response time, an improvement is noticeable for 500 queries issued in parallel. This
can be interpreted as the Partner Recommenders generating a considerable load, too big for one machine with
four CPU cores to handle. The average number of timeouts does not vary considerably between both setups. This
aspect is investigated further in scenario 4.
                    Total response time incl.             Average number of
                    intermissions (seconds)               timeouts
Parallel Requests   Single      Separate                  Single      Separate
                    machine     machines                  machine     machines
10                  1,641       1,642                     6           0
30                  543         543                       3           0
50                  324         324                       0           0
100                 164         164                       268         218
150                 111         109                       700         747
500                 107         79                        1,406       1,359
Table 12: Accumulated response times and average number of timeouts for the Partner Recommenders and the
Federated Recommender running together on a single machine and on two separate machines.
Figure 17: Accumulated response times and average number of timeouts for the Partner Recommenders and the
Federated Recommender running together on a single machine and on two separate machines.
6.4 Scenario 4
In this scenario only one locally hosted Partner Recommender is used. This scenario was designed as a
benchmark for the preceding runs. Since the Partner Recommender is hosted locally, neither network delays nor
load from requests by other parties on the Internet can occur.
The FedWeb Greatest Hits [Demeester, 2015] collection web dump indexed by Apache Solr was used as
the locally hosted Partner Recommender. The FedWeb dataset consists of search results from 150 different
search engines; from each search engine, 20,000 to 80,000 documents are contained in the dataset. In total,
the dataset contains over five million documents, which were indexed by a local Apache Solr instance. To query
the Apache Solr index, a purpose-built Partner Recommender was implemented, directly accessing the index
on the hard disk.
In Table 13 and Figure 18 the results are shown, again compared with the results from scenario 1 with three
partners connected via the Internet. All measurements shown in Table 13 and Figure 18 were taken on a single
virtual machine with four CPU cores and four gigabytes of main memory. Looking at the total response times, a
difference is only notable for 500 parallel requests. For the average number of timeouts, the situation is more
drastic, as the locally hosted partner does not produce request timeouts for any number of parallel requests. As
the Partner Recommender is running on the same machine in both cases, this gives rise to the conclusion that the
partners connected over the Internet are not able to respond in time for large numbers of more than 50
concurrent queries. This might either be a limitation of their systems or a simple quota (either IP- or API-
based).
                    Total response time incl.             Average number of
                    intermissions (seconds)               timeouts
Parallel Requests   Internet    Local                     Internet    Local
10                  1,641       1,640                     6           0
30                  543         540                       3           0
50                  324         321                       0           0
100                 164         161                       268         0
150                 111         104                       700         0
500                 107         35                        1,406       0
Table 13: Accumulated response times and average number of timeouts for sources connected via the Internet and a
locally hosted Apache Solr search engine.
Figure 18: Accumulated response times and average number of timeouts for sources connected via the Internet and a
locally hosted Apache Solr search engine.
7 Conclusions on the Federated Recommender
The work on performance optimisation and feature engineering within the Second Federated Recommender
Prototype has been continued in the last iteration of the project. Furthermore, efforts have been made to
scientifically evaluate and refine the already proposed or introduced methods. Finally, experimental studies on
the system performance and on potential drawbacks of such federated systems have been undertaken.
The joint effort of WP4 and WP3 regarding exploitation and dissemination, the PartnerWizard, has already
proven its benefits. Several new partners were introduced into the system without any further development
effort. Based on the experience of the last years this was an important step, since the potential lack of
resources of prospective partners had become the greatest obstacle for further uptake. In addition, the
PartnerWizard was presented to the general public at the International Science 2.0 Conference and the EEXCESS
Final Conference, where subsequently new potential data providers indicated their interest in joining the
EEXCESS system.
The exploitation of the already introduced partner sources was driven further by refining the query formulation
process based on the feedback of internal partners as well as external users of the framework. Apart from the
finalisation and refinement of the Federated Recommender Prototype, the Partner Recommender framework
and the PartnerWizard, KNOW focused on the scientific exploitation of newly introduced and core algorithms
of the system, resulting in a series of publications. Topically, these publications cover most of the main
challenges of federated systems, namely source selection, source representation, result aggregation and query
formulation, but also challenges arising from the real-world application of such systems.
Some of the results presented here could already be integrated as new features in the Federated
Recommender framework. The most prominent example is the new result aggregation algorithm. It promotes
personalisation and contextualisation of the result lists in several ways. With the accompanying changes to the
Secure User Profile it is now feasible that (i) users can state their preferences themselves and thereby
influence the ranking of the recommended documents, and (ii) the frontend adapts to the users' behaviour by
automatically learning their preferences. Although this approach has already received positive feedback within
the consortium, we want to extend it with learning-to-rank algorithms.
Although one of the biggest challenges in the real-world application of such a federated system, the
adoption and connection of new partner systems, could be resolved, one challenge is still present: achieving
low latencies within such a system. From our performance-centred evaluation we found that the main problem here
is the response time of the partners. Although we could improve the response time of the whole system
substantially by splitting the requests into a recommendation request and a details request, the response time
of the partners is still the biggest influence factor on the rest of the system. The only potential solution to
this problem is the hosting of the Partner Recommenders directly at the partner's site, which is supported by
the system architecture.
8 Narrative Path
Deliverable 3.3 described a prototype for narrative paths exploiting citations in scientific literature in order
to create a set of paths linking the resources. The algorithm was evaluated against a gold dataset, and the item
ordering was defined based on reader count. The approach was evaluated by surveying a sample of Mendeley users
and showed positive results.
In Deliverable 3.4 the research has focused on whether it is possible to identify consistent patterns of
citation between papers in order to identify paths as information-seeking journeys [Fernando et al. 2013].
In order to test this hypothesis, two distinct experiments were carried out. The objective in both cases was
the induction of a directed graph describing a path between articles that can help researchers navigate a
collection of literature using the idea of paths as sequences of items.
In the first experiment a corpus of literature was mined to extract in-line citations and their relative
positions. If the order in which they appear is consistent enough across the corpus, it is possible to induce a
directed acyclic graph (DAG) where nodes represent articles and edges the relationship “read after.”
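Assuming per-document citation sequences are available, the edge induction can be sketched as follows; the names and the margin parameter are ours, and turning the result into a true DAG would still require cycle handling:

```python
from itertools import combinations
from collections import Counter

def read_after_edges(citation_orders, min_margin=1):
    """citation_orders: one sequence of cited article ids per source document,
    ordered by first occurrence. An edge (a, b) means 'b is read after a';
    it is kept only when the documents agree by at least min_margin net votes."""
    votes = Counter()
    for sequence in citation_orders:
        for a, b in combinations(sequence, 2):  # a is cited before b
            votes[(a, b)] += 1
    return {(a, b) for (a, b), n in votes.items()
            if n - votes[(b, a)] >= min_margin}
```

With two documents citing A first but disagreeing on the order of B and C, only the edges from A survive, illustrating how inconsistent orderings cancel out.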
The second experiment instead exploited Mendeley logs, extracting the users' reading sequences. This crowd-
sourced solution aims to identify frequent reading patterns that induce a Markov chain in which states are
articles and transitions reflect the probability of reading a given paper after the current one.
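Estimating such transition probabilities from reading logs can be sketched in a few lines (names are ours, for illustration):

```python
from collections import Counter, defaultdict

def transition_probabilities(reading_logs):
    """Estimate a Markov chain from users' reading sequences: states are
    articles, transitions the empirical probability of reading article b
    immediately after article a."""
    counts = defaultdict(Counter)
    for sequence in reading_logs:
        for current, following in zip(sequence, sequence[1:]):
            counts[current][following] += 1
    return {a: {b: n / sum(c.values()) for b, n in c.items()}
            for a, c in counts.items()}
```

For example, if two of three users read B directly after A, the estimated transition probability P(B | A) is 2/3.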
8.1 Experiment 1: Mining narrative paths from survey papers
In this experiment, we asked whether there are any consistent patterns in the citation order within a body of
research papers, i.e. is a citation to paper A consistently located after a citation to paper B?
In order to verify our assumption, a ground truth dataset is needed. Such a ground truth should contain an
absolute ordering for a set of papers in the literature, defining the order in which they appear.
As a first step we wanted to evaluate whether the assumptions on the ordering were valid, by assessing the level
of agreement across different raters, where each rater is represented by a document from which the relationship
pairs are extracted. High agreement would confirm the assumption, while low agreement would mean that no
evidence for it is present.
8.2 Ground truth dataset construction and testing
We assume that “state of the art papers” (i.e. papers containing a survey of the literature for a specific
research area) are the best candidates to induce an ordering over the cited documents from fundamental
paper to more advanced topics in the research area.
The dataset has been built by issuing queries with specific query terms to the Mendeley catalogue. The process
resulted in a corpus composed of 159,254 articles containing in their title the following terms:
Term in the title Documents retrieved
survey 61,221
state of the art 5,559
introduction to 11,831
overview of 14,081
review of 66,625
Table 14: Number of documents retrieved per title term.
The resulting dataset is composed of tuples of the form (<uuid>,<filehash>).
For each document in the dataset, identified by <filehash>, we retrieved the associated PDF document,
extracted the citation contexts and resolved them against the Mendeley API. The citations have been ordered
according to their first occurrence in the source document.
The enriched dataset is then composed of tuples of the form (<filehash>,<uuid>,<order>).
Due to limitations on the number of calls we can issue against the Mendeley API, we sampled over 1M
instances of citation occurrences. We considered this amount sufficient for initial experiments.
The dataset has been pre-processed and we extracted all the existing follower relations, i.e. for any given
research paper, we compiled all pairs (uuid1, uuid2) where the first occurrence of uuid1 precedes the first
occurrence of uuid2. Each tuple in the pre-processed dataset has the following schema:
(<filehash>,<uuid1>,<ord1>,<uuid2>,<ord2>).
In order to test the agreement on the order of pairs observed in the data, we coded the data as a matrix
where each row corresponds to one rater (<filehash>, i.e. the source of information) and each column to a
pair combination <uuid1>,<uuid2>. A cell of the matrix contains:
● 1, if in a given <filehash> the first occurrence of uuid1 precedes the first occurrence of uuid2
● 0, if in a given <filehash> the first occurrence of uuid2 precedes the first occurrence of uuid1
● NA, if in a given <filehash> not both uuid1 and uuid2 were observed.
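One cell of this rater matrix can be computed as follows; the function name is ours, and coding pairs where only one of the two uuids was observed as NA reflects our reading of the scheme above:

```python
def code_pair(first_occurrence, uuid1, uuid2):
    """first_occurrence maps each cited uuid to the rank of its first in-line
    citation within one source document (one rater). Returns 1, 0, or None (NA)
    following the coding scheme; pairs not fully observed are coded NA."""
    if uuid1 in first_occurrence and uuid2 in first_occurrence:
        return 1 if first_occurrence[uuid1] < first_occurrence[uuid2] else 0
    return None
```

Applying this function to every (<filehash>, pair) combination fills the matrix handed to the agreement metrics.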
The agreement has been computed by means of Krippendorff's alpha [Krippendorff, 2013] and, as an alternative
agreement metric, Fleiss' kappa [Fleiss, 1973].
We computed the metrics on a sample of the data because the implementations of Krippendorff's alpha and
Fleiss' kappa were not able to scale to the size of the data.
The values of alpha and kappa we were able to obtain on the data sample are as follows:
Krippendorff's alpha
Subjects 190045
Raters 78
alpha -0.293
Table 15
Fleiss' Kappa for m Raters
Subjects 95695
Raters 78
Kappa -0.0125
z -218
p-value 0
Table 16
Both results show no agreement.
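For reference, Fleiss' kappa [Fleiss, 1973] can be computed from a subjects-by-categories count matrix as sketched below. This is a generic illustration of the metric, not the implementation used in the experiment:

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for a matrix counts[i][j] = number of raters who
    assigned subject i to category j (equal number of raters per subject)."""
    n_subjects = len(counts)
    n_raters = sum(counts[0])
    # mean per-subject agreement P-bar
    p_bar = sum((sum(c * c for c in row) - n_raters) /
                (n_raters * (n_raters - 1)) for row in counts) / n_subjects
    # chance agreement P_e from the category marginals
    totals = [sum(row[j] for row in counts) for j in range(len(counts[0]))]
    p_e = sum((t / (n_subjects * n_raters)) ** 2 for t in totals)
    return (p_bar - p_e) / (1 - p_e)
```

A value of 1 indicates perfect agreement, 0 agreement at chance level, and negative values agreement below chance, as observed in Tables 15 and 16.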
8.2.1 Noise filtering
We have tried to filter out noise from the agreement matrix in the following way:
● we aggregated all rows (raters) using a sum function, with each vote coded as +1 or -1 depending on the preferred order
● we dropped all (uuid1, uuid2) pairs where abs(x) ≤ 5. This means that we only kept pairs
where the difference between the number of raters who preferred (uuid1, uuid2) over (uuid2, uuid1)
was greater than 5.
This resulted in a dataset of 28,611 pairs, down from 27,820,109.
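This filtering step can be sketched as follows, assuming the agreement matrix is represented as a mapping from rater (<filehash>) to 1/0 votes per pair; the function name and threshold parameter are illustrative:

```python
from collections import defaultdict

def filter_noisy_pairs(matrix, threshold=5):
    """Sum each pair's 1/0 votes as +1/-1 per rater and keep only pairs
    whose absolute vote margin exceeds the threshold."""
    margins = defaultdict(int)
    for votes in matrix.values():  # one entry per rater (<filehash>)
        for pair, vote in votes.items():
            margins[pair] += 1 if vote == 1 else -1
    return {pair: m for pair, m in margins.items() if abs(m) > threshold}
```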
Figure 19: Pair strength in the filtered dataset.
The filtered dataset has the distribution shown in Figure 19, where pair strength is the margin by which
raters agreed on the citation order.
After the filtering, we observe an agreement of 0.139 (Krippendorff's alpha, nominal metric).
This indicates that a very low agreement can be observed. However, it remains questionable whether this
observed agreement is not merely a result of the selection. This experiment was conducted on a dataset of
513,173 article pair judgements.
8.2.2 What can be done next on the evaluation dataset?
1. Given that no significant agreement can be observed, we might want to try binning citations
originating from the same sentence/paragraph/section. This would mean that the order within such a
unit is irrelevant and pairs should not be generated between its items. This would decrease the
dimensionality, and it may be worth testing whether some agreement can then be seen.
2. The scalability problem is not necessarily a problem of the agreement algorithms themselves, but of
the size of the input matrix, which has a very large number of columns. As the matrix is also very
sparse, one option would be to look for an implementation that uses a sparse matrix representation.
3. Approach the Narrative Paths problem in a different way, for example by deciding that a path is
composed of influential papers impacting the currently visited paper. An approach similar to
Valenzuela et al. [Valenzuela, 2015] or Zhu et al. [Zhu, 2015] could then be followed.
4. Think of another approach for obtaining an evaluation data set.
5. Create a narrative path using Mendeley usage data following the idea of mining what readers usually
read after reading a given paper.
8.3 Experiment 2: Mining Narrative Paths from Mendeley reading logs
The aim of this experiment was to develop an approach for browsing based on the idea of narrative paths
generated from Mendeley usage data. More specifically, our assumption is that we can model the problem as a
Markov chain where the states correspond to academic articles and the transition probabilities, representing
the relationship "read after", are mined from usage data. The transition probability of moving from article A
to article B is given by the probability of a user opening article B within 10 minutes of opening article A in
any of the Mendeley clients, i.e. Mendeley Desktop, Mendeley Web Library and the mobile application.
Our approach can be fundamentally divided into two steps:
1. Generating the Markov chain, i.e. a directed weighted graph where the probability of moving to the
next state depends only on the current state.
2. Developing a client that allows a user to browse the network.
8.3.1 Generating the Markov chain
Our approach for generating the Markov chain is based on the following steps:
Input
● >12 billion live events from the Mendeley logs (720 GB), from which the transition probabilities are to
be determined
● >622 million user articles (70 GB), corresponding to >132 million unique articles, which represent the
states of the Markov chain
Steps
1. Filter events related to opening a PDF using one of the Mendeley clients, namely:
‘OpenPdfInInternalViewer’, ‘OpenPDFInExternalViewer’ and ‘OpenFileInExternalViewer’.
2. Join filtered events with the catalogue (resulted in 27 million events, 32.6 GB)
3. Filter out document events for which there is no DOI or for which Mendeley does not have a full text.
4. Generate edges <doc_id1, doc_id2> for documents where doc_id2 was opened within 10
minutes of opening doc_id1.
5. Calculate the transition probabilities for the Markov chain, producing triples of the form <doc_id1,
doc_id2, p>, where p = #opened(doc_id1, doc_id2) / #opened(doc_id1, X).
6. Group the triples by doc_id1 to collect outgoing edges and transition probabilities for a given node.
Group the triples by doc_id2 to collect incoming edges and transition probabilities for a given node.
Output
~10 million states and >61 million transition edges.
The original data were obtained by unloading the live_events and catalogue tables stored in the
Mendeley Redshift database. The processing then included the use of several Pig scripts, which were executed
on the Mendeley cluster. To select the pairs of articles that were opened within 10 minutes of each other, we
extended the functionality of Pig by streaming the data through a Python script.
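Steps 4 and 5 can be illustrated with the following sketch in plain Python; the actual pipeline ran as Pig scripts streaming through a Python script, and the event layout and function names below are our assumptions:

```python
from collections import Counter

def session_edges(events, window=600):
    """events: (user, doc_id, epoch_seconds) tuples sorted by user and time.
    Emit <doc_id1, doc_id2> whenever the same user opened doc_id2 within
    `window` seconds (10 minutes) of opening doc_id1."""
    edges = []
    for (u1, d1, t1), (u2, d2, t2) in zip(events, events[1:]):
        if u1 == u2 and d1 != d2 and t2 - t1 <= window:
            edges.append((d1, d2))
    return edges

def transition_probabilities(edges):
    """p = #opened(doc_id1, doc_id2) / #opened(doc_id1, X)."""
    pair_counts = Counter(edges)
    out_counts = Counter(d1 for d1, _ in edges)
    return {(d1, d2): c / out_counts[d1] for (d1, d2), c in pair_counts.items()}
```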
A sample of the Markov chain is displayed in Figure 20 below.
Figure 20: A sample from the Markov chain induced from the Mendeley log data showing states as academic articles with
transitional probabilities between them.
8.3.2 Developing the client application
The functionality of the client application is as follows:
Preprocessing:
Import the network into Elasticsearch (or a database/key-value store) so that we can quickly retrieve
all the outgoing and incoming edges and transition probabilities for a given node.
Use case:
1. Retrieve the DOI of a document that the user wants to generate a path for/from by activating a
bookmarklet on this document. For simplicity, this has been implemented to work within the
Mendeley catalog web interface12, but the work could be fairly easily extended to work on any
research paper available on the Web.
2. Query an Elasticsearch instance to retrieve the identifiers of incoming and outgoing edges and
transition probabilities for a given state. Rank the retrieved states/edges in descending order of
their transition probability and select the identifiers of the top N incoming and outgoing edges.
3. Resolve the retrieved identifiers (DOIs) against the Mendeley API to get the article metadata.
4. Display the retrieved documents to the user presenting which articles people tend to open after
reading this document and which articles they read before reading this document.
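The ranking logic of step 2 can be sketched in memory as follows; in the prototype the edge lists are served from Elasticsearch, and the names below are our own illustration:

```python
def top_neighbours(prob_edges, doi, n=5):
    """Return the top-n incoming and outgoing neighbours of a state,
    ranked by descending transition probability.
    prob_edges: {(doc_id1, doc_id2): p} transition probabilities."""
    outgoing = sorted(((d2, p) for (d1, d2), p in prob_edges.items() if d1 == doi),
                      key=lambda e: e[1], reverse=True)[:n]
    incoming = sorted(((d1, p) for (d1, d2), p in prob_edges.items() if d2 == doi),
                      key=lambda e: e[1], reverse=True)[:n]
    return incoming, outgoing
```

The incoming list corresponds to "read before" and the outgoing list to "read after" suggestions.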
12 https://www.mendeley.com/catalog/
The client application was developed using the bookmarklet technology described in the previous
deliverables. To install the bookmarklet, all the user has to do is drag and drop a link to the bookmarklet
into the bookmarks bar in Chrome. The user can then visit any article page with a DOI in the Mendeley web
catalog and activate the bookmarklet on it. The result will look similar to the one in Figure 21, showing
what people tend to read next/before reading this document.
Figure 21: The bookmarklet for what people read next/before reading this document.
8.3.3 Limitations
The decision to model the problem as a Markov chain is a simplification. In theory, the transition
probabilities should depend on all the states the user has visited before. We leave modelling this
dependence, and determining any discounting factors, to future work.
We have observed that in some cases the same article can be recommended both as read after and as
read before. While these loops in the network are supported by the usage data, it might be a good idea
to remove them, as they might suggest that these articles can be read in any order.
8.3.4 Evaluation
To evaluate the quality of the proposed recommendations, we performed a user study over the Mendeley user
base. We selected and e-mailed a sample of Mendeley Core Users, i.e. users with high activity on Mendeley, who
had already used a bookmarklet and visited Mendeley Suggest13. We asked the users to install the
bookmarklet, to try it on some sample documents or to freely navigate the Mendeley Catalogue, and to answer a
short survey14.
The survey was answered by 70 Mendeley Core Users, i.e. users that habitually use Mendeley in their
workflow, drawn from all academic positions except Bachelor students.
How easy was the bookmarklet to install and to use?
Very easy: everything worked fine 54.40%
Somewhat easy: it worked in most cases but there were a couple of small problems 17.60%
Not very easy: it took a few goes to get it to work or it only worked in specific cases 10.30%
I was unable to get it to work at all 17.60%
Table 17
How useful did you find the reading lists generated?
Very useful: the reading lists were relevant 33.80%
Somewhat useful: the reading lists were useful in some cases, but not others 42.60%
Not very useful: the reading lists were largely irrelevant 23.50%
Table 18
How useful is the list of papers that people read BEFORE the current document?
Very useful: the suggested papers are relevant to better understand the current paper. 29.40%
Somewhat useful: some of the suggested paper are relevant others not 44.10%
Not very useful: the suggested papers are not relevant at all 26.50%
Table 19
13 https://www.mendeley.com/suggest/
14 https://www.surveymonkey.com/r/FJ5MYXW
How useful is the list of papers that people read AFTER the current document?
Very useful: the suggested papers are relevant and helpful to discover new papers. 30.90%
Somewhat useful: some of the suggested paper are relevant others not. 48.50%
Not very useful: the suggested papers are not relevant at all. 20.60%
Table 20
Overall, how likely would you be to use this tool in your research workflow?
Very likely: I’ll use this regularly 42.60%
Somewhat likely: I may use this occasionally 36.80%
Unlikely: I won’t use this at all or only very rarely 20.60%
Table 21
Figure 22: Do you have any other comments or specific suggestions?
The results indicate that the bookmarklet technology is a suitable aid for exploring a path. The narrative
paths themselves are judged generally useful, but sometimes a path does not correspond to the user's idea of
a sequence. The before/after ordering seems to agree with the users' perception.
The survey also highlights the need to refine the construction of the Markov chain, in particular tuning the
threshold for the transition probability and identifying loops between before and after.
The users appreciate the narrative paths as an element of their research workflow and advocate the
integration of narrative paths into Mendeley Suggest.
8.3.5 How is this different from the previous narrative paths bookmarklet?
There are a number of differences from the bookmarklet we reported in the previous deliverables. Perhaps the
main difference lies in the idea used to extract the narrative path: while the previous bookmarklet used a
content-based method to extract citation references from the article full text, this method relies on usage
activity logs. This has a number of consequences:
● The method is able to provide recommendations on what to read next/before based on information
about what people usually read next/before, rather than what the author would suggest they read
next/before.
● The usage-based method is not applicable to documents for which we do not track, or do not yet
have, usage activity. These include new documents added to Mendeley for the first time and
documents which are not research articles, such as government reports, blog posts, Wikipedia pages,
etc.
● The usage-based method has the potential to add serendipity to the recommendations.
● The usage-based method faces the challenge of relying on an uneven amount of user activity on each
document. This means that the quality of the produced recommendations can be expected to be
good for documents with many readers, while being poor for documents with fewer readers. The
content-based method, on the other hand, is likely to produce roughly the same quality of
recommendations across documents.
● The content-based method provides little information allowing us to decide on the direction, i.e.
whether a user should read something next or before; a reference by itself does not indicate the
reading order.
Overall, both approaches have pros and cons. To move the prototype solution into practice, we suggest
combining them in the following way:
1. Extract citation references using the content based method
2. Retrieve links people follow using the usage data based method described in this section.
3. If we don’t have usage data, display the results from the content based method.
4. Otherwise, see if there is any overlap between the recommendations provided by the content based
and usage based method. If there is an overlap, it means that an article that is explicitly referenced by
a document is also often read by the readers, i.e. people follow this citation. This means that this
reference is important/useful for understanding this document as shown in Figure 23 below.
Consequently, this activity should be scored higher.
5. If there is no overlap, use the usage based recommendation if we have sufficient activity on the
document, otherwise fall back to the content based solution.
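The combination steps above can be sketched as follows; the activity threshold and all names are hypothetical, not part of an implemented system:

```python
def combine_recommendations(content_refs, usage_recs, activity, min_activity=10):
    """Combine content-based citation references with usage-based
    recommendations: boost the overlap, otherwise prefer usage data
    when there is enough activity, falling back to content otherwise."""
    if not usage_recs:                      # step 3: no usage data at all
        return content_refs
    overlap = [r for r in usage_recs if r in content_refs]
    if overlap:                             # step 4: citations readers actually follow
        return overlap + [r for r in usage_recs if r not in content_refs]
    # step 5: sufficient activity -> usage-based, else content-based
    return usage_recs if activity >= min_activity else content_refs
```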
Figure 23: An example transition from the generated Markov chain which coincides with a citation extracted
from the full text of the citing document. We believe that such cases are indicative of the importance of a
citation.
8.3.6 Possible improvements & Future work
The idea of narrative-path browsing is currently realised with the bookmarklet technology, which allows the
user to see only one step ahead and one step back. However, this is a limitation of the UI rather than of the
approach. An obvious improvement would be to change the UI to support displaying full paths based on the
Markov model. Establishing the length of a path and its end point is an open question that could be explored.
We currently believe that a practical approach would be the following: by activating the bookmarklet on a
given resource, this resource would be treated as the goal resource, and a shortest path over the graph would
be computed connecting it with any of the resources the user already has in their library.
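The shortest-path idea can be made concrete by running Dijkstra's algorithm over edge weights -log(p), so that the shortest path maximises the product of transition probabilities along the path. This is our illustration of the open question, not an implemented feature:

```python
import heapq
from math import log

def most_probable_path(prob_edges, start, goal):
    """Dijkstra over -log(p) weights: the shortest path under these
    weights maximises the product of transition probabilities.
    prob_edges: {(doc_id1, doc_id2): p} transition probabilities."""
    graph = {}
    for (a, b), p in prob_edges.items():
        graph.setdefault(a, []).append((b, -log(p)))
    dist = {start: 0.0}
    heap = [(0.0, start, [start])]
    while heap:
        d, node, path = heapq.heappop(heap)
        if node == goal:
            return path
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        for nxt, w in graph.get(node, []):
            nd = d + w
            if nd < dist.get(nxt, float("inf")):
                dist[nxt] = nd
                heapq.heappush(heap, (nd, nxt, path + [nxt]))
    return None  # goal unreachable from start
```

Note that a two-step path through high-probability edges can beat a direct low-probability edge, which matches the intuition of a natural reading sequence.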
8.4 Conclusions
We have presented two experiments addressing the automatic generation of the network of resources needed
to solve the Narrative Paths problem. Both followed the idea of extracting directed relations connecting
resources based on the principle of "read B after A". In Experiment 1 we investigated a content-based
approach relying on citation positioning. In Experiment 2 we investigated an approach relying on user
activity data. Our results suggest that the user data can be exploited to induce a Markov chain over reading
sequences and to suggest a reading path to the user. Other systems can utilise this model to provide
recommendation/browsing capabilities across collections using the idea of narrative paths. We suggest
combining this approach with the content-based approach described in previous deliverables, which naturally
complements this idea.
9 References
[Callan, 2001] Callan, J., & Connell, M. (2001). Query-based sampling of text databases. ACM
Transactions on Information Systems (TOIS), 19(2), 97-130.
[Demeester, 2015] Demeester, T., Trieschnigg, D., Zhou, K., Nguyen, D., & Hiemstra, D. (2015). FedWeb
Greatest Hits: Presenting the New Test Collection for Federated Web Search. In 24th
International World Wide Web Conference (WWW 2015).
[Fernando, 2013] Fernando, S., Goodale, P., Clough, P., Stevenson, M., Hall, M., & Agirre, E. (2013).
Generating Paths through Cultural Heritage Collections. LaTeCH 2013.
[Fleiss, 1973] Fleiss, J. L., & Cohen, J. (1973). The equivalence of weighted kappa and the intraclass
correlation coefficient as measures of reliability. Educational and psychological
measurement.
[Mikolov, 2013] Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013). Efficient estimation of word
representations in vector space. arXiv preprint arXiv:1301.3781.
[Lu, 2005] Lu, J., & Callan, J. (2005). Federated search of text-based digital libraries in hierarchical
peer-to-peer networks. In Advances in Information Retrieval (pp. 52-66). Springer Berlin
Heidelberg.
[Jin, 2014] Jin, S., & Lan, M. (2014). Simple May Be Best-A Simple and Effective Method for Federated
Web Search via Search Engine Impact Factor Estimation. In TREC.
[Krippendorff, 2013] Krippendorff, K. (2013). Component of Content Analysis. Content Analysis: An Introduction
to its Methodology. 3rd Edition. Los Angeles: SAGE Publication.
[Pass, 2006] Pass, G.; Chowdhury, A. & Torgeson, C. (2006), A Picture of Search, in "Proceedings of the
1st International Conference on Scalable Information Systems", ACM, New York, USA.
[Rhodes, 2000] Rhodes, B. J. (2000). Just-in-time information retrieval (Doctoral dissertation,
Massachusetts Institute of Technology).
[Valenzuela, 2015] Valenzuela, M., Ha, V., & Etzioni, O. (2015). Identifying Meaningful Citations. AAAI
Workshops.
[Zhu, 2015] Zhu, X., Turney, P., Lemire, D., & Vellino, A. (2015). Measuring academic influence: Not all
citations are equal. Journal of the Association for Information Science and Technology,
66(2), 408-427.
[Ziak, 2015] Ziak, H., & Kern, R. (2015). Evaluation of Pseudo Relevance Feedback Techniques for Cross
Vertical Aggregated Search. In Experimental IR Meets Multilinguality, Multimodality, and
Interaction (pp. 91-102). Springer International Publishing.
10 Glossary
Terms used within the EEXCESS project.
Partner Acronyms
JR-DIG JOANNEUM RESEARCH Forschungsgesellschaft mbH, AT
Uni Passau University of Passau, GE
Know Know-Center - Kompetenzzentrum für Wissenschaftsbasierte Anwendungen und Systeme
Forschungs- und Entwicklungs Center GmbH, AT
INSA Institut National des Sciences Appliquées (INSA) de Lyon, FR
ZBW German National Library of Economics, GE
BITM BitMedia, AT
KBL-AMBL Kanton Basel Land, CH
CT Collection Trust, UK
MEN Mendeley Ltd., UK
WM wissenmedia, GE
Abbreviations
API Application Programming Interface
EC European Commission
EEXCESS Enhancing Europe’s eXchange in Cultural Educational and Scientific reSources
HTTP Hyper Text Transfer Protocol
JSON JavaScript Object Notation
LSI Latent Semantic Indexing
NLP Natural Language Processing
XML Extensible Markup Language
Acknowledgement: The research leading to these results has received funding from the European Union's
Seventh Framework Programme (FP7/2007-2013) under grant agreement n° 600601.