near real-time recommendations in enterprise social networks

Post on 29-Nov-2014

Category: Technology

DESCRIPTION

- How to compute recommendations using a graph with 40M edges and 11M nodes in 0.2 s (200 ms)
- A new perspective on near real-time social recommendations in enterprise social platforms using Linked Data
- A recommender system that is easy to integrate with social networks and legacy data
- An application of data analytics in an enterprise context

TRANSCRIPT

#AICRECSYS

ADVANSSE: Advances in Social Semantic Enterprise

HTTP://ADVANSSE.DERI.IE/

MACIEJ DABROWSKI BENJAMIN HEITMANN CONOR HAYES KEITH GRIFFIN

10TH JULY 2013

About me

MACIEJ DABROWSKI

maciej.dabrowski@deri.org

Figure: profile graph of the speaker, with edge labels lecturerAt, co-PI, contact, worksWith, researcherAt, graduated, name

Overview

THIS TALK

RESEARCH

INDUSTRY

1.  WHY?

2.  WHAT?

3.  HOW?

4.  TECHNICAL DECISIONS

5.  LESSONS LEARNED

Why? What? How?

technical considerations

lessons learned

Various information domains

preferences

recommendations

implicit connections

User profile

TRAVEL

FOOD SPORTS

POLITICS ??

Use Case: Enterprise Social Web

Enterprise social web ENTERPRISE INFORMATION SPACE

MARKETING

DEVELOPMENT

R & D

ANDREW

BOB

CECILIA

DANNY

Limited information flow

MARKETING

DEVELOPMENT

R & D

"GREAT TOOL!"

"MEETING IBM"

"TALK BY DERI"

ANDREW

BOB

CECILIA

DANNY

ENTERPRISE INFORMATION SPACE

Disconnected Social Networks

?  

ANDREW

BOB

CECILIA

DANNY

MARKETING DEVELOPMENT

R & D

Distributed Social Platforms

?  

MARKETING

DEVELOPMENT

R & D

Problem 1: information overload and discovery

Problem 2: data level issues

DISTRIBUTION

MULTIPLE DOMAINS AND TYPES OF ENTITIES

PEOPLE INTERESTS

CONTENT

Requirements - personalization

USE BACKGROUND KNOWLEDGE

ALLOW CROSS-DOMAIN MULTI-SOURCE PERSONALIZATION

EXPLOIT SOCIAL GRAPH

ALLOW REAL-TIME APPLICATIONS

Requirements - data

DATA LEVEL:
•  FLEXIBLE
•  COMPACT
•  ENABLE CRUD
•  GRAPH?

TRANSPORT PROTOCOL:
•  RELIABLE
•  EFFICIENT
•  PUBSUB?

What?

A PLATFORM BASED ON OPEN STANDARDS THAT PLUGS EASILY INTO EXISTING INFRASTRUCTURES AND EXPLOITS LEGACY INFORMATION, THE SOCIAL GRAPH AND THE INTEREST GRAPH TO PROVIDE A PERSONALIZED INFORMATION “DASHBOARD” IN NEAR REAL-TIME.

use cases

HOW? A look inside

Step 1: Exploit distributed (social) graphs

http://www.insidefacebook.com/wp-content/uploads/2013/06/shutterstock_107108318.jpg

Step 2: Exploit interest graphs

BENEFITS OF USING INTEREST GRAPHS:

1.  FLEXIBLE SOURCE OF BACKGROUND KNOWLEDGE

2.  ANY DATASET CAN BE “PLUGGED-IN” IF NEEDED

3.  CROSS-DOMAIN RECOMMENDATIONS

4.  VERY GOOD IN DISCOVERING INTERESTING RECOMMENDATIONS

OUR APPROACH: SPREADING ACTIVATION

Interest graphs

Figure: example interest graph. Nodes: DERI, Maciej, BlogPost2, Maurice, the tag "Emerging Technology", http://dbpedia.org/resource/Data_analytics, http://dbpedia.org/resource/Emerging_technologies. Edge labels: sioc:creator_of, sioc:topic, worksat, interest, recommended interest, owl:sameAs.

Expanded User Profile (EUP): includes both original and recommended interests

Social Software Entities

Additional Profile Knowledge

External Background Knowledge

(DBPedia + domain datasets)
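
The interest graph on this slide can be pictured as a small set of RDF-style triples. The following is a minimal Python sketch of that idea, not the ADVANSSE implementation: the `ex:` identifiers, the edge directions, and the helper names (`objects`, `original_interests`, `expanded_user_profile`) are assumptions made for illustration. It derives a user's original interests from the topics of their posts, resolves tags to DBpedia resources through owl:sameAs, and keeps recommended interests separate inside the expanded profile.

```python
# Illustrative triples only: the "ex:" identifiers and edge directions are
# assumptions made for this sketch, not data from the ADVANSSE prototype.
DBPEDIA = "http://dbpedia.org/resource/"

TRIPLES = [
    ("ex:Maciej", "ex:worksAt", "ex:DERI"),
    ("ex:Maciej", "sioc:creator_of", "ex:BlogPost2"),
    ("ex:BlogPost2", "sioc:topic", "ex:EmergingTechnologyTag"),
    ("ex:EmergingTechnologyTag", "owl:sameAs", DBPEDIA + "Emerging_technologies"),
]


def objects(triples, subject, predicate):
    """Return every object of the triples matching (subject, predicate, ?)."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def original_interests(triples, user):
    """Topics of the user's posts, resolved to linked resources via owl:sameAs."""
    interests = set()
    for post in objects(triples, user, "sioc:creator_of"):
        for topic in objects(triples, post, "sioc:topic"):
            linked = objects(triples, topic, "owl:sameAs")
            # Fall back to the raw tag when no background-knowledge link exists.
            interests.update(linked if linked else [topic])
    return interests


def expanded_user_profile(triples, user, recommended):
    """Combine original and recommended interests, keeping the two distinguishable."""
    original = original_interests(triples, user)
    return {
        "original": sorted(original),
        "recommended": sorted(set(recommended) - original),
    }


profile = expanded_user_profile(TRIPLES, "ex:Maciej", [DBPEDIA + "Data_analytics"])
print(profile)
```

Keeping the two lists apart lets a dashboard show why an item was suggested: directly from the user's own activity, or through a recommended interest.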

Our Approach

A PLATFORM FOR SOCIAL NETWORKS:
§  ENTERPRISE FOCUS: PEOPLE, COMMUNITIES, INFORMATION
§  EFFICIENCY USING XMPP PUBSUB AND SPARQL 1.1 UPDATE
§  EXPLOIT INTEREST GRAPH AND VARIOUS DATA SOURCES TO PROVIDE PERSONALIZATION THROUGH SOPHISTICATED NEAR REAL-TIME RECOMMENDATIONS
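
As an illustration of the SPARQL 1.1 Update side of this approach, here is a minimal Python sketch that serialises a few triples into an `INSERT DATA` request. The helper names (`to_term`, `insert_data`) and the `http://example.org/` URIs are hypothetical. How the resulting string is wrapped in an XMPP PubSub item is not shown in the slides, so it is left out here.

```python
def to_term(value):
    """Render an http(s) IRI as <...> and anything else as a quoted string literal.

    Simplification for this sketch: only backslashes and double quotes are
    escaped, so literals containing line breaks are not supported.
    """
    if value.startswith(("http://", "https://")):
        return "<" + value + ">"
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


def insert_data(triples):
    """Build a SPARQL 1.1 Update INSERT DATA request from (s, p, o) triples."""
    lines = [
        "  {} {} {} .".format(to_term(s), to_term(p), to_term(o))
        for s, p, o in triples
    ]
    return "INSERT DATA {\n" + "\n".join(lines) + "\n}"


SIOC = "http://rdfs.org/sioc/ns#"

# Hypothetical post URI; the topic URI is the DBpedia resource from the slides.
update = insert_data([
    ("http://example.org/post/2", SIOC + "topic",
     "http://dbpedia.org/resource/Emerging_technologies"),
    ("http://example.org/post/2", "http://purl.org/dc/terms/title",
     'A post on "emerging" tech'),
])
print(update)
```

Because each change travels as a self-contained update request, a subscriber could apply it to its own RDF store without re-importing the whole dataset, which fits the efficiency goal stated above.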

Demonstrator

EASY TO INTEGRATE WITH CISCO INFRASTRUCTURE

OPEN STANDARDS (XMPP, SPARQL 1.1 UPDATE)

SCALABLE RECOMMENDATIONS BASED ON A SOCIAL GRAPH WITH OVER 10M ENTITIES AND 40M EDGES, COMPUTED IN UNDER 1 SECOND (0.2S ON AVERAGE).

MORE DETAILS: HTTP://ADVANSSE.DERI.IE/

demonstrator

Prototype stats

SOCIAL NETWORK GRAPH:
•  100S USERS
•  100S POSTS
•  500+ TAGS
•  2000+ ENTITIES
•  15000+ EDGES

Saffron.deri.ie

BACKGROUND KNOWLEDGE GRAPH:
•  11M ENTITIES
•  40M EDGES

CROSS-DOMAIN GRAPH:
•  3956 RESEARCH ARTICLES
•  LANGUAGE CONFERENCES

Technical considerations

ALGORITHM:
•  SEMANTIC NETWORK
•  LARGE DATASET
•  ITERATIVE GRAPH ALGORITHM
•  STATEFUL NODES
•  EMBEDDING OF DOMAIN LOGIC

Technical considerations

NON-NATIVE IMPORT OF RDF

STARTUP TIME WITH DBPEDIA
•  12 MIN ON 24 CORE, 96GB RAM TO LOAD

PARALLEL PROCESSING OF ACTIVATIONS
•  STATE FOR EACH USER AT EACH NODE

SCALABILITY ISSUES

LACK OF GLOBAL ALGORITHM CONTROL

IMMATURE CODE BASE, LACK OF DOCUMENTATION

Technical considerations

NATIVE SUPPORT FOR RDF

DBPEDIA (5.46GB) COMPRESSED TO 436MB

LOW MEMORY REQUIREMENTS

LOW STARTUP TIME (90S)

FAST QUERY ACCESS (< 1MS)

Server design

XMPP SPREADING ACTIVATION HDT

Figure: architecture diagram of an ADVANSSE connected social platform and the ADVANSSE server.

Components shown:
•  XMPP client: Ignite Smack
•  Web application: Tomcat + Servlet
•  RDF store: Jena Fuseki
•  XMPP server: Ignite OpenFire
•  Personalisation component
•  Recommendation algorithm
•  R/W RDF store: Jena Fuseki
•  Fast, R/O RDF store: HDT
•  Link resolver RDF store: Jena Fuseki

Connection labels: XMPP, SPARQL, Java API, File import

configuration

•  DISTANCE CONSTRAINT DISABLED
•  FANOUT CONSTRAINT ENABLED
•  10 TARGET ACTIVATIONS
•  ACTIVATION THRESHOLD 0.5
•  INITIAL ACTIVATION 4.0
•  MAXIMUM OUT EDGES 500
•  MAXIMUM OF 10 WAVES AND 1 PHASE
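
To make these parameters concrete, here is a minimal Python sketch of spreading activation over an adjacency-list graph. It is a simplified illustration under stated assumptions, not the algorithm used in the prototype: activation is split equally among a node's out-edges, a node fires in the next wave only if it received at least the threshold in the current wave, and the fan-out constraint skips any node with more than the maximum number of out-edges. The single phase and the disabled distance constraint are not modelled, and the function name `spread_activation` and the example node names are hypothetical.

```python
from collections import defaultdict


def spread_activation(graph, seeds, initial_activation=4.0, threshold=0.5,
                      max_out_edges=500, max_waves=10, target_activations=10):
    """Return up to `target_activations` non-seed nodes ranked by activation.

    graph: dict mapping a node to the list of nodes its out-edges point to.
    seeds: the user's known interests, each starting with `initial_activation`.
    """
    seeds = set(seeds)
    activation = defaultdict(float)
    for seed in seeds:
        activation[seed] = initial_activation
    # Nodes that fire in the current wave, with the energy each one spreads.
    frontier = dict.fromkeys(seeds, initial_activation)

    for _ in range(max_waves):
        received = defaultdict(float)
        for node, energy in frontier.items():
            neighbours = graph.get(node, [])
            # Fan-out constraint (assumed semantics): hub nodes do not spread.
            if not neighbours or len(neighbours) > max_out_edges:
                continue
            share = energy / len(neighbours)
            for neighbour in neighbours:
                received[neighbour] += share

        frontier = {}
        for node, energy in received.items():
            activation[node] += energy
            if energy >= threshold:
                frontier[node] = energy

        reached = [n for n, a in activation.items()
                   if n not in seeds and a >= threshold]
        if not frontier or len(reached) >= target_activations:
            break

    ranked = sorted(
        ((n, a) for n, a in activation.items()
         if n not in seeds and a >= threshold),
        key=lambda item: (-item[1], item[0]),
    )
    return ranked[:target_activations]


# Toy background-knowledge graph; the node names are illustrative only.
toy_graph = {
    "Data_analytics": ["Big_data", "Statistics"],
    "Emerging_technologies": ["Big_data"],
}
recommendations = spread_activation(toy_graph, ["Data_analytics", "Emerging_technologies"])
print(recommendations)  # [('Big_data', 6.0), ('Statistics', 2.0)]
```

In this toy run, Big_data ranks first because it receives activation from both seed interests, which is the cross-domain effect the interest graph is meant to provide.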

stats

DATASET:
•  371 USERS
•  6 INTERESTS ON AVERAGE
•  DEGREE 2-5, UP TO 51

200MS AVERAGE EXECUTION

85% COVERAGE

The value

SOCIAL CAPITAL IN ENTERPRISE SOCIAL NETWORKS IS NOT FULLY EXPLOITED. ENTERPRISE SOCIAL PLATFORMS ARE DISTRIBUTED AND INCLUDE VARIOUS SOURCES OF INFORMATION. VALUABLE INFORMATION IN AN ORGANIZATION IS NOT DISCOVERED BY THE RELEVANT EMPLOYEES.

DISCOVER AND CONNECT WITH RELEVANT PEOPLE IN THE ORGANIZATION. AGGREGATE INFORMATION FROM VARIOUS DISTRIBUTED SOCIAL PLATFORMS USING OPEN STANDARDS. PROVIDE NEAR REAL-TIME PERSONALIZATION BASED ON LARGE, DYNAMIC GRAPH DATA.

Lessons learned

•  GREATER RELEVANCE TO REAL PROBLEMS
•  CLEARER REQUIREMENTS (AND MORE)
•  ACCESS TO ACTUAL USAGE DATA (REAL USERS)

•  PATENTS VS. PUBLISHING

•  PROTOTYPE INTEGRATION CONSUMES RESOURCES
•  MORE FOCUS ON FEATURE DEVELOPMENT
•  LESS EXPLORATION AND HYPOTHESIS TESTING

major considerations

ACCESS TO INDUSTRY DATA

INTEGRATION WITH THE PRODUCT?

https://www.keytrac.net/assets/industry-social-networks.jpg

http://www.autointhenews.com/wp-content/uploads/2010/05/volvo-s60-crash-video-image.jpg

Summary

PROBLEM
§  INFORMATION OVERLOAD AND INEFFICIENT INFORMATION DISCOVERY IN DISTRIBUTED ENTERPRISE SOCIAL NETWORKS

SOLUTION
§  RECOMMENDER SYSTEM THAT EXPLOITS SOCIAL GRAPH
§  UTILIZE INTEREST GRAPH AND LEGACY INFORMATION
§  NEAR REAL-TIME PERSONALIZATION

TECHNOLOGY
§  OPEN SOURCE COMPONENT FOR RDF DATA AGGREGATION USING XMPP AND SPARQL 1.1 UPDATE
§  PERSONALIZATION COMPONENT BASED ON SPREADING ACTIVATION APPLICABLE TO MULTI-SOURCE, CROSS-DOMAIN DATA

ENORMOUS VALUE IN INDUSTRY-ACADEMIA COLLABORATIONS

CONTACT: MACIEJ.DABROWSKI@DERI.ORG

@MACDAB
