ad 1656 - transforming social data into business insight
Transforming Social Data into Business Insights
Marie Wallace, Vincent Burckhardt
Copyright © 2015 by International Business Machines Corporation (IBM). No part of this document may be reproduced or transmitted in any form without written
permission from IBM.
U.S. Government Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM.
Information in these presentations (including information relating to products that have not yet been announced by IBM) has been reviewed for accuracy as of
the date of initial publication and could include unintentional technical or typographical errors. IBM shall have no responsibility to update this information. THIS
DOCUMENT IS DISTRIBUTED "AS IS" WITHOUT ANY WARRANTY, EITHER EXPRESS OR IMPLIED. IN NO EVENT SHALL IBM BE LIABLE FOR ANY
DAMAGE ARISING FROM THE USE OF THIS INFORMATION, INCLUDING BUT NOT LIMITED TO, LOSS OF DATA, BUSINESS INTERRUPTION, LOSS OF
PROFIT OR LOSS OF OPPORTUNITY. IBM products and services are warranted according to the terms and conditions of the agreements under which they
are provided.
Any statements regarding IBM's future direction, intent or product plans are subject to change or withdrawal without notice.
Performance data contained herein was generally obtained in a controlled, isolated environments. Customer examples are presented as illustrations of how
those customers have used IBM products and the results they may have achieved. Actual performance, cost, savings or other results in other operating
environments may vary.
References in this document to IBM products, programs, or services does not imply that IBM intends to make such products, programs or services available in
all countries in which IBM operates or does business.
Workshops, sessions and associated materials may have been prepared by independent session speakers, and do not necessarily reflect the views of IBM. All
materials and discussions are provided for informational purposes only, and are neither intended to, nor shall constitute legal or other guidance or advice to any
individual participant or their specific situation.
It is the customer’s responsibility to insure its own compliance with legal requirements and to obtain advice of competent legal counsel as to the identification
and interpretation of any relevant laws and regulatory requirements that may affect the customer’s business and any actions the customer may need to take to
comply with such laws. IBM does not provide legal advice or represent or warrant that its services or products will ensure that the customer is in compliance
with any law
Notices and Disclaimers
2
Information concerning non-IBM products was obtained from the suppliers of those products, their published announcements or other publicly available sources.
IBM has not tested those products in connection with this publication and cannot confirm the accuracy of performance, compatibility or any other claims related
to non-IBM products. Questions on the capabilities of non-IBM products should be addressed to the suppliers of those products. IBM does not warrant the
quality of any third-party products, or the ability of any such third-party products to interoperate with IBM’s products. IBM EXPRESSLY DISCLAIMS ALL
WARRANTIES, EXPRESSED OR IMPLIED, INCLUDING BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE.
The provision of the information contained herein is not intended to, and does not, grant any right or license under any IBM patents, copyrights, trademarks or
other intellectual property right.
•IBM, the IBM logo, ibm.com, Aspera®, Bluemix, Blueworks Live, CICS, Clearcase, Cognos®, DOORS®, Emptoris®, Enterprise Document Management
System™, FASP®, FileNet®, Global Business Services ®, Global Technology Services ®, IBM ExperienceOne™, IBM SmartCloud®, IBM Social Business®,
Information on Demand, ILOG, Maximo®, MQIntegrator®, MQSeries®, Netcool®, OMEGAMON, OpenPower, PureAnalytics™, PureApplication®,
pureCluster™, PureCoverage®, PureData®, PureExperience®, PureFlex®, pureQuery®, pureScale®, PureSystems®, QRadar®, Rational®, Rhapsody®,
Smarter Commerce®, SoDA, SPSS, Sterling Commerce®, StoredIQ, Tealeaf®, Tivoli®, Trusteer®, Unica®, urban{code}®, Watson, WebSphere®, Worklight®,
X-Force® and System z® Z/OS, are trademarks of International Business Machines Corporation, registered in many jurisdictions worldwide. Other product and
service names might be trademarks of IBM or other companies. A current list of IBM trademarks is available on the Web at "Copyright and trademark
information" at: www.ibm.com/legal/copytrade.shtml.
Notices and Disclaimers cont.
3
IBM’s statements regarding its plans, directions, and intent are subject to change or withdrawal without notice at IBM’s sole discretion.
Information regarding potential future products is intended to outline our general product direction and it should not be relied on in making a purchasing decision.
The information mentioned regarding potential future products is not a commitment, promise, or legal obligation to deliver any material, code or functionality. Information about potential future products may not be incorporated into any contract.
The development, release, and timing of any future features or functionality described for our products remains at our sole discretion.
Performance is based on measurements and projections using standard IBM benchmarks in a controlled environment. The actual throughput or performance that any user will experience will vary depending upon many factors, including considerations such as the amount of multiprogramming in the user’s job stream, the I/O configuration, the storage configuration, and the workload processed. Therefore, no assurance can be given that an individual user will achieve results similar to those stated here.
Please Note:
4
You are custodian of the most valuable data within the
enterprise IF you can release it for business value
Are you an Analytics Rockstar?
5
6
Organizations with a highly engaged workforce significantly outperform those without
The shift to digital now makes analysis of engagement networks possible
7
Can we use analytics to better understand employee
engagement and its impact on the business?
Capture & Understand your Enterprise Network
8
Management Employee
ODPi (Open Data Platform Initiative, odpi.org)
10
ODPi is an industry effort to promote and advance the state of Apache Hadoop and Big Data technology for the enterprise. It currently has 24 member companies.
IBM is a founding member of ODPi and is one of 4 members to release a data platform based on the ODP core: IBM Open Platform.
Priorities
Certifications for ODPi-compatible distributions
Guidelines for ODPi ISVs and consumers
Introduce more big data projects into ODPi
Data
Exchange
Data Scientist & Developer Platform Services
Analytic Services
Data Processing & Management
IBM Open Platform (ibm.biz/ibmopenplatform)
11
IBM Engagement Analytics (ibm.com/engage)
12
Capture & Understand your Enterprise Network
13
Management Employee
Helps each employee better understand their engagement and reputation, and
more effectively activate their network for maximum value
The Personal Social Dashboard
14
Activity: Measure of your activity
Reaction: Measure of how people respond to your activity
Eminence: Measure of how people respond to you
Network: Measure of the quality of your network and your role within it
Helps management better understand overall engagement and
organizational health, identify issues, and act accordingly
– Shows connectivity within & between teams
– Identifies people who play key roles
– Highlights organizational brittleness
The Organizational Dashboard
15
Organizational Health
16
Many analyses are actionable, with recommendations
17
Understand your engagement & reputation within the social network
Act on your personal recommendations to drive improvement
Based on Recommendation Templates & Network Analysis:
Employee Matching: Based on a person’s social activity, determine whether, and to what level, they fit a specific social engagement trait
Template Instantiation: Generate recommendations that, if followed, can change and strengthen their engagement patterns
Innovation & Advocacy
18
#1 Collaboration Does Impact Business Outcome
• Engaged employees are 120% more likely to generate Innovation and 150% more likely to demonstrate Customer Advocacy
#2 Optimal Behavior is Different for Everyone
• A variety of interactions most effectively contribute to business outcome
#3 Discovering & Disseminating Optimal Behaviors is Key to Improving Business Outcome
• The Personal Social Dashboard provides such a channel
Employee Retention
19
Does engagement change prior to an attrition event?
Analyzed organizational, social, and retention data
Inspected 10,000 random employees as a control group and 1188 employees who quit
Yes! And engagement analytics can help to predict attrition events
Social Behavior Patterns: less engaged with differences in types of activity
Volume of Activity: less activity several months prior to attrition event
Network: Attrition is viral (common manager, passive and active network)
Capture & Understand your Enterprise Network
20
Management Employee
Transforming discrete data into insights
http://techproductmanagement.com/wp-content/uploads/2014/03/BigData.jpg
Big Data Analytics
22
[Figure: many scattered pieces of data pass through Analytics to yield Business Insights]
Our scope: making sense of the data
23
Extracting meaningful data from your social platform
Home page: See what's happening across your social network
Communities: Work with people who share common roles and expertise
Files: Post, share, and discover documents, presentations, images, and more
Micro-blogging: Reach out to your social network for help
Profiles: Find the people you need
Wikis: Create web content together
Activities: Organize your work and tap your professional network
Bookmarks: Save, share, and discover bookmarks
Blogs: Present your own ideas, and learn from others
Forums: Exchange ideas with, and benefit from the expertise of, others
IBM Connections
25
IBM Connections provides APIs and SPIs that allow the value of the social data to be maximized by external systems:
ALL Connections data can be accessed by external systems
Open, transparent, breaking down silos
Pull data from IBM Connections
Programmatically access much of the same information that you can through the IBM Connections user interface
Have Connections push data to you
All data-change (create, update, delete: CUD) events in all IBM Connections components can be supplied to external consumers
Connections Maximizes The Value of Social Data
26
[Architecture diagram. Labels: IBM Connections Apps; Common Services; Directory; Administration (JMX / WSAdmin); Search; Person Card; Navigational Header; User Directory; RDB; File System]
Connections Architecture
27
[Architecture diagram, with the access layer. Labels as on the previous slide, plus: HTTP Server & Proxy Cache; REST API and Connections Atom API (GET, POST, PUT, DELETE); formats HTML, HTML Form, JavaScript, JSON, Atom Feed, Atom Entry; clients Browser, Feed Reader, Sametime, Portlets, Lotus Notes, Microsoft Office, Mashups, Your App]
Connections Architecture
28
[Architecture diagram, with event integration. Labels as on the previous slide, plus: Other Enterprise Services; Integration bus; Event SPI; Your App]
Connections Architecture
29
Designed to let third parties be notified whenever data changes in any IBM Connections service
Real-time events generated by IBM Connections include all create, update, and delete (CUD) operations
Potential to represent the complete interaction footprint of the enterprise
Lets you capture, persist, model, analyze, visualize and monetize your enterprise network
SPI (System Programming Interface) vs API (Application Programming Interface)
The SPI sits at a lower level than the APIs: you contribute Java code at system level
By contributing Java code written to this SPI, third parties can listen to creation, deletion and update (and more!) events on content within IBM Connections
The Event SPI is the social data fire-hose
30
Events: collections of data generated when activities (data-modifying, notifications) occur in IBM Connections
In the SPI, an event is represented by a Java bean / object
An event encapsulates data such as the type of action and the object (and container) involved in the action
Events are delivered to Event Handlers:
An event handler is a Java class implemented by a third party (you!)
Event handlers are registered in an XML file (events-config.xml)
The registration specifies which types of event to send to a given handler
Connections delivers the Java bean representing the event to the registered event handler(s)
[Diagram: the Event SPI delivers events to Handler 1, Handler 2, ... Handler N, as registered in events-config.xml]
Event SPI – Programming aspects
31
The Event SPI relies on event handlers written in Java to allow vendors to listen to and process events generated by the system
Running external (untrusted) code on Cloud is not possible
Running third-party code on the same WebSphere servers as our applications is not safe
Multitenancy issues
Introducing Switchbox
Our plan is to allow customers/vendors to listen to events generated for their own organization on our Cloud applications without running code on our system
Already leveraged by compliance solutions
Currently being implemented for broader consumption; not available as of now
Cloud considerations
32
Reliable delivery mechanism
At-least-once delivery; supports and recovers from network failure
Latency tolerant
Ease of transition between on-premise and Cloud
Java event handlers implemented for the Event SPI can be run by the Switchbox client
The main difference is that the event handlers are deployed and run on customer infrastructure, outside the IBM Connections datacenter
The Switchbox client invokes event handlers upon reception of an event
Base for generation of events from most IBM social apps (Sametime)
[Diagram: within the IBM Connections Cloud infrastructure, the Event SPI feeds a Switchbox handler and the Switchbox server; on customer infrastructure, the Switchbox client invokes Handler 1 and Handler 2]
Switchbox is not currently available. This diagram shows our desire to provide such a solution to allow customers to consume events from their own organization on Cloud
Cloud considerations
33
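With at-least-once delivery, a handler may receive the same event again after a network failure. The following is a minimal sketch of an idempotent consumer, not Switchbox code. It assumes each event carries a unique `id` field, which is a hypothetical name; the deck does not say how events are identified.

```python
from collections import OrderedDict


class DedupingConsumer:
    """Drops events whose id was already processed, using bounded memory."""

    def __init__(self, handler, max_remembered=10000):
        self._handler = handler        # callable invoked once per unique event
        self._seen = OrderedDict()     # insertion-ordered record of recent ids
        self._max = max_remembered

    def on_event(self, event):
        event_id = event["id"]         # assumption: events expose a unique id
        if event_id in self._seen:
            return False               # redelivery: ignore it
        self._seen[event_id] = True
        if len(self._seen) > self._max:
            self._seen.popitem(last=False)   # forget the oldest id
        self._handler(event)
        return True


processed = []
consumer = DedupingConsumer(processed.append)
for e in [{"id": "e1"}, {"id": "e2"}, {"id": "e1"}]:   # "e1" is redelivered
    consumer.on_event(e)
```

Bounding the remembered ids trades perfect deduplication for constant memory, which is usually acceptable when redeliveries arrive shortly after the original.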
blog.entry.created: “Amy Jones posted a blog entry in the blog named XYZ”
Actor: The person who initiated this action. Details: external id, name and, if not disabled, email address
Type: Type of action. Example: CREATE, UPDATE, DELETE, NOTIFY, MEMBERSHIP, ...
Item: General concept for representing an individual entity within a container. Details: id, name, textual content, HTML and ATOM paths
Container: General concept for representing a "bucket" or "container" that contains other items. Details: id, name
Event SPI – available data in each event
34
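The four parts of an event (actor, type, item, container) can be pictured as a small data model. The sketch below is illustrative only: the class and field names are hypothetical and do not reproduce the Java bean defined by the Event SPI (see its JavaDoc for the real API).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Actor:
    external_id: str
    name: str
    email: Optional[str] = None   # only present if not disabled


@dataclass(frozen=True)
class Item:
    id: str
    name: str
    content: str = ""


@dataclass(frozen=True)
class Container:
    id: str
    name: str


@dataclass(frozen=True)
class Event:
    name: str            # e.g. "blog.entry.created"
    type: str            # e.g. "CREATE", "UPDATE", "DELETE"
    actor: Actor
    item: Item
    container: Container

    def describe(self) -> str:
        return f"{self.actor.name} {self.type} '{self.item.name}' in '{self.container.name}'"


event = Event(
    name="blog.entry.created",
    type="CREATE",
    actor=Actor(external_id="u-1", name="Amy Jones"),
    item=Item(id="i-1", name="My first post"),
    container=Container(id="c-1", name="XYZ"),
)
```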
Many more data fields are encapsulated in events:
Correlation item, set to represent a parent-child relationship (events about a commenting action)
Target set, which makes it possible to deduce interactions between content and people
Membership delta field, indicating who has been added to or removed from a community, activity, ...
... see Event SPI documentation for full list (JavaDoc)
Key point: the event model encapsulates all of the data needed to understand the interaction between people,
content and containers in the platform
Event SPI – available data in each event
35
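As one example of using these fields, a membership delta can be folded into a running membership table. This is a sketch under assumptions: the `added` and `removed` parameters are hypothetical names for the two halves of the delta, and the container ids are invented.

```python
def apply_membership_delta(members: dict, container_id: str, added=(), removed=()) -> None:
    """Updates the member set of one container from a single event's delta."""
    current = members.setdefault(container_id, set())
    current.update(added)
    current.difference_update(removed)


members = {}
apply_membership_delta(members, "community-1", added=["amy", "bob"])
apply_membership_delta(members, "community-1", added=["cat"], removed=["bob"])
```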
Challenges of analytics:
Large incoming event stream
100+ CUD events per second
Growing over the longer term
Scalable framework for analysis
Horizontal scale to address growth
(Near) real-time indexing
No data loss
Event SPI in the context of an analytic solution
36
Analysis, even basic, is time consuming, thus:
Analysis should not occur in the event handler, but in an external system (“Analytics Service”)
The event handler should not wait until the analytic service processes the event
Waiting would cause events to accumulate at the Connections level
This is problematic because the Connections queue that retains events awaiting delivery to the event handler has a limited depth
=> Design the event handler to consume and process events as fast as possible, i.e. as the interface between IBM Connections and an external system
[Diagram: Event SPI -> Event Handler -> “Data backbone” (storage for asynchronous processing) -> Analytics Service]
Goal: retaining as many events as possible for further analysis
Taming the fire-hose... (1/2)
37
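The hand-off described on this slide can be sketched with an in-process queue standing in for the data backbone. This is an illustration of the pattern only: a real deployment would use a distributed broker or database, and the event names used here are hypothetical.

```python
import queue
import threading

backbone = queue.Queue(maxsize=1000)   # stand-in for the "data backbone"
analysed = []


def handle_event(event):
    """Event-handler side: enqueue and return immediately, never analyse."""
    try:
        backbone.put_nowait(event)
        return True
    except queue.Full:
        return False                   # caller must decide how to avoid data loss


def analytics_worker():
    """Analytics-service side: drains the backbone at its own pace."""
    while True:
        event = backbone.get()
        if event is None:              # sentinel: stop the worker
            break
        analysed.append(event["name"].upper())   # placeholder for real analysis


worker = threading.Thread(target=analytics_worker)
worker.start()
accepted = [handle_event({"name": n}) for n in ("blog.entry.created", "wiki.page.updated")]
backbone.put(None)
worker.join()
```

The handler's cost per event is a single non-blocking enqueue, so slow analysis never backs up into the Connections delivery queue.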
Characteristics of the data backbone
Distributed and highly available
Horizontal scale
High throughput
Agnostic to consumers' state
Multiple options
Message broker: MQ / MQTT / ActiveMQ / Apache Kafka
Database
...
Taming the fire-hose... (2/2)
38
Java class implementing the EventHandler interface
Send JSON representation of the event. Serialization to JSON through the open-source GSON library
Integration with a message broker – Apache Kafka
39
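The slide's handler is written in Java and serializes with GSON. The sketch below shows the same shape in Python with the standard `json` module, purely as an illustration. The `send` callable and the topic name `connections.events` are hypothetical stand-ins for a real broker client.

```python
import json


class BrokerForwardingHandler:
    """Serializes each event to JSON and forwards it to a broker topic."""

    def __init__(self, send, topic="connections.events"):
        self._send = send      # assumption: send(topic, payload_bytes)
        self._topic = topic

    def handle_event(self, event: dict) -> None:
        # Do no analysis here: serialize, hand off, return.
        payload = json.dumps(event, sort_keys=True).encode("utf-8")
        self._send(self._topic, payload)


sent = []
handler = BrokerForwardingHandler(lambda topic, payload: sent.append((topic, payload)))
handler.handle_event({"name": "blog.entry.created", "type": "CREATE"})
```

With a Kafka client such as the third-party kafka-python package, `send` could wrap the producer's `send(topic, value=payload)` call; that wiring is left out so the sketch runs without a broker.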
Registration – through events-config.xml
Java class implementing the EventHandler interface
Subscriptions define the events delivered by the SPI to the event handler. Filtered by event name, source (IBM service), and/or type (CREATE, UPDATE, DELETE, ...)
Properties: name/value pairs injected into the event handler Java class. Typically used to pass configuration settings
Integration with a message broker – Apache Kafka
40
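The subscription filter can be pictured as a match on event name, source and type. The sketch below models that idea in Python; it is not the events-config.xml format, and treating an omitted field as a wildcard is an assumption, as are the sample event names and the `BLOGS` source value.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Subscription:
    """A filter on events; None means 'match anything' for that field."""
    event_name: Optional[str] = None
    source: Optional[str] = None
    type: Optional[str] = None

    def matches(self, event: dict) -> bool:
        return all(
            wanted is None or event.get(field) == wanted
            for field, wanted in (
                ("name", self.event_name),
                ("source", self.source),
                ("type", self.type),
            )
        )


def should_deliver(event: dict, subscriptions) -> bool:
    """An event is delivered if at least one subscription matches it."""
    return any(s.matches(event) for s in subscriptions)


subs = [Subscription(source="BLOGS", type="CREATE"), Subscription(event_name="wiki.page.updated")]
blog_create = {"name": "blog.entry.created", "source": "BLOGS", "type": "CREATE"}
blog_delete = {"name": "blog.entry.deleted", "source": "BLOGS", "type": "DELETE"}
```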
Deployment – jar and dependencies made available to the SPI (running in the IBM Connections News application) through a Shared Library in WebSphere Application Server
Integration with a message broker – Apache Kafka
41
Good news:
In most cases, events surface all the data needed for analytics purposes (including the content the event is about)
Events about the same object repeat data
If there are X events about the same object, the item/correlation data set will always contain the most up-to-date information about the referenced object
For an analytic solution, in a nutshell, this means that the Event SPI should be sufficient in most cases
You can “pull” all data from Connections... but is it really needed?
Pulling data – when is it needed ?
43
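Because every event repeats the latest state of the object it refers to, an analytics store can simply keep the newest snapshot per object. A minimal sketch, assuming each event carries a comparable `timestamp` and an `item` with an `id` (hypothetical field names):

```python
def apply_event(store: dict, event: dict) -> None:
    """Keeps only the newest snapshot of each item, keyed by item id."""
    item = event["item"]
    current = store.get(item["id"])
    if current is None or event["timestamp"] >= current["timestamp"]:
        store[item["id"]] = {"timestamp": event["timestamp"], **item}


store = {}
apply_event(store, {"timestamp": 1, "item": {"id": "i-1", "name": "Draft"}})
apply_event(store, {"timestamp": 3, "item": {"id": "i-1", "name": "Final"}})
apply_event(store, {"timestamp": 2, "item": {"id": "i-1", "name": "Review"}})  # late arrival
```

Comparing timestamps keeps the store correct even when events arrive out of order.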
The “push” approach (Event SPI) is sufficient to build most analytic solutions
All necessary content (textual content, tags, …) is surfaced in every single event
All operations changing relationships (i.e. adding/removing a member, colleague, or follower) are surfaced as events
“Pull” (REST APIs) approaches should stay limited to:
1. “Bootstrapping” the Analytics Service on a Connections system whose data predates the introduction of the event handler used in your analytic solution
Essentially building membership/network data (as needed)
Seeding the content should not be needed, as it is repeated whenever an event about the content is generated
2. Fetching data not available through the Event SPI
Relatively “rare” for events generated from Connections
Pulling data – when is it needed ?
44
2 main approaches for pulling data from Connections
1. REST APIs (Atom / OpenSocial format)
REST-style HTTP-based APIs (XML, JSON format)
Transparency: programmatically access much of the same information that can be accessed through the IBM Connections UI
“Drink your own champagne”: public APIs are used internally by plug-ins, mobile … and even some components' Web UI (Activity Stream, Activities, …)
2. Seedlist
Designed to allow crawling of Connections data for indexing purposes by a search engine
Surfaces all content in the system, so it can be of some value for an analytic solution
HTTP-based APIs (Atom XML format)
Pulling data from Connections
45
Example: /forums/seedlist/myserver returns ALL forum entries in the system
Textual content, author, number of comments, number of recommendations, parent id, ACL
Seedlist
46
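Since the seedlist is served as Atom XML, its entries can be read with a standard XML parser. The sketch below parses a hypothetical, heavily simplified feed; a real seedlist response carries more fields (ACL, counts, parent id) in its own extension elements, which are not reproduced here.

```python
import xml.etree.ElementTree as ET

ATOM = {"atom": "http://www.w3.org/2005/Atom"}

# Hypothetical, simplified response body used only for illustration.
SAMPLE = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <id>forum-entry-1</id>
    <title>How do I register a handler?</title>
    <author><name>Amy Jones</name></author>
  </entry>
  <entry>
    <id>forum-entry-2</id>
    <title>Re: How do I register a handler?</title>
    <author><name>Bob Smith</name></author>
  </entry>
</feed>"""


def parse_entries(xml_text: str):
    """Extracts id, title and author name from each Atom entry."""
    root = ET.fromstring(xml_text)
    return [
        {
            "id": entry.findtext("atom:id", namespaces=ATOM),
            "title": entry.findtext("atom:title", namespaces=ATOM),
            "author": entry.findtext("atom:author/atom:name", namespaces=ATOM),
        }
        for entry in root.findall("atom:entry", ATOM)
    ]


entries = parse_entries(SAMPLE)
```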
REST APIs support basic authentication, form-based authentication and (for most APIs) OAuth
Private data: strict enforcement of access on API calls
Not very convenient for access by an analytic system...
“Super user”
Concept of “super user”: access control checks on private data are bypassed
On-premise: the “super user” is a user mapped to the JEE “admin” role across all Connections services
On Cloud: impersonation support can help to fetch data for a range of users (progressively being disclosed)
Authentication aspects for the REST APIs
47
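For the basic-authentication case, a client only needs to attach an `Authorization` header to each request. The sketch below builds such a request with the Python standard library without sending it; the host name, user name and password are placeholders, and the path is the seedlist example from the earlier slide.

```python
import base64
import urllib.request


def basic_auth_request(url: str, user: str, password: str) -> urllib.request.Request:
    """Builds (but does not send) a request carrying HTTP basic credentials."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})


request = basic_auth_request(
    "https://connections.example.com/forums/seedlist/myserver",  # placeholder host
    "analytics-admin",                                            # placeholder user
    "secret",                                                     # placeholder password
)
```

Basic credentials are only encoded, not encrypted, so such requests belong on HTTPS connections.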
In some very specific cases, data is not available in a form easily consumable to build an analytic solution
Example: getting the list of followers for a given object in the system
Query the Connections databases directly (in these specific cases only)
The database schema is private and can change over time
REST APIs (Atom / OS APIs)
Pros:
• Fine granularity: access content / meta-data for a specific object / container
• Access relationship information: APIs are available for fetching membership lists, network information, who liked a given object, ...
Cons: Lack of batch retrieval capabilities
Seedlist
Pros:
• Batch retrieval of textual content
• Incremental updates (but the Event SPI is much more suitable for this purpose)
Cons: Focused around content; does not expose all the data (missing tags, membership information)
Pulling data from Connections – What to use, when?
48
Leverage the Event SPI as much as possible
Provides (most of) the data needed for any elaborate analytics solution
Just let Connections push data to you! Easier, and performs well
“Fill the gaps” by pulling data from the Atom/Seedlist APIs
Initial loading of relationship / content data
Data not available through the Event SPI
One final warning:
An analytic solution has access to private data through the Event SPI and the Atom/Seedlist APIs (with admin role)
=> Ensure your solution is not leaking private data to unauthorized users
Key Points
49
Analytics and Connections data
50
51
Credit: Paco Nathan
Data Source → ETL → Data Prep → Analytic → Data Consumption
Key parts of typical analytic pipeline
52
A property graph has:
vertices and edges that can have any number of properties
directed relationships
Graph structure is ideal to represent relationships between entities (people, objects)
Context around the event
Cause and effect of an event
Artefacts related to an event
[Diagram: Person A and Person B as vertices, linked to Status Update and Comment vertices by “creates” and “comments on” edges]
Representing Connections data as graph
56
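A property graph of this kind can be sketched with plain dictionaries and lists. The code below is a toy in-memory model, not a graph database, and it encodes one plausible reading of the slide's diagram: Person A creates a status update, Person B creates a comment, and the comment comments on the status update. The vertex ids and property names are hypothetical.

```python
class PropertyGraph:
    """Directed graph whose vertices and edges carry arbitrary properties."""

    def __init__(self):
        self.vertices = {}   # vertex id -> properties
        self.edges = []      # (source id, label, target id, properties)

    def add_vertex(self, vid, **props):
        self.vertices[vid] = props

    def add_edge(self, src, label, dst, **props):
        self.edges.append((src, label, dst, props))

    def out(self, src, label):
        """Targets reachable from src over edges with the given label."""
        return [dst for s, l, dst, _ in self.edges if s == src and l == label]


g = PropertyGraph()
g.add_vertex("personA", kind="person", name="Person A")
g.add_vertex("personB", kind="person", name="Person B")
g.add_vertex("status1", kind="status_update")
g.add_vertex("comment1", kind="comment")
g.add_edge("personA", "creates", "status1", at="2015-01-01")
g.add_edge("personB", "creates", "comment1", at="2015-01-02")
g.add_edge("comment1", "comments_on", "status1")
```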
Key parts of typical analytic pipeline
58
Data Source: IBM Connections!
ETL:
* Extract: Consume events
* Transform: Transform format
* Load: Load transformed data to database / disk
Data Prep:
* Clean data (fetch specific data fields from events, assign unique id to objects)
* Represent social relationships as a graph
Analytic: Query graph to generate insights: activity, eminence, reaction, network. Store score per user and org
Data Consumption: API / UI to surface scores generated in previous step
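To make the "query graph to generate insights" step concrete, here is a deliberately naive sketch that derives two scores from a list of graph edges. The scoring rules are assumptions chosen for illustration (activity as objects created, reaction as comments received); they are not the formulas used by IBM Engagement Analytics, and the sample edges are invented.

```python
from collections import Counter

# (source, label, target) edges; hypothetical sample data
EDGES = [
    ("amy", "creates", "status1"),
    ("bob", "creates", "comment1"),
    ("comment1", "comments_on", "status1"),
    ("cat", "creates", "comment2"),
    ("comment2", "comments_on", "status1"),
]


def activity_scores(edges):
    """Activity: how many objects each person created."""
    return Counter(src for src, label, _ in edges if label == "creates")


def reaction_scores(edges):
    """Reaction: how many comments each person's content received."""
    author_of = {dst: src for src, label, dst in edges if label == "creates"}
    scores = Counter()
    for src, label, dst in edges:
        if label == "comments_on" and dst in author_of:
            scores[author_of[dst]] += 1
    return scores


activity = activity_scores(EDGES)
reaction = reaction_scores(EDGES)
```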
Volume: ~500 kbytes per event + bulk data => 180 GB per hour, 4.3 TB per day
Velocity: 100s of events per second
Variety: Semi-structured data; time and spatial aspects; easy to represent as graph
Veracity: Not an issue with Connections; can trust veracity of events from Connections
4 dimensions of Big Data
59
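The volume figures on this slide follow from simple arithmetic, reproduced below. The sketch assumes exactly 100 events per second (the slide says "100s") and decimal units (1 GB = 10^9 bytes), and it ignores the additional bulk data.

```python
EVENTS_PER_SECOND = 100
BYTES_PER_EVENT = 500 * 1000          # ~500 kbytes, decimal units

bytes_per_second = EVENTS_PER_SECOND * BYTES_PER_EVENT   # 50 MB per second
gb_per_hour = bytes_per_second * 3600 / 1e9
tb_per_day = gb_per_hour * 24 / 1000

print(f"{gb_per_hour:.0f} GB per hour, {tb_per_day:.2f} TB per day")
```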
60
IBM Open Platform (ibm.biz/ibmopenplatform)
61
Value of collaboration data:
From discrete events to generating deep insights about people, network … the whole organization
Key insights by leveraging Big Data Analytics on events
Insights only limited by data and your own ability to process it
IBM Connections has its own powerful set of APIs to access most interactions in the system
Fully available on-premise
Being unlocked on Cloud
Analytic platform available (IBM Open Platform)
Get started with IBM Open Platform and build on top of it
Key points
63
IBM Open Platform @ ibm.biz/ibmopenplatform
IBM Engagement Analytics @ ibm.com/engage
Event SPI @ ibm.biz/eventspi w/ Java Doc @ ibm.biz/eventspijavadoc
SocialBiz User Group @ www.socialbizug.org
Follow us on Twitter @IBMConnect, @IBMSocialBiz, @marie_wallace
LinkedIn @ ibm.biz/socbizlinkedin; participate in our Social Business group
Facebook @ www.facebook.com/IBMSocialBiz; give us a Like
Social Business Insights Blog @ ibm.com/blogs/socialbusiness; join the
conversation!
More resources online
64
Thank you
65