1 © DYMATRIX CONSULTING GROUP KNIME Meetup Italia 2013
Big Data Analytics
Analysis of high-volume and unstructured Data
Stefan Weingaertner, DYMATRIX CONSULTING GROUP
KNIME Meetup Italia, 10th October 2013
Agenda
1 Company Introduction
2 Big Data - an Introduction
3 Big Data Analytics on high-volume Data
4 Big Data Analytics on unstructured Data
5 Live Demo: Advanced Email Classification
6 Q & A
Company Introduction
DYMATRIX – The analytical CRM Company
» Solution provider for Customer Intelligence, Marketing Automation and
Advanced Predictive Analytics
» Consulting, development and implementation know how, based upon
more than 900 projects with mid- and large cap companies across
industries
» Goal- and client- oriented project execution based upon award winning,
established solutions
» Owner managed and independent
Our Consulting Competence Centers
Business Intelligence
» Conception of (big) data warehouse and business intelligence architectures
» Enterprise Reporting Systems
» Dashboards
» Sales Controlling
» Planning & Forecasting
» Balanced Scorecard
Advanced Analytics
» Customer Segmentation
» Customer Value Analysis
» Propensity Modeling (Cross-/Upsell/Churn)
» Shopping Basket Analysis
» Credit Rating Analysis & Credit Scoring
» Text Mining
» Data Mining Automation
» Big Data Analytics
Campaign Management
» Design and Optimization of Campaign Processes and Workflows
» Implementation of Campaign Management Systems
» Integration of Data Mining Models in Campaign Processes
» Campaign Optimization
» Consulting & Implementation of Next Best Activity Processes
E-commerce insight
» Web Tracking
» Web Controlling
» Web Mining
» Real Time Recommendation
» Social Media Tracking & Analysis
» Web Performance Measurement
» Customer Journey Analytics
Analysis of client oriented processes
Initial situation – Analysis – Conception of processes for customer retention and its optimization – customer reactivation and new customer activation – benchmarking against industry leaders
Solution Portfolio – The Customer Insight Suite
DynaCampaign
» Intelligent multi-touchpoint campaign management platform
» Planning, target group selection, execution and response measurement of campaigns
» Event-triggered realtime campaigning
DynaMine
» End2end automation of data mining processes
» Intelligent model management for automation of preprocessing, training & scoring of models
DynaCision
» Realtime decision management platform
» Design & execution of complex embedded decision processes
DynaSocial
» Social CRM platform to listen, track, identify and quantify customer needs and sentiments
Our KNIME Solution Nodes & KNIME Consulting Services
PMML2SQL / PMML2SAS Converter
» Convert PMML to executable SQL Code for In-Database-Scoring
» Convert PMML to executable SAS Code for Model Scoring within SAS
Big Data Integration
» Access any Hadoop large-scale distributed batch processing infrastructure from KNIME
» Efficiently distribute large amounts of data & preprocessing across a set of machines
Uplift Modeling
» Predictive Modeling Nodes to predict the incremental response to marketing actions
» For up-sell, cross-sell, churn and retention activities
Interactive Scorecard Builder
» Interactive Scorecard Building Nodes for Design of Credit or Marketing Scorecards
+ Business Consulting + Analytical Consulting + Technical Consulting + Trainings
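The uplift idea named on this slide, predicting the incremental response to a marketing action, can be illustrated with a minimal sketch. It is not the DYMATRIX node: it assumes hypothetical (segment, treated, responded) records and estimates uplift per segment as the response rate of treated customers minus the response rate of the control group.

```python
from collections import defaultdict

def uplift_by_segment(records):
    """Estimate uplift per segment from (segment, treated, responded) tuples.

    Uplift = response rate of treated customers - response rate of control customers.
    Segments that lack either a treated or a control group are skipped.
    """
    # counts[segment][treated] = [number of responders, number of customers]
    counts = defaultdict(lambda: {True: [0, 0], False: [0, 0]})
    for segment, treated, responded in records:
        cell = counts[segment][bool(treated)]
        cell[0] += int(bool(responded))
        cell[1] += 1

    uplift = {}
    for segment, groups in counts.items():
        (t_resp, t_n), (c_resp, c_n) = groups[True], groups[False]
        if t_n == 0 or c_n == 0:
            continue
        uplift[segment] = t_resp / t_n - c_resp / c_n
    return uplift

# Hypothetical campaign data, invented for illustration only
records = [
    ("young", True, True), ("young", True, True),
    ("young", False, True), ("young", False, False),
    ("senior", True, False), ("senior", True, True),
    ("senior", False, True), ("senior", False, True),
]
print(uplift_by_segment(records))
```

A positive value suggests the action adds responses in that segment; a negative value suggests those customers are better left untouched.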
References
Telecommunication Travel, Transportation Retail, Service Provider
Media Banks, Insurances Utilities, Industries, Public
Schwäbisch Hall
Big Data - an Introduction
A Characterization of Big Data
[Figure: Big Data characterized along three dimensions: Volume (Terabyte to Zettabyte), Structured to Structured & Unstructured, and Batch to Streaming]
Source: Understanding Big Data (Zikopoulos et al.), 2012
Challenge: Big Data Collection & Integration
[Figure: stages Needs, Possibilities, Decisions, Approach, Purchase, Delivery, Usage, Service & Support, Remember]
Source: Phil Winters, 2011
Big Data Analytics: Learn, Target & Influence!
[Figure: stages Needs, Possibilities, Decisions, Approach, Purchase, Delivery, Usage, Service & Support, Remember]
Source: Phil Winters, 2011
Big Data Analytics on high-volume Data
Big Data Access
[Figure: Hadoop stack. Layers: Big Data Sources, Hadoop Core, Hadoop Extensions, Analytic Applications. Components: Hadoop Distributed File System (HDFS), MapReduce, Hive, HBase, Mahout, MapReduce Routines]
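The MapReduce component in this stack follows a map, shuffle, reduce pattern. As a minimal single-process sketch (plain Python, not Hadoop API code; all function names are illustrative), a word count looks like this:

```python
from collections import defaultdict

def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Group emitted values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Reducer: sum all counts collected for one word."""
    return key, sum(values)

def word_count(lines):
    """Run the three phases in sequence on a list of text lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())

print(word_count(["big data", "Big analytics"]))
```

On a real cluster the mappers and reducers run in parallel on different machines; only the per-record logic resembles the functions above.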
Big Data Analytics
[Figure: Hadoop stack. Layers: Big Data Sources, Hadoop Core, Hadoop Extensions, Analytic Applications. Components: Hadoop Distributed File System (HDFS), MapReduce, Hive, HBase, Mahout, MapReduce Routines, PMML2SQL Converter]
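The idea behind a PMML-to-SQL conversion, turning a trained model into SQL so that scoring runs inside the database, can be sketched for a small decision tree. This is not the DYMATRIX converter and it does not parse PMML: the nested-dict tree format, the table and the column names are assumptions made for illustration, and SQLite merely stands in for the target database.

```python
import sqlite3

def tree_to_sql(node):
    """Render a decision tree given as nested dicts as a SQL CASE expression.

    Leaf node:  {"score": "high"}
    Split node: {"field": "age", "op": "<", "value": 30, "then": ..., "else": ...}
    Values are interpolated without escaping, so only trusted input is safe.
    """
    if "score" in node:
        return f"'{node['score']}'"
    condition = f"{node['field']} {node['op']} {node['value']}"
    return (
        f"CASE WHEN {condition} "
        f"THEN {tree_to_sql(node['then'])} "
        f"ELSE {tree_to_sql(node['else'])} END"
    )

def score_in_database(con, tree, table):
    """Score every row of `table` inside the database; return (id, score) pairs."""
    sql = f"SELECT id, {tree_to_sql(tree)} AS score FROM {table} ORDER BY id"
    return con.execute(sql).fetchall()

# Hypothetical model and customer table
TREE = {
    "field": "age", "op": "<", "value": 30,
    "then": {"score": "high"},
    "else": {
        "field": "orders", "op": ">=", "value": 5,
        "then": {"score": "medium"},
        "else": {"score": "low"},
    },
}
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE customers (id INTEGER, age INTEGER, orders INTEGER)")
con.executemany("INSERT INTO customers VALUES (?, ?, ?)", [(1, 25, 0), (2, 40, 7), (3, 50, 1)])
print(score_in_database(con, TREE, "customers"))
```

Because the model becomes a plain SQL expression, no data has to leave the database for scoring.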
Big Data Analytics on unstructured Data
Big Data is not just about structured data…
» 80% of the world’s data is unstructured.
» Unstructured data is growing at 15 times the rate of structured data.
Source: Google Trends April 6, 2012
Imagine…
» …to classify all customer related text messages by
  • Source / Origin
  • Sentiment
  • Product or Service
  • Business Transaction
  • Context
  • etc.
» …to identify unknown trends
» …to identify cause and effect relations
» …to react to that information, e.g.
  • Technical Problems
  • Needs
  • Usability
  • Competition
  • etc.
The KNIME platform supports these efforts with comprehensive Text Analytics & Network Analytics capabilities!
Deutsche Telekom: Social Earthquake
[Figure: Facebook Posts & Comments, March & April 2013, by sentiment (Negative, Neutral, Positive); y-axis 0 to 1000, x-axis 1 March to 26 April]
» First rumours: limitation of bandwidth (21 to 23 March)
» "DSL-Drossel" (DSL throttling): the official press release on the limitation of bandwidth leads to a social earthquake (22 to 27 April)
DYMATRIX Text Mining Process
DYMATRIX Text Mining Process (KNIME Text Processing)
Text Datasources
Datasources
• Facebook
• Twitter
• Emails
• Data Provider like GNIP, Datasift etc.
• Crawled Data
• etc.
For Machine Learning
• Provide Training Data for Classification (e.g. Sentiment)
Text Enrichment
Language Detection
• English
• German
• Many more…
Language individual NLP POS Tagging
• Penn Treebank Tagger
• STTS Tagger
Text Cleansing
• Stop Words
• Punctuations
• Stemming
Sentiment Amplifier
• Matching of Sentiment- & Emoticon-Dictionaries
Subject Matching
Text Tagging with any Subjects
• Products
• Brands
• Business Transactions
• Service
• Complaints
• Requests
• etc.
Fuzzy Matching with Dictionary Tagger
• Matching of Subject-Dictionaries
Sentiment Classification
Text Vectorization
• Creation of text predictors to predict sentiments
Machine Learning
• Classification with Predictive Analytics (e.g. Decision Tree)
Retraining Interface
• Adjustment of misclassified messages for permanent optimization of classification
Information Delivery
Text Data Mart
• Make information available in central Text Data Mart for visualization, alerting etc.
Fields of Application
• Email-Routing
• Event triggered Campaign Management
• etc.
DYMATRIX Text Mining Process: Datasources
Process steps: Text Datasources → Text Enrichment → Subject Matching → Sentiment Classification → Information Delivery
Access any Text Datasource to start the Text Mining Process
» Emails
» Crawler
» Data Provider like GNIP, Datasift etc.
Example post on the Vodafone UK Facebook fan page
DYMATRIX Text Mining Process: Text Enrichment
Original Facebook Message
Why not sort your signal issues out instead of bringing new phones out!!!! Wk 3 of crap signal but yet paying FULL monthly contract! Vodafone sort it.
Penn Treebank POS Tagger (English Messages)
Why[WRB] not[RB] sort[VBG] your[PRP] signal[VBP] issues[VBZ] out[IN] instead[RB] of[IN] bringing[VBG] new[JJ] phones[NNS] !!!![SYM] Wk[NNP] 3[CD] of[IN] crap[NN] but[CC] yet[RB] paying[VBG] FULL[NNP] monthly[RB] contract[NN] ![SYM] Vodafone[NNP] sort[VBG] it[PRP] .[SYM]
Removal of Stop Words & Punctuations
sort[VBG] signal[VBP] issues[VBZ] instead[RB] bringing[VBG] phones[NNS] Wk[NNP] 3[CD] crap[NN] paying[VBG] monthly[RB] contract[NN] Vodafone[NNP]
Sentiment Amplifier
Why not sort your signal issues out instead of bringing new phones out!!!! Wk 3 of crap [----] signal but yet paying FULL monthly contract! Vodafone sort it.
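The cleansing and sentiment-amplifier steps of this slide can be approximated in a few lines. This is a minimal sketch in plain Python, not the KNIME Text Processing nodes: the stop-word list, the negative-word dictionary and the `[-]` marker are small assumptions made for illustration, and POS tagging and stemming are left out.

```python
import string

# Tiny illustrative dictionaries; production lists are far larger.
STOP_WORDS = {"why", "not", "your", "out", "of", "new", "but", "yet", "it", "full"}
NEGATIVE_WORDS = {"crap", "issues"}

def cleanse(message):
    """Lowercase the message, strip punctuation, and drop stop words."""
    tokens = []
    for raw in message.split():
        token = raw.strip(string.punctuation).lower()
        if token and token not in STOP_WORDS:
            tokens.append(token)
    return tokens

def amplify(tokens):
    """Mark every token found in the negative-sentiment dictionary with [-]."""
    return [f"{token}[-]" if token in NEGATIVE_WORDS else token for token in tokens]

message = "Wk 3 of crap signal but yet paying FULL monthly contract!"
print(amplify(cleanse(message)))
```

Keeping the dictionaries as plain sets makes it easy to swap in language-specific lists after language detection.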
DYMATRIX Text Mining Process: Subject Matching
Original Facebook Message
Why not sort your signal issues out instead of bringing new phones out!!!! Wk 3 of crap signal but yet paying FULL monthly contract! Vodafone sort it.
Subject Matching (Fuzzy Matching)
Why not sort your signal issues out instead of bringing new phones out!!!! Wk 3 of crap signal [NETWORK] but yet paying FULL monthly contract! Vodafone sort it [COMPLAINT].
BUSINESS TRANSACTION: Complaint
NETWORK: No Signal
PRODUCT: Nokia Lumia 925
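Fuzzy matching against a subject dictionary, as described on this slide, can be sketched with Python's standard `difflib` module. This is an illustration rather than the KNIME Dictionary Tagger: the dictionary entries, the tag names and the similarity cutoff of 0.8 are assumptions.

```python
import difflib
import string

# Hypothetical subject dictionary: term -> subject tag
SUBJECT_DICTIONARY = {
    "signal": "NETWORK",
    "coverage": "NETWORK",
    "contract": "CONTRACT",
    "invoice": "BILLING",
}

def match_subjects(message, cutoff=0.8):
    """Return the subject tags whose dictionary terms closely match a token.

    Matching is fuzzy, so small misspellings such as "signl" still hit "signal".
    """
    subjects = set()
    terms = list(SUBJECT_DICTIONARY)
    for raw in message.split():
        token = raw.strip(string.punctuation).lower()
        hits = difflib.get_close_matches(token, terms, n=1, cutoff=cutoff)
        if hits:
            subjects.add(SUBJECT_DICTIONARY[hits[0]])
    return subjects

print(match_subjects("Wk 3 of crap signl but paying monthly contract!"))
```

A lower cutoff catches more misspellings at the cost of more false matches, so the threshold is worth tuning per dictionary.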
DYMATRIX Text Mining Process: Sentiment Classification
Original Facebook Message
Why not sort your signal issues out instead of bringing new phones out!!!! Wk 3 of crap signal but yet paying FULL monthly contract! Vodafone sort it.
Output from Text Enrichment
Text Vectorization (Transformation)
Predictors relevant for Text Classification, e.g.
• Emoticons positive/negative
• Fragments positive/negative
• Words positive/negative
• Author-related Inputs
• Length of message
• Likes
• Comments
• Other linguistic Inputs
Text Classification with Decision Tree
Resulting Classification
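Text vectorization followed by a tree-style classification can be sketched as follows. The word and emoticon dictionaries are invented for illustration, and the hand-written rules in `classify` only stand in for a decision tree that would normally be learned from labeled training messages.

```python
import string

# Illustrative dictionaries (assumptions, not taken from the slides)
POSITIVE_WORDS = {"great", "thanks", "love", "happy"}
NEGATIVE_WORDS = {"crap", "issues", "problem", "worst"}
POSITIVE_EMOTICONS = {":)", ":-)", ":D"}
NEGATIVE_EMOTICONS = {":(", ":-("}

def vectorize(message):
    """Turn a message into numeric predictors (a small subset of the slide's list)."""
    tokens = message.split()
    words = [token.strip(string.punctuation).lower() for token in tokens]
    return {
        "length": len(message),
        "pos_words": sum(word in POSITIVE_WORDS for word in words),
        "neg_words": sum(word in NEGATIVE_WORDS for word in words),
        "pos_emoticons": sum(token in POSITIVE_EMOTICONS for token in tokens),
        "neg_emoticons": sum(token in NEGATIVE_EMOTICONS for token in tokens),
    }

def classify(features):
    """Hand-written decision rules standing in for a trained decision tree."""
    negative = features["neg_words"] + features["neg_emoticons"]
    positive = features["pos_words"] + features["pos_emoticons"]
    if negative > positive:
        return "negative"
    if positive > negative:
        return "positive"
    return "neutral"

message = ("Why not sort your signal issues out instead of bringing new phones out!!!! "
           "Wk 3 of crap signal but yet paying FULL monthly contract! Vodafone sort it.")
print(classify(vectorize(message)))
```

In practice the predictor dictionary would be fed to a learner, and misclassified messages would be corrected and fed back as training data.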
DYMATRIX Text Mining Process: Information Delivery
» Make information available in central Text Data Mart
» Visualization in DynaSocial
Original Facebook Message
Why not sort your signal issues out instead of bringing new phones out!!!! Wk 3 of crap signal but yet paying FULL monthly contract! Vodafone sort it.
[Figure: DynaSocial view of the message with the columns Sentiment, Business Transaction, Product, Network and Relevance]
Other Fields of Application
» Subject-oriented Email-Classification & Email-Routing
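A central text data mart of the kind described on this slide can be pictured as one table of classified messages that dashboards and alerts query. The sketch below uses SQLite purely for illustration; the table name, the columns and the sample rows are assumptions, not the DynaSocial schema.

```python
import sqlite3

def create_text_data_mart(con):
    """Create a single table holding each message with its classifications."""
    con.execute(
        """CREATE TABLE IF NOT EXISTS classified_messages (
               id INTEGER PRIMARY KEY,
               source TEXT,
               message TEXT,
               sentiment TEXT,
               business_transaction TEXT,
               network TEXT,
               product TEXT
           )"""
    )

def store_message(con, record):
    """Insert one classified message given as a dict keyed by column name."""
    con.execute(
        "INSERT INTO classified_messages "
        "(source, message, sentiment, business_transaction, network, product) "
        "VALUES (:source, :message, :sentiment, :business_transaction, :network, :product)",
        record,
    )

def sentiment_counts(con):
    """Aggregate messages per sentiment, e.g. as input for a dashboard tile."""
    rows = con.execute(
        "SELECT sentiment, COUNT(*) FROM classified_messages "
        "GROUP BY sentiment ORDER BY sentiment"
    ).fetchall()
    return dict(rows)

con = sqlite3.connect(":memory:")
create_text_data_mart(con)
store_message(con, {
    "source": "Facebook", "message": "Wk 3 of crap signal", "sentiment": "negative",
    "business_transaction": "Complaint", "network": "No Signal", "product": None,
})
print(sentiment_counts(con))
```

Storing the classifications next to the raw message keeps visualization and alerting independent of the classification workflow.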
DYMATRIX Text Mining Process: KNIME Workflow
Benefits
» Text Enrichment & Classification Workflows can be used for classification of any electronic text message (e.g. Social Content, Blogs, Emails).
» KNIME Server-based Text Enrichment & Classification Workflows can be deployed as a web service and called easily from any other application.
» Uniform Sentiment- and Classification-Handling for all customer-related messages.
» Batch- or Realtime-Execution from any application.
KNIME Server: Develop once, deploy everywhere!
Application Integration I: DynaSocial
Social Media Monitoring & Analytics
DynaSocial – Social Media Excellence Architecture
[Architecture diagram with the following labeled components]
» Data sources: Client individual Sources, Social Media Data Provider, Social Service Platforms, Emails
» Social Media Analytics Content Extractor
» Social Media Analytics Data Management: Generic Big Data Model
» Advanced Social Media Analytics (Text Mining & Network Mining): Text Enrichment & Classification, Network Insights
» Social Media Analytics Dashboard
» Social Engagement: Integrated Social Inbox including all Social Touchpoints
» DynaSocial Configuration Center: Data Sources, Sentiments & Classifications, Reports & Dashboard
DynaSocial Management Dashboard
Activities
Sentiment Ratio
Key Influencer
Platform Distribution
Trends compared to competition (Share of Voice)
Geographic Distribution
Overall Sentiments
Top Keywords
Flexible Selection of Time Windows
…
DynaSocial Management Dashboard (Project Example)
Application Integration II: Advanced Email-Classification
Multidimensional realtime Email-Classification
Email Classification: MS Exchange Connector
Components: Microsoft Exchange Webservice, .NET Batch, KNIME Server, Microsoft Outlook, Microsoft Outlook Webaccess, Other Email-Clients
1 Incoming Email
2 Call .NET Procedure and transfer email contents to KNIME Server via Webservice Call.
3 Call KNIME Text Enrichment & Classification Workflows and return classification results.
4 Classification results are returned to Exchange Server and are saved persistently with object categories.
5 Any clients having access to Exchange Server get the same classification.
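The round trip on this slide (send the email contents to a classification service, receive categories, store them on the message) can be sketched without any real server. Everything below is a stand-in: the JSON payload shape, the function names and the keyword rules are assumptions, and no actual Exchange or KNIME Server API is used.

```python
import json

def classify_email(subject, body):
    """Stand-in for the classification workflow; the keyword rules are illustrative."""
    text = f"{subject} {body}".lower()
    categories = []
    if any(word in text for word in ("complaint", "not working", "problem")):
        categories.append("Complaint")
    if any(word in text for word in ("invoice", "bill")):
        categories.append("Billing")
    return categories or ["Unclassified"]

def build_request(email):
    """Serialize the email contents for the web service call."""
    return json.dumps({"subject": email["subject"], "body": email["body"]})

def handle_request(payload):
    """Service side: decode the request, classify, return the result as JSON."""
    data = json.loads(payload)
    return json.dumps({"categories": classify_email(data["subject"], data["body"])})

def apply_categories(email, response):
    """Store the returned categories on the message so every client sees them."""
    email["categories"] = json.loads(response)["categories"]
    return email

email = {"subject": "Problem with my invoice", "body": "The amount is wrong."}
print(apply_categories(email, handle_request(build_request(email))))
```

Saving the categories on the server-side message object, rather than in one client, is what lets every mail client show the same classification.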
Live Demo
Realtime Email-Classification
Q & A
Thank you for your attention. We are happy to answer any of your questions!
Contact
DYMATRIX CONSULTING GROUP GmbH
Zeppelin Carré
Lautenschlagerstrasse 2
D-70173 Stuttgart
Your Contact: Stefan Weingaertner
Phone: +49.711.22.007.88 - 12
Fax: +49.711.22.007.88 - 88
E-Mail: s.weingaertner@dymatrix.de
Web: www.dymatrix.de