2015 11-26-v1-big-data-exec-training

Dr. Dickson Lukose, Artificial Intelligence Lab. 26th November 2015. Reaping Benefits from Social Network Data. BIG DATA EXECUTIVE TRAINING, PENANG



Contents

• Emergence of Social Web

• World of Data

• Knowledge Audit, Ontology

• Knowledge Portal Challenges

• Semantic Technology Platform

• Artificial Intelligence Stack

– Text Understanding

– Natural Language Query

– Data Harmonization

• Social Media Intelligence

• Social Network Intelligence

• Conclusions


© 2015 MIMOS Berhad. All Rights Reserved.

Emergence of Social Web


Organization Units

Partners

Ad Hoc Teams

Communities of Interest

Communities of Practice

Social Networks (e.g. LinkedIn and Facebook)

Engineered Emergent

Purpose Drives

Interest Drives

Gartner, 2009

World of Data


Structured

Unstructured

Semi-Structured

Knowledge Audit


Enterprise Data

Linked Open Data

Sensor Web

ACTIONABLE KNOWLEDGE

ASSETS

Structured, Semi-Structured & Unstructured

Unstructured Structured & Semi-Structured

Ontology

Technical Writing

Term Databanks

Machine Translation

Human Translation

Information Retrieval

Knowledge Engineering

Consumer Information

R&D

Standardisation

Nomenclature

Terminology

AGROVOC

MYGMO

PADIPEDIA

AGRIS

HERBAL MEDICINE


SNOMED-CT

STW

Knowledge Portal Challenges


Knowledge Portal

Natural Interface

High Density Visualization

Integration and Insight into Social Media

CoP Network Analytics

Collaborative Problem Solving

NOT What is the Content, BUT, What is in the Content

Making Sense of Social Big Data

Big Data Challenges: Velocity, Diversity, Volume

Seamless Integration of SME in Problem Solving

Semantic Technology Platform


Artificial Intelligence Stack


KNOWLEDGE REPRESENTATION & REASONING

MACHINE LEARNING

STATISTICS

GENETIC ALGORITHM

NEURAL NETWORK

SUBJECTIVE ANALYTICS

NATURAL LANGUAGE PROCESSING

VIDEO ANALYTICS

Mi-SEMANTICS

Mi-SP

Mi-CLIP

NETWORK ANALYTICS

ACCELERATION TECHNOLOGY

Mi-INTELLIGENCE

Mi-VISUALITIC

IMAGE ANALYTICS

Mi-TARGET

Mi-HARMONY

Mi-AVComm

Mi-BIS

ALGORITHM

Mi-DSS

Mi-AccLib

Mi-AVSafe

Finance

TEXT UNDERSTANDING

What is Text Understanding?

English Text: John is going to the bank by bus

Conceptualization:

John: Person, Human, Animate (Agent)

bank: Financial institution, Inanimate (Destination)

bus: Road vehicle, Inanimate (using Instrument)

Knowledge Graph:

[go: *] -> (agnt) -> [male-person: “John”]

[go: *] -> (dest) -> [bank: *]

[go: *] -> (inst) -> [bus: *]
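The knowledge graph for the English sentence can be sketched in code. This is a minimal illustration, not Mi-CLIP's actual data model: the `Concept` and `Relation` classes, the `SUPERTYPE` table, and the helper functions are all hypothetical names, and the type hierarchy is an assumed reading of the labels on the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    concept_type: str
    referent: str = "*"  # "*" marks a generic, unnamed individual


@dataclass(frozen=True)
class Relation:
    name: str
    source: Concept
    target: Concept


# ASSUMPTION: simplified type hierarchy read off the slide labels.
SUPERTYPE = {
    "male-person": "person",
    "person": "human",
    "human": "animate",
    "bank": "financial-institution",
    "financial-institution": "inanimate",
    "bus": "road-vehicle",
    "road-vehicle": "inanimate",
}


def is_a(concept_type, ancestor):
    """Walk up the type hierarchy until the ancestor is found or the chain ends."""
    while concept_type is not None:
        if concept_type == ancestor:
            return True
        concept_type = SUPERTYPE.get(concept_type)
    return False


go = Concept("go")
john = Concept("male-person", "John")
bank = Concept("bank")
bus = Concept("bus")

# "John is going to the bank by bus" as three relations around the verb concept.
graph = [
    Relation("agnt", go, john),
    Relation("dest", go, bank),
    Relation("inst", go, bus),
]


def fillers(relations, relation_name):
    """Return the concepts that fill a given role (agnt, dest, inst)."""
    return [r.target for r in relations if r.name == relation_name]
```

With this structure, `fillers(graph, "agnt")` returns the concept for John, and `is_a("male-person", "animate")` confirms that the agent is an animate entity.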

What is Text Understanding?

Malay Text: Penduduk mendapat bantuan daripada kerajaan (English: Residents receive assistance from the government)

Conceptualization: Human, Animate; resource, organization; Inanimate

Roles: Agent, Resource, originating Source

Knowledge Graph: mendapat (receive)

What is Text Understanding?

Mandarin Text: 大卫 在 图书馆 等了 3小时。 (English: David waited at the library for 3 hours.)

Conceptualization: Person, Human, Animate; Library; 3 hours

Roles: Agent Location for Time Location

Knowledge Graph: wait

English NLP

Sentence

Knowledge Graph

Syntax Structure

Malay NLP

Sentence:

Meningkatkan harga barang dan minyak kerana inflasi negara. (English: Raising the prices of goods and oil because of national inflation.)

Annotated sentence:

Meningkatkan_VB harga_NN barang_NN dan_CC minyak_NN kerana_CC inflasi_NN negara_NN
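The annotated sentence uses a `word_TAG` convention, which is easy to read back into (word, tag) pairs. The sketch below only parses the annotation shown above; it is not the Mi-NLP tagger, and `parse_tagged` is a hypothetical helper name.

```python
from collections import Counter

# The annotated Malay sentence from the slide, joined onto one line.
annotated = (
    "Meningkatkan_VB harga_NN barang_NN dan_CC minyak_NN "
    "kerana_CC inflasi_NN negara_NN"
)


def parse_tagged(text):
    """Split 'word_TAG' tokens into (word, tag) pairs."""
    pairs = []
    for token in text.split():
        # rpartition splits on the last underscore, so words containing "_" survive.
        word, _, tag = token.rpartition("_")
        pairs.append((word, tag))
    return pairs


pairs = parse_tagged(annotated)
tag_counts = Counter(tag for _, tag in pairs)
nouns = [word for word, tag in pairs if tag == "NN"]
```

Here `nouns` collects the words tagged `NN`, which are the candidates for concepts in the knowledge graph, while the `VB` token is the candidate for its central verb.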

Knowledge Graph:


Mandarin NLP

Text

Segmented Text

Part-of-Speech Results

Mandarin NLP

Text

Segmented Text

Entity Recognition Results

Mi-NLP Demo


NATURAL LANGUAGE QUERY

Traditional Search

Too many results

Pattern matching

Traditional Search


Knowledge Base Preparation

Mi-CLIP

Mi-HARVESTER

Mi-KRAKEN

Mi-NLP

KNOWLEDGE BASE

World Wide Web

Linked Open Data

Enterprise Database

Question Answering System

Mi-CLIP™

Text Understanding

SQ (Semantic Query), VSQ (Visual Semantic Query)

Knowledge Graph

Texts

Question

Mi-Reasoner

Answer

English NLP

Malay NLP

Mandarin NLP

Natural Language Query Interface


Semantic Query


Visual Semantic Query


DATA HARMONIZATION

SNOMED CT terminology

311,000+ concepts

Hypertensive complication [SCT_449759005]

Kidney disease [SCT_90708001]

Neoplastic disease [SCT_55342001]

Hypertensive renal disease [SCT_38481006]

Neoplasm of kidney [SCT_126880001]

Renal impairment [SCT_236423003]

Chronic renal impairment [SCT_236425005]

Kidney disease

Concept ID: 90708001

Preferred term: Kidney disease

Synonym(s):
– Renal Disease
– Nephrosis
– Disorder of kidney
– Nephropathy
– Disease of kidney
– Renal disorder

4,060,716+ triples

1,360,000+ relationships

Logical formula: Disorder of abdomen, Kidney finding, Disorder of body cavity, Disorder of kidney and/or ureter, Finding_site.Kidney structure

What can SNOMED CT be used for?
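One common use of such a terminology is to resolve any synonym to a single concept and then query the hierarchy. The sketch below uses only the concepts and synonyms listed on the slide. The parent links in `PARENTS` are an assumed reading of the flattened diagram, not an extract from a SNOMED CT release, and all function names are hypothetical.

```python
# Concept IDs and preferred terms as listed on the slide.
CONCEPTS = {
    "449759005": "Hypertensive complication",
    "90708001": "Kidney disease",
    "55342001": "Neoplastic disease",
    "38481006": "Hypertensive renal disease",
    "126880001": "Neoplasm of kidney",
    "236423003": "Renal impairment",
    "236425005": "Chronic renal impairment",
}

SYNONYMS = {
    "90708001": [
        "Renal Disease", "Nephrosis", "Disorder of kidney",
        "Nephropathy", "Disease of kidney", "Renal disorder",
    ],
}

# ASSUMPTION: child -> parents, inferred from the layout of the slide diagram.
PARENTS = {
    "38481006": ["449759005", "90708001"],
    "126880001": ["90708001", "55342001"],
    "236423003": ["90708001"],
    "236425005": ["236423003"],
}


def find_concept(term):
    """Resolve a preferred term or synonym to a concept ID (case-insensitive)."""
    wanted = term.strip().lower()
    for concept_id, preferred in CONCEPTS.items():
        names = [preferred] + SYNONYMS.get(concept_id, [])
        if any(wanted == name.lower() for name in names):
            return concept_id
    return None


def ancestors(concept_id):
    """All concepts reachable by following parent links upward."""
    seen = set()
    stack = list(PARENTS.get(concept_id, []))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(PARENTS.get(parent, []))
    return seen


def descendants(concept_id):
    """All concepts that have concept_id among their ancestors."""
    return {cid for cid in CONCEPTS if concept_id in ancestors(cid)}
```

A query for "Nephropathy" resolves to 90708001, and `descendants("90708001")` then returns every more specific kidney condition in this small sample, which is what lets a report on "Kidney disease" include cases coded as "Chronic renal impairment".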


Data Harmonization Process

HIS1: DB1 (Schema, Tables)

HIS2: DB2 (Schema, Tables)

HIS3: DB3 (Schema, Tables)

IT Staff (Consolidate the data)

Subject Matter Expert (Interpret the data)

– Time Consuming
– Limited Resources
– Skills-dependent
– Prone to errors

Automate processes

Enable semantics

Improve reporting accuracy

SNOMED CT Big Data: Freetext Challenge

HIS1

ID | Patient Name | Gender | Symptoms | Diagnosis
1 | xx | M | Fatigue, Chest pain | Heart failure
2 | yy | F | Breathlessness | Asthma

HIS2

ID | Nama Pesakit | Jantina | Tanda-tanda | Diagnosis
1 | aa | L | Letih, Sakit dada | Lemah jantung
2 | bb | P | Sesak nafas | Penyakit lelah

HIS3

ID | Pat.Name | Sex | Disorder | Diagnosis
1 | kk | Male | Tiredness, Pain in chest | Cardiac failure
2 | ll | Female | Dyspnea | BHR

SNOMED CT (Refsets)

Challenge of freetext

SNOMED CT codes (each repeated for HIS1, HIS2 and HIS3):

[371484003], [346741003], [263495000], [243814003], [439401001], [276179002], [84229001], [29857009], [84114007], [267036007], [195967001]

Query: how many cases of Dyspnea?

Result: 1 with free-text matching, 3 after mapping to SNOMED CT
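The Dyspnea query can be reproduced with a small sketch. The symptom strings come from the three HIS tables. The pairing of each term with a code is an assumption about how the codes on the slide line up with the terms, and `TERM_TO_CODE` and both counting functions are hypothetical names rather than part of Mi-Harmony.

```python
# Symptom fields from the three hospital information systems on the slide.
records = [
    ("HIS1", "Fatigue, Chest pain"),
    ("HIS1", "Breathlessness"),
    ("HIS2", "Letih, Sakit dada"),
    ("HIS2", "Sesak nafas"),
    ("HIS3", "Tiredness, Pain in chest"),
    ("HIS3", "Dyspnea"),
]

# ASSUMPTION: which code belongs to which free-text term.
TERM_TO_CODE = {
    "fatigue": "84229001",
    "letih": "84229001",
    "tiredness": "84229001",
    "chest pain": "29857009",
    "sakit dada": "29857009",
    "pain in chest": "29857009",
    "breathlessness": "267036007",
    "sesak nafas": "267036007",
    "dyspnea": "267036007",
}


def split_terms(symptoms):
    """Normalise a comma-separated symptom field into lowercase terms."""
    return [term.strip().lower() for term in symptoms.split(",")]


def free_text_count(rows, term):
    """Count rows whose symptom field literally contains the query term."""
    wanted = term.lower()
    return sum(1 for _, symptoms in rows if wanted in split_terms(symptoms))


def harmonized_count(rows, term):
    """Count rows whose symptoms map to the same code as the query term."""
    code = TERM_TO_CODE[term.lower()]
    count = 0
    for _, symptoms in rows:
        codes = {TERM_TO_CODE.get(t) for t in split_terms(symptoms)}
        if code in codes:
            count += 1
    return count
```

Literal matching finds only the HIS3 row that spells the symptom as "Dyspnea", while matching on the shared code also finds "Breathlessness" and "Sesak nafas", which gives the 1 versus 3 shown on the slide.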


Mi-Harmony Demo


SOCIAL MEDIA INTELLIGENCE

Challenges and Solution

Internet

Manual Searching and Monitoring: Expensive, Lengthy and subject to Error

Automatic: Eliminate the manual process, Save time and cost

Aid in Decision Making

Centralized Analytics

Dashboards

Social Media Analytics

Internet

1. Harvest content from the internet (world wide web, social web, content web): Automated Data Harvesting

2. Social Media Analytics to generate insights about the topic of interest: Social Media Analysis

3. Visual reports on Insights presented to decision maker for further action

Domain Ontology

Mi-Intelligence Process Overview

Expansion: ESTABLISH SEARCH SPACE

Harvesting: GATHER DATA

Processing: SEMANTIFY DATA

Subjective Analytics: DISCOVER KNOWLEDGE

Visual Analytics: IDENTIFY PATTERNS

Domain Ontology

Insights
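The stages of this process can be sketched end to end on toy data. Everything below is illustrative: the ontology entries, the sentiment lexicon, and the sample posts are made up for the example, the function names are hypothetical, and a simple word-count lexicon stands in for whatever subjective analytics Mi-Intelligence actually applies.

```python
import re
from collections import defaultdict

# Hypothetical domain ontology: topic -> related search terms.
DOMAIN_ONTOLOGY = {"flood": ["flood", "banjir", "flash flood"]}

# Hypothetical sentiment lexicon: word -> polarity.
SENTIMENT_LEXICON = {"good": 1, "fast": 1, "bad": -1, "slow": -1, "worried": -1}

# Made-up sample posts, standing in for harvested social media content.
SAMPLE_POSTS = [
    {"region": "Penang", "text": "Flash flood response was fast and good"},
    {"region": "Penang", "text": "Banjir again, very worried"},
    {"region": "Kedah", "text": "Flood relief is slow"},
    {"region": "Kedah", "text": "Nice weather today"},
]


def expand(topic):
    """Expansion: establish the search space from the domain ontology."""
    return DOMAIN_ONTOLOGY.get(topic, [topic])


def harvest(posts, terms):
    """Harvesting: keep posts that mention any term in the search space."""
    return [p for p in posts if any(t in p["text"].lower() for t in terms)]


def score(text):
    """Subjective analytics: sum the polarity of known words in the text."""
    words = re.findall(r"[a-z]+", text.lower())
    return sum(SENTIMENT_LEXICON.get(word, 0) for word in words)


def sentiment_by_region(posts):
    """Aggregate scores per region, ready for a dashboard or chart."""
    totals = defaultdict(int)
    for post in posts:
        totals[post["region"]] += score(post["text"])
    return dict(totals)


terms = expand("flood")
relevant = harvest(SAMPLE_POSTS, terms)
summary = sentiment_by_region(relevant)
```

The expansion step is what lets the Malay post about "banjir" be harvested for the English topic "flood"; the per-region totals in `summary` correspond to the "sentiments from different regions" view.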


Automated Data Harvesting


1. Harvest content from the internet (world wide web, social web, content web)

Add new topics to analyse

Summary of data harvested and processed

Search topics


Content Insights

Where are the posts coming from?

Drill down to discover detailed information about posts

Who are the users?

Social Media Analysis

Sentiments

Emotions

Anxiety

Domain Ontology: machine understandable domain knowledge

Social Media Analysis

2. Social Media Analytics to generate insights about the topic of interest

Sub topics related to the main topic

Sentiments from social media

Sentiments from different regions

Social Media Analysis

2. Social Media Analytics to generate insights about the topic of interest

Drill down to discover detailed information about posts

Sentiments from different regions

Narrow down to the individual posts of interest

Mi-Intelligence Demo


SOCIAL NETWORK INTELLIGENCE

Social Network Analysis (Wikipedia)


Social network analysis (SNA) is the process of investigating social structures through the use of network and graph theories.

It characterizes networked structures in terms of nodes (individual actors, people, or things within the network) and the ties or edges (relationships or interactions) that connect them.

It is a key technique in modern sociology. It has also gained a significant following in anthropology, biology, communication studies, economics, geography, history, information science, organizational studies, political science, social psychology, development studies, and sociolinguistics.
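The node-and-edge view described above maps directly onto an adjacency structure. The following is a minimal sketch with made-up names and ties; `build_graph` and `degree_centrality` are hypothetical helpers, and degree centrality is used here as one simple example of a network measure.

```python
from collections import defaultdict


def build_graph(edges):
    """Build an undirected graph as a mapping from node to its set of neighbours."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def degree_centrality(graph):
    """Fraction of the other nodes that each node is directly tied to."""
    n = len(graph)
    if n < 2:
        return {node: 0.0 for node in graph}
    return {node: len(neighbours) / (n - 1) for node, neighbours in graph.items()}


# Made-up actors (nodes) and ties (edges) for illustration only.
edges = [
    ("Ana", "Ben"),
    ("Ana", "Chen"),
    ("Ana", "Devi"),
    ("Ben", "Chen"),
    ("Devi", "Eli"),
]

graph = build_graph(edges)
centrality = degree_centrality(graph)
most_connected = max(centrality, key=centrality.get)
```

In this toy network Ana is tied to three of the four other actors, so she has the highest degree centrality.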


Mi-Visualitics™

Unstructured Data

Structured Data


Filtering

Visualizing

Centrality Analysis

Ego-group Extraction

Clustering

Clique Discovery

Path Searching

Degree Centrality
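Three of the operations named above (ego-group extraction, path searching and clique discovery) can be illustrated on a small undirected graph. This is a sketch on made-up data, not the Mi-Visualitics implementation: the function names are hypothetical, the path search is a plain breadth-first search, and clique discovery is limited to three-node cliques for brevity.

```python
from collections import deque
from itertools import combinations

# Made-up undirected network: node -> set of neighbours.
GRAPH = {
    "Ana": {"Ben", "Chen", "Devi"},
    "Ben": {"Ana", "Chen"},
    "Chen": {"Ana", "Ben"},
    "Devi": {"Ana", "Eli"},
    "Eli": {"Devi"},
}


def ego_group(graph, ego):
    """Ego-group extraction: the ego, its neighbours, and the ties among them."""
    members = {ego} | graph[ego]
    return {node: graph[node] & members for node in members}


def shortest_path(graph, start, goal):
    """Path searching: breadth-first search for a shortest path, or None."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbour in sorted(graph[node]):
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append(path + [neighbour])
    return None


def triangles(graph):
    """Clique discovery (size 3 only): triples in which every pair is connected."""
    return [
        set(triple)
        for triple in combinations(sorted(graph), 3)
        if all(b in graph[a] for a, b in combinations(triple, 2))
    ]
```

For example, `shortest_path(GRAPH, "Ben", "Eli")` passes through Ana and Devi, and `triangles(GRAPH)` finds the single fully connected group of Ana, Ben and Chen.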


Mi-Visualitics Demo


Suspect Data Preparations

Harvest, Extract, Harmonize, and Transform Data

Information related to Suspects

Drugs Cases

JKDM

WCO

Rule Mining (Machine Learning)

SME (Risk Officer)

JKDM HQ

Suspect KB

Rules

Social Network Identification (Mi-Target)

750 Suspects

30 Million Facts

Suspect Network Analysis

Harvest, Extract, Harmonize, and Transform Data

Information related to Suspects

Suspect Network Analysis Engine (Mi-Visualitics)

Potential High Risk Individuals

• Most Influential

• Top Connectors

Intelligence Officers

Suspect Watchlist

Drugs Cases

JKDM

WCO

Suspect KB

Intelligence Officers

Other Information


Passenger Risk Profiling

Rule Mining (Machine Learning)

SME (Risk Officer)

Adaptive Personal Demographic Analysis (Mi-Target)

Observation:

• Travelling Purpose

• Duration of Stay

• Profession

• Physical Appearance

Flight Information:

• Flight Code

• Airline Operator

• Departing Airport

• Arrival Airport

Personal Information:

• Name

• Passport Number

• Nationality

• Age

Suspect Watchlist

JKDM HQ

Airports

Suspect KB

Rules

Profiling Rules using Ego-Group

Intelligence Officers

Suspect Watchlist

Hector B.L

Hector Beltran Leyva

Adaptive Personal Demographic Analysis (Mi-Target)

Suspect KB

Rules

Suspect Network Analysis Engine (Mi-Visualitics)

Mi-Visualitics Demo


CONCLUSION

So, what are we doing in the AI Lab at MIMOS?

Are we doing Big Data Analytics?

ANS: YES (but focused on unstructured data)

Specifically, Social Media Intelligence & Social Network Intelligence

Are we working on Cognitive Computing?

ANS: YES, for the last 8 years, and it will continue to be one of the main R&D focuses in RM-11.

Are we working on Prescriptive Analytics?

ANS: Not yet, but it is the MAIN focus of R&D in RM-11.

ありがとうございます (Thank you very much)