"updates on semantic fingerprinting", francisco webber, inventor and co-founder of...

Upload: dataconomy-media

Post on 21-Apr-2017

Category:

Data & Analytics


TRANSCRIPT

Page 1: "Updates on Semantic Fingerprinting", Francisco Webber, Inventor and Co-Founder of Cortical.io

© cortical.io inc. 2016

Semantic Folding

co-Founder and General Manager

Francisco De Sousa Webber

Language Intelligence made easy

[email protected]

Page 2: "Updates on Semantic Fingerprinting", Francisco Webber, Inventor and Co-Founder of Cortical.io

What is Cortical.io?

We explore & expand Semantic Folding Theory

We spread & sell Semantic Folding Technology

We build & grow Cortical.io as the “Oracle” for Semantic Processing

Page 3: "Updates on Semantic Fingerprinting", Francisco Webber, Inventor and Co-Founder of Cortical.io

NLP Market Problem

Natural Language Processing Technology:

• All systems are based on statistics - low differentiability
• Hard to build - high level of expertise needed
• Inaccurate compared to humans - low precision
• Have complex tuning procedures - hard to deploy
• Slow and inefficient compared to humans - hard to scale

Weakness Compensated with:

• Human metadata management - for differentiability
• Human specialists - for expertise
• Human correction - for precision
• Human generated gold-standards - for tuning

Business-NLP is currently very expensive.

Page 4: "Updates on Semantic Fingerprinting", Francisco Webber, Inventor and Co-Founder of Cortical.io

Solution: Language Intelligence

Hierarchical Temporal Memory Theory

• By Jeff Hawkins (Silicon Valley, California)
• numenta.com technical implementation & IP
• Processing algorithm of the human brain (neo-cortex)

+

Semantic Folding Theory

• By Francisco De Sousa Webber (Vienna, Austria)
• cortical.io technical implementation & IP
• Processing language-data like the human brain

Page 5: "Updates on Semantic Fingerprinting", Francisco Webber, Inventor and Co-Founder of Cortical.io

Cortical Constraints

• Neocortex is a 2D sheet of repeating Modular Assemblies of neurons with binary inputs.
• Neocortex is a Memory System, not a processor.
• Neocortex stores Pattern Sequences.
• Neocortex is an Online Learning system.
• Neocortex is only Trained by Exposure to real-world data.
• All data fed into the neocortex must have Sparse Distributed Representation (SDR) format:
  • SDRs are very long Binary Vectors with max. 2% of “1”.
  • Every SDR-bit is a self-contained Semantic Feature of the world (via sensorial input).
  • Every SDR-bit is an Explicit Part of the signal.
  • Similar “things” have similar SDRs.
  • The Union of SDRs maintains all information of its members.
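
The SDR properties listed on this slide can be tried out with a minimal sketch that stores an SDR as the set of its active bit positions. The vector length, the example positions and the helper names `is_sparse` and `contains` are assumptions made for illustration; the slides only say that SDRs are "very long".

```python
SDR_SIZE = 16_384                  # assumed vector length, not stated in the slides
MAX_ACTIVE = SDR_SIZE * 2 // 100   # at most 2% of the bits may be "1"


def is_sparse(sdr):
    """Check the max. 2% constraint for an SDR stored as a set of active positions."""
    return len(sdr) <= MAX_ACTIVE


def contains(union_sdr, member):
    """A member is still present in a union if all of its active bits are set."""
    return member <= union_sdr


# Hypothetical SDRs; the positions carry no real meaning.
cat = {10, 11, 12, 500}
dog = {11, 12, 13, 900}
car = {7000, 7001, 7002}

pets = cat | dog                   # union of two SDRs
print(is_sparse(cat), contains(pets, cat), contains(pets, car))
print(len(cat & dog))              # similar "things" share active bits
```

Storing only the active positions is a natural fit for vectors in which at most 2% of the bits are set.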

Page 6: "Updates on Semantic Fingerprinting", Francisco Webber, Inventor and Co-Founder of Cortical.io

Virtual Word Layer

[Diagram labels: hear, see, touch; word (SDR); Word sensor stream; Word production stream; Symbol input; Muscles; Motor output; Symbol output; Virtualization into Retina]

Page 7: "Updates on Semantic Fingerprinting", Francisco Webber, Inventor and Co-Founder of Cortical.io

Offline Process

RetinaDB Generation

Retina Training defines the Semantic Universe.

Training Collection specifies all vocabulary, linguistic properties and knowledge.

The Semantic Folding Engine generates a Semantic Map.

Every utterance is positioned within the Semantic Space.

Every term is defined by its distributed selection of utterances/contexts.

A topographic bit-vector is generated for each term of the corpus.

[Diagram labels: Training Collection, Preprocessing, Semantic Folding Engine, Fingerprint Generation]
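
The offline steps above can be illustrated with a toy sketch. It is not the Semantic Folding Engine: a real semantic map would place similar contexts at neighbouring positions, whereas this sketch simply numbers the contexts in input order. The function names and the three example contexts are hypothetical.

```python
from collections import defaultdict


def build_term_fingerprints(contexts):
    """Map each term to the set of context positions in which it occurs.

    Assumption: every context (utterance) gets one position, here just its
    index in the list. The clustering of similar contexts is omitted.
    """
    fingerprints = defaultdict(set)
    for position, context in enumerate(contexts):
        for term in context.lower().split():
            fingerprints[term].add(position)
    return dict(fingerprints)


def to_bit_vector(positions, size):
    """Expand a set of active positions into an explicit 0/1 list."""
    return [1 if i in positions else 0 for i in range(size)]


contexts = [
    "the organ played in the church",
    "the liver is an organ",
    "the piano stood in the church",
]
fingerprints = build_term_fingerprints(contexts)
print(sorted(fingerprints["organ"]))
print(to_bit_vector(fingerprints["church"], len(contexts)))
```

Each term ends up defined by the contexts it appears in, which is the idea behind "every term is defined by its distributed selection of utterances/contexts".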

Page 8: "Updates on Semantic Fingerprinting", Francisco Webber, Inventor and Co-Founder of Cortical.io

Retina-API Operation

The generated topographic bit-vectors are called Semantic Fingerprints.

The Semantic Fingerprints are stored in the highly performant RetinaDB.

The RetinaDB is a complete Language Model.

The Retina-API provides functions to convert, compare, dissect, classify and extract text.

The user application interacts via a REST Interface.

[Diagram labels, Online Process: User Application, REST call, Retina API, RetinaDB; Functions out, Fingerprint out, Compare out]

[Diagram labels, Offline Process: Training Collection, Preprocessing, Semantic Folding Engine, Fingerprint Generation]
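
A user application's REST call for the compare function might be prepared roughly as follows. This is a sketch under loud assumptions: the `/rest/compare` path, the `retina_name` query parameter, the `api-key` header and the JSON body layout are illustrative and should be checked against the Retina API documentation. The request is only built, not sent.

```python
import json
import urllib.request

# Host taken from the link slide; the "/rest" prefix is an assumption.
API_BASE = "http://api.cortical.io/rest"


def build_compare_request(text_a, text_b, api_key, retina_name="en_associative"):
    """Build (but do not send) a POST request that compares two texts.

    Assumptions: endpoint path, query parameter, header name and body
    layout are illustrative only.
    """
    url = f"{API_BASE}/compare?retina_name={retina_name}"
    body = json.dumps([{"text": text_a}, {"text": text_b}]).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json", "api-key": api_key},
        method="POST",
    )


request = build_compare_request(
    "teens like using itunes on their iphone",
    "he consumes chart hits on his notebook",
    api_key="YOUR_API_KEY",  # placeholder, not a real key
)
print(request.get_method(), request.full_url)
```

Passing the prepared request to `urllib.request.urlopen` would send it. That step is left out so the sketch runs without network access or a key.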

Page 9: "Updates on Semantic Fingerprinting", Francisco Webber, Inventor and Co-Founder of Cortical.io

Tuning The Retina

“cholecystitis”

Page 10: "Updates on Semantic Fingerprinting", Francisco Webber, Inventor and Co-Founder of Cortical.io

Aligning Semantic Spaces

philosophy philosophie filosofía философия فلسفة

Concepts and their representations are stable across languages.

EN FR ES RU AR ZH

Page 11: "Updates on Semantic Fingerprinting", Francisco Webber, Inventor and Co-Founder of Cortical.io

The Cortical.io Retina Technology …

… converts any text into a semantic fingerprint.

teens like playing good music with their mobile phones

Fingerprint Generation

Page 12: "Updates on Semantic Fingerprinting", Francisco Webber, Inventor and Co-Founder of Cortical.io

Step 1: Word Fingerprints

organ

piano

church

liver

Page 13: "Updates on Semantic Fingerprinting", Francisco Webber, Inventor and Co-Founder of Cortical.io

Step 2: Text Fingerprints

teens like to hear music on their mobile phones

aggregation + sparsification
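
The slide names the two operations but not how they work. One plausible reading, sketched below as an assumption, is to count how many word fingerprints activate each position (aggregation) and then keep only the most frequently activated positions (sparsification). The function name, the `max_bits` parameter and the example fingerprints are hypothetical.

```python
from collections import Counter


def text_fingerprint(word_fingerprints, max_bits):
    """Aggregate word fingerprints, then keep only the strongest positions."""
    counts = Counter()
    for positions in word_fingerprints:
        counts.update(positions)
    # Sort by descending count, then by position, so the result is deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return {position for position, _ in ranked[:max_bits]}


# Hypothetical word fingerprints (sets of active bit positions).
words = {
    "teens": {1, 2, 3, 40},
    "music": {2, 3, 4, 50},
    "phones": {3, 4, 5, 60},
}
fingerprint = text_fingerprint(words.values(), max_bits=3)
print(sorted(fingerprint))
```

In a full-size fingerprint, `max_bits` would be chosen so that the text fingerprint stays as sparse as a word fingerprint.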

Page 14: "Updates on Semantic Fingerprinting", Francisco Webber, Inventor and Co-Founder of Cortical.io

Similar meanings …

… look similar

37% overlap

teens like using itunes on their iphone

he consumes chart hits on his notebook
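
An overlap percentage between two fingerprints can be computed as in the sketch below. The slides do not define how "overlap" is normalised. Dividing the shared bits by the size of the smaller fingerprint is an assumption, and Jaccard similarity would be another reasonable choice. The example fingerprints are made up and are not those of the two sentences.

```python
def overlap_percent(fp_a, fp_b):
    """Shared active bits as a percentage of the smaller fingerprint.

    Assumption: this normalisation is illustrative; the slides give none.
    """
    if not fp_a or not fp_b:
        return 0.0
    shared = len(fp_a & fp_b)
    return 100.0 * shared / min(len(fp_a), len(fp_b))


# Hypothetical fingerprints as sets of active bit positions.
a = {1, 2, 3, 4, 5, 6, 7, 8}
b = {5, 6, 7, 20, 21, 22, 23, 24}
print(overlap_percent(a, b))
```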

Page 15: "Updates on Semantic Fingerprinting", Francisco Webber, Inventor and Co-Founder of Cortical.io

Different meanings …

… look different

5% overlap

teens like using itunes on their iphone

the fishermen are sailing out of the harbor

Page 16: "Updates on Semantic Fingerprinting", Francisco Webber, Inventor and Co-Founder of Cortical.io

Evaluation

There are very few comparable algorithms: a couple of academic ones that cannot be readily used for production purposes, and Google’s Word2Vec.

The MEN Test Collection: http://clic.cimec.unitn.it/~elia.bruni/MEN.html
The RG-65 Test Collection: http://www.aclweb.org/aclwiki/index.php?title=RG-65_Test_Collection_(State_of_the_art)
The WordSimilarity-353 Test Collection: http://www.cs.technion.ac.il/~gabr/resources/data/wordsim353/
Yu & Dredze 2014: http://arxiv.org/pdf/1411.4166.pdf
Distributed representations of words and phrases: http://papers.nips.cc/paper/5021-di

                             MEN-3K   RG-65   WS-353
word2vec (Google)            55,2     44,8    54,7
Yu & Dredze (2014)           50,1     47,1    53,7
cortical.io Retina           67,4     71,3    62,2
% better than word2vec       18,1     37,2    12,1
% better than Yu & Dredze    25,7     33,9    13,7
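
The "% better" rows can be reproduced from the three score rows. As the sketch below shows, the published figures match the gap divided by the Retina score, not by the baseline score. Measured against the baseline instead, the relative gain would be larger (about 22% instead of 18,1% for word2vec on MEN-3K). The function name `percent_better` is made up for this example.

```python
SCORES = {
    "word2vec (Google)": {"MEN-3K": 55.2, "RG-65": 44.8, "WS-353": 54.7},
    "Yu & Dredze (2014)": {"MEN-3K": 50.1, "RG-65": 47.1, "WS-353": 53.7},
    "cortical.io Retina": {"MEN-3K": 67.4, "RG-65": 71.3, "WS-353": 62.2},
}


def percent_better(retina, baseline):
    """Gap to the baseline as a percentage of the Retina score."""
    return round(100.0 * (retina - baseline) / retina, 1)


retina = SCORES["cortical.io Retina"]
for name in ("word2vec (Google)", "Yu & Dredze (2014)"):
    row = {test: percent_better(retina[test], SCORES[name][test]) for test in retina}
    print(name, row)
```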

Page 17: "Updates on Semantic Fingerprinting", Francisco Webber, Inventor and Co-Founder of Cortical.io

Semantic Folding Products

Enterprise Semantic Search

• Document Retrieval
• Expert Finding
• Knowledge Management

Big Text Data

• Semantic Streaming Text Filter
• (Social) Media Monitoring
• Business Intelligence & Analytics

Semantic Matching

• Natural Language based Automation
• Content Personalisation
• Semantic Profiling

[Diagram labels: example document, similarity engine, most similar documents ordered along the user's information need; query, document index, result set, ranking; #finance, #markets, #mobile, #movies, #products, #trend; Topic of interest; Analytics; Match Making; Enterprise Application; Web App]

Page 18: "Updates on Semantic Fingerprinting", Francisco Webber, Inventor and Co-Founder of Cortical.io

Cloudera Integration

Page 19: "Updates on Semantic Fingerprinting", Francisco Webber, Inventor and Co-Founder of Cortical.io

Semantic Search

[Diagram labels: example document, similarity engine, most similar documents ordered along the user's information need; query, document index, result set, ranking]
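
The search flow named on this slide (an example document goes in, the most similar documents come back ranked) can be sketched as a lookup over a fingerprint index. Scoring by the raw number of shared bits, the index layout and all names below are assumptions made for illustration.

```python
def rank_by_similarity(query_fp, index):
    """Return document ids ordered by shared bits with the query, best first."""
    scored = [(len(query_fp & doc_fp), doc_id) for doc_id, doc_fp in index.items()]
    # Sort by descending overlap; ties are broken by document id.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored]


# Hypothetical index: document id -> fingerprint (set of active positions).
index = {
    "doc_music": {1, 2, 3, 4},
    "doc_phones": {3, 4, 5, 6},
    "doc_fishing": {90, 91, 92, 93},
}
example_document_fp = {3, 4, 5, 7}
print(rank_by_similarity(example_document_fp, index))
```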

Page 20: "Updates on Semantic Fingerprinting", Francisco Webber, Inventor and Co-Founder of Cortical.io

Semantic Content Filter

real-time, across languages, intelligent, meaning based

#finance #markets #mobile #movies #products #trend

Topic of interest

Analytics

Page 21: "Updates on Semantic Fingerprinting", Francisco Webber, Inventor and Co-Founder of Cortical.io

Example: Twitter Filter

The State of the Art

Desired topic: every tweet related to smart phones

200 catch words, such as: mobile phone, iPhone, cell, Android, sim-card, text message, network, Verizon, Apple, Google

5 words per tweet

20,000 tweets/sec

Required throughput for one filter: 200 x 5 x 20,000 = 20,000,000 comparisons per second
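
The throughput estimate on this slide assumes that every word of every tweet is compared with every catch word. The sketch below restates that arithmetic and writes out a deliberately naive matcher to make the cost visible. The function names and the short catch-word list are illustrative.

```python
CATCH_WORDS = 200           # catch words per filter (from the slide)
WORDS_PER_TWEET = 5         # words per tweet (from the slide)
TWEETS_PER_SECOND = 20_000  # stream rate (from the slide)


def comparisons_per_second(catch_words, words_per_tweet, tweets_per_second):
    """Every tweet word is compared against every catch word."""
    return catch_words * words_per_tweet * tweets_per_second


def keyword_match(tweet, catch_words):
    """Naive nested-loop matcher: one comparison per (word, catch word) pair."""
    for word in tweet.lower().split():
        for catch_word in catch_words:
            if word == catch_word:
                return True
    return False


print(comparisons_per_second(CATCH_WORDS, WORDS_PER_TWEET, TWEETS_PER_SECOND))
print(keyword_match("my new android rocks", ["iphone", "android", "verizon"]))
```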

Page 22: "Updates on Semantic Fingerprinting", Francisco Webber, Inventor and Co-Founder of Cortical.io

The State of the Art

Cost per Filter: $ 10,000+ per Month

Page 23: "Updates on Semantic Fingerprinting", Francisco Webber, Inventor and Co-Founder of Cortical.io

Example: Twitter Filter

Semantic Fingerprinting

[Diagram labels: twitter firehose, stream of semantic fingerprints, Filter Fingerprint, match / not matching, realtime content sub-stream]
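
The filter in this diagram can be sketched as a generator: each incoming fingerprint is compared once with a single filter fingerprint, and matching items pass into the sub-stream. The matching rule (a minimum number of shared bits), the threshold and the example data are assumptions; the slides do not state the actual criterion.

```python
def semantic_filter(stream, filter_fp, min_shared_bits):
    """Yield the items whose fingerprint shares enough bits with the filter."""
    for item, fingerprint in stream:
        if len(fingerprint & filter_fp) >= min_shared_bits:
            yield item


# Hypothetical stream of (tweet, fingerprint) pairs.
stream = [
    ("new iphone out today", {1, 2, 3, 9}),
    ("fishermen leave the harbor", {70, 71, 72}),
    ("android update released", {2, 3, 4, 8}),
]
smart_phone_filter = {1, 2, 3, 4}
matches = list(semantic_filter(stream, smart_phone_filter, min_shared_bits=2))
print(matches)
```

Unlike the catch-word approach, the work per tweet is one set intersection per filter, whatever vocabulary the tweet uses.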

Page 24: "Updates on Semantic Fingerprinting", Francisco Webber, Inventor and Co-Founder of Cortical.io

Cortical.io Streaming Text Filter

Cost per Filter: $ 10 per Month

convert 100,000+ tweets per second (one per firehose)

+

1,000+ semantic filters (scalable with number of Filters)

Page 25: "Updates on Semantic Fingerprinting", Francisco Webber, Inventor and Co-Founder of Cortical.io

Dynamic Topic Pattern Analysis

Topic Monitoring

Unseen topics or sudden topic jumps are detected

Compliance Monitoring

[Diagram: ongoing e-mail conversation over time; appearance of unseen topic clusters]
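
Detecting an unseen topic can be sketched as a novelty check: a message is flagged when its fingerprint shares too few bits with every fingerprint seen so far. This is a simplified stand-in for illustration, not the method behind the product. The threshold, the function name and the example fingerprints are assumptions.

```python
def detect_unseen_topics(stream, min_shared_bits):
    """Return the indices of fingerprints unlike anything seen earlier."""
    seen = []
    novel = []
    for index, fingerprint in enumerate(stream):
        best = max((len(fingerprint & old) for old in seen), default=0)
        # The first message has nothing to be compared with, so it is not flagged.
        if seen and best < min_shared_bits:
            novel.append(index)
        seen.append(fingerprint)
    return novel


# Hypothetical fingerprints of successive e-mails in one conversation.
conversation = [
    {1, 2, 3, 4},
    {2, 3, 4, 5},
    {3, 4, 5, 6},
    {80, 81, 82, 83},   # sudden topic jump
    {81, 82, 83, 84},
]
print(detect_unseen_topics(conversation, min_shared_bits=2))
```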

Page 26: "Updates on Semantic Fingerprinting", Francisco Webber, Inventor and Co-Founder of Cortical.io

Similar meanings “look” similar

Special “Financial Retina”

Bridging the Vocabulary Gap

Words: fraud

Expressions: corruption AND mafia

Idioms: “anti human trafficking”

Text: Money laundering is the process of transforming the proceeds of crime into ostensibly legitimate money or other assets.

Page 27: "Updates on Semantic Fingerprinting", Francisco Webber, Inventor and Co-Founder of Cortical.io

Combine Fingerprints with AI Algorithms

Text Anomaly Detection

7. Enabling Artificial Intelligence Applications

Realtime Anomaly Detection in Text Streams

any Text Stream: email, chat, Message Forums, Blog Posts, Facebook Posts

Page 28: "Updates on Semantic Fingerprinting", Francisco Webber, Inventor and Co-Founder of Cortical.io

Combine Fingerprints with AI Algorithms

http://www.cortical.io/demos/semantic-anomaly-detection/

Page 29: "Updates on Semantic Fingerprinting", Francisco Webber, Inventor and Co-Founder of Cortical.io

website: cortical.io

product: https://aws.amazon.com/marketplace/pp/B00T5794P6/

twitter: http://twitter.com/cortical_io

video: https://www.youtube.com/watch?v=g3ZxJokDpds

demos: http://www.cortical.io/demos.html

API: http://api.cortical.io

Numenta: http://numenta.com