The Information SUPERNOVA: THE THREAT AND THE CURE
INTRODUCTION TO INFO ARAB’S TECHNOLOGIES
BY: ALAA S. AL-AGAMAWI
SEPTEMBER 2014

Technology's impact on data sizes…

Technology has always played a major role in the growth of human knowledge.

With user-generated content in the ICT era, the rate of growth is extraordinary.

[Timeline figure: data sizes on a logarithmic scale, 5000 BC to 2000 AD]

Invention of writing {Iraq, 5000 BC}

Invention of paper {China, 105 AD}

Invention of printing {Europe, 1400 AD}

Invention of ICT {2000 AD}

Before the SUPERNOVA explodes !!!

WHEN A STAR GOES OUT OF CONTROL IT EXPLODES AND EATS EVERYTHING AROUND!

IMAGINE …

THE INFORMATION SUPERNOVA.

WHEN DATA SIZE IS OUT OF CONTROL.

TO CONTROL IT … WE NEED TO UNDERSTAND IT.

The ICT era…

The social networks…

What happens in one minute in the social networks requires over…

1500 person days to read.

Strongest FRIEND and Toughest ENEMY!!!

Whether you are a movie star, a politician, a governmental agency, a broker or a brand owner, you should pay adequate attention to what people are saying about you. In most cases, traditional monitoring is not enough: it requires tens or even hundreds of person days and still cannot cover everything thoroughly.

Your ONLY choice…

…is to use ICT to find out what people are saying about you, and ACT accordingly.

Collect

Organize

Analyze

Anticipate

Act

The Winning TEAM…

Collect, Organize, Analyze, Anticipate, Act

BI Implementation Partner

You should do THIS yourself

A typical solution architecture…

[Diagram labels: Data Sources (world wide web, Enterprise Servers); Data Connectors; Data Gathering Layer; Significs©; Results Layer; Business Presentation Layer]

Proudly introducing

Fetch Technologies©

Historical background…

Fetch PC© 2006.

Fetch enterprise© 2009.

Fetch cloud© 2012.

The dilemma…

Typical crawling brings in a lot of irrelevant data. That means…

Irrelevant information.

And wasted computer resources.

[Figure labels: Relevant Area; Irrelevant Area; Irrelevant Area]

Traditional crawling techniques…

In many cases, search results land in an irrelevant area.

Structured & unstructured data…

Dealing with all types of Sources…

Sources that have an API… and sources that do not…

Only relevant data in structured format

Title

Brief

Media

Article body

Author

Publisher

Date
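As a rough illustration of what "only relevant data in structured format" could look like, the sketch below models one extracted article with the fields listed above and serializes it to JSON. The class name `Article`, the field names, and the sample values are assumptions made for illustration; they are not taken from the Fetch products.

```python
import json
from dataclasses import dataclass, asdict, field


@dataclass
class Article:
    """Hypothetical container for one crawled item (names are illustrative)."""
    title: str
    brief: str
    body: str
    author: str
    publisher: str
    date: str                                   # ISO 8601 date string
    media: list = field(default_factory=list)   # URLs of attached media

    def to_json(self) -> str:
        # ensure_ascii=False keeps Arabic text readable in the output
        return json.dumps(asdict(self), ensure_ascii=False)


sample = Article(
    title="Example headline",
    brief="One-line summary",
    body="Full article text...",
    author="A. Writer",
    publisher="Example News",
    date="2014-09-01",
)
record = json.loads(sample.to_json())
```

Keeping every item in one fixed shape is what lets later stages filter and aggregate without re-parsing the original pages.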

Classifies with Social media interactions…

Never miss what you are looking for…

You will never miss a mention you are looking for with our semantic-based search engines. Our engines take care of the following…
• Basic search objects in a hierarchical format (e.g. a company, its senior managerial staff, its main brands, sub-brands, etc.).
• Fully morphed forms (i.e. singular, plural, masculine, feminine, adjectives, etc.).
• All their Arabic/English synonyms and antonyms.
• Spelling mistakes and grammatical errors.
• Object types and their components.
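The ideas listed above (hierarchical search objects, surface variants, tolerance to spelling mistakes) can be sketched in a few lines. This is a toy illustration only, not Info Arab's engine: the `HIERARCHY` and `VARIANTS` tables are invented, the example is English-only, and fuzzy matching with Python's standard `difflib` stands in for a real speller and morphological analyzer.

```python
import difflib

# Toy data, invented for illustration: a search object with its child objects,
# plus surface variants (plural forms, alternative names) for each of them.
HIERARCHY = {"acme": ["acme cola", "acme juice"]}
VARIANTS = {
    "acme": ["acme", "acme corp"],
    "acme cola": ["acme cola", "acme colas"],
    "acme juice": ["acme juice", "acme juices"],
}


def expand(obj: str) -> list:
    """Return every surface form for an object and its children."""
    names = [obj] + HIERARCHY.get(obj, [])
    forms = []
    for name in names:
        forms.extend(VARIANTS.get(name, [name]))
    return forms


def mentions(text: str, obj: str, cutoff: float = 0.85) -> bool:
    """True if any n-word window of the text is close to a known form."""
    words = text.lower().split()
    for form in expand(obj):
        n = len(form.split())
        windows = [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]
        if difflib.get_close_matches(form, windows, n=1, cutoff=cutoff):
            return True
    return False
```

For example, `mentions("the new acmee juices taste fine", "acme")` is true despite the misspelling, while unrelated text does not match.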

And everything is easily available…

RSS

XML

JSON

SaaS

Timely

In-site

As fast as you may dream of…

The above figure (from the Microsoft© Azure© control panel) represents a typical day of usage of Fetch News© semantic crawling processing.

As shown in the figure…

Exactly 129.64 megabytes of source content were captured, analyzed, and converted to 41.02 megabytes of structured text in 281.96 seconds, i.e. about 460 kilobytes per second including connectivity time.

Maximum memory usage was 129.75 megabytes, serving 64 parallel threads running on a single-core processor.
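The figures quoted above can be checked with a few lines of arithmetic. The sketch uses only the numbers from the slide; the variable names are illustrative, and the per-thread value is simply peak memory divided by the thread count, so it is an average upper bound rather than a measured figure.

```python
captured_mb = 129.64      # source content captured
structured_mb = 41.02     # structured text produced
elapsed_s = 281.96        # total time, including connectivity
threads = 64
peak_memory_mb = 129.75

throughput_kb_s = captured_mb * 1000 / elapsed_s   # decimal kilobytes per second
size_ratio = structured_mb / captured_mb           # structured output vs. raw input
memory_per_thread_mb = peak_memory_mb / threads    # average share of peak memory

print(f"{throughput_kb_s:.0f} KB/s, ratio {size_ratio:.2f}, "
      f"{memory_per_thread_mb:.2f} MB per thread")
```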

Proudly introducing

Querying the text streams

Significs© Main focus…

CONVERTING UNSTRUCTURED DATA TO STRUCTURED DATA

AND EXTRACTING MEANINGFUL INDICATORS

FOR THEIR CONTENTS WITH DISTINCT PERFORMANCE

w.r.t. various areas of significance.

Analysis Dimensions…

Object

Type

components

Features

Sentiment

Feeling

Need

Attitude

Action

Descriptor

Modifier

Indecencies

Proverbs

Danger

Safety

Liking

Disliking

Main Dimensions Foundations…

Danger

Safety

Liking

Disliking

Time

Place

Subject

Object Type

Component

Feature

Descriptors

Modifiers

Indecencies

Proverbs

Needs

Feelings

Actions

Attitude

Basic Dimensions

Sentiment Dimensions

Object Dimensions

Descriptor Dimensions

العربية

English

Francais

Deep Analysis vs. shallow…

Shallow Analysis does not depend on the specific linguistic rules of the target language, so it cannot guarantee a precise understanding of the textual content.

Deep Analysis uses the specific linguistic rules of the target language and accordingly maximizes the accuracy of the semantic analysis results.

Info Arab® draws on its wealth of linguistic technologies (e.g. speller, morphing, grammar, thesaurus, etc.) to provide a robust deep analysis engine.
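A toy English example can make the contrast concrete. The sketch below is not the Significs© algorithm: the word lists are invented, and a single negation rule stands in for the much richer language-specific rules that deep analysis relies on.

```python
POSITIVE = {"good", "great", "reliable"}
NEGATIVE = {"bad", "poor", "slow"}
NEGATORS = {"not", "never", "no"}


def shallow_score(text: str) -> int:
    """Language-agnostic counting: +1 per positive word, -1 per negative word."""
    words = text.lower().split()
    return sum((w in POSITIVE) - (w in NEGATIVE) for w in words)


def rule_based_score(text: str) -> int:
    """Same lexicon plus one English rule: a negator flips the next polar word."""
    score, flip = 0, False
    for w in text.lower().split():
        if w in NEGATORS:
            flip = True
            continue
        polarity = (w in POSITIVE) - (w in NEGATIVE)
        if polarity:
            score += -polarity if flip else polarity
            flip = False
    return score
```

On "the service is not good" the shallow counter reports a positive score, while the rule-based version reports a negative one, which is the kind of gap that language-specific rules are meant to close.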

3 Levels of Taxonomies…

Generic Taxonomy
• Basic Objects’ types
• Basic Features
• Basic Components
• Basic Measures
• Basic Actions
• Basic Feeling
• Basic Needs
• Basic attitudes
• …etc.

Domain Specific Taxonomy

Domain specific
• Object
• Features
• Measures
• Actions
• Feelings
• Attitudes
• …etc.

Domain Specific…
• Subjects
• Issues
• …etc.

User Specific Taxonomy

User specific
• Object
• Features
• Measures
• Actions
• Feelings
• Attitudes
• …etc.

User Specific…
• Subjects
• Issues
• …etc.
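One way to picture the three levels is as layered lookup tables. The sketch below uses Python's `collections.ChainMap`; all entries are invented examples, and the precedence shown (user-specific first, then domain-specific, then generic) is an assumption, since the slides do not say how the levels are combined.

```python
from collections import ChainMap

# All entries below are invented examples.
generic = {"price": "Basic Feature", "buy": "Basic Action", "phone": "Basic Object"}
telecom_domain = {"roaming": "Domain Feature", "phone": "Domain Object"}
user = {"roaming": "User Issue"}

# ChainMap searches left to right, so the most specific level wins (assumed order).
taxonomy = ChainMap(user, telecom_domain, generic)


def classify(term: str) -> str:
    """Look a term up through the three levels, most specific first."""
    return taxonomy.get(term.lower(), "unclassified")
```

With this layering a customer can add or redefine a handful of terms without touching the shared generic and domain dictionaries.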

Supported languages…

Significs© is currently able to analyze the following textual languages…
• Standard formal Arabic.
• Egyptian common slang.
• Transliterated Arabic texts (written in Latin characters).
• Standard formal English.
• GCC slangs.

Significs© applies the Deep Analysis approach to the languages marked in BLUE and Shallow Analysis to the languages marked in GREEN. Future versions of Significs© are planned to extend the deep analysis algorithms to the languages marked in GREEN.

Up to 20 × 10¹² Areas of Significance

Areas of significance represent the exact requirements of a particular user/customer. They can be expressed in one or more (correlated/intersected) dimensions.

For example…
• A celebrity may be interested in seeing any mention of his/her Objects (Persons, Places, etc.).
• A company may want to automatically route all incoming mail that Needs Support for a certain Product to one of its affiliate companies, and to measure customers' Satisfaction.
• A marketing agency may be interested in checking the Feelings and Attitudes about a certain Component or Characteristic of one or all of its customers’ products.
• A security agency may want to extract one or all Actions within a certain Subject that exceed a certain threshold.
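An area of significance that intersects several dimensions can be pictured as a filter over analysed mentions. The sketch below is an assumption-laden illustration: the `Annotation` class, its field names, and the sample feed are hypothetical and do not reflect the actual Significs© output format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """One analysed mention; field names loosely mirror the slide's dimensions."""
    subject: str
    obj: str
    need: str = ""
    feeling: str = ""
    action: str = ""


def area_of_significance(**wanted):
    """Build a predicate that is true when every requested dimension matches."""
    def matches(a: Annotation) -> bool:
        return all(getattr(a, dim) == value for dim, value in wanted.items())
    return matches


feed = [
    Annotation(subject="billing", obj="Product A", need="support"),
    Annotation(subject="billing", obj="Product B", feeling="anger"),
    Annotation(subject="delivery", obj="Product A", need="support"),
]

# The mail-routing example: everything that needs support for one product.
needs_support_for_a = area_of_significance(obj="Product A", need="support")
routed = [a for a in feed if needs_support_for_a(a)]
```

Each extra keyword argument narrows the area by intersecting one more dimension.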

Available on almost all platforms…

Significs© is available in the following formats…
• Microsoft© Windows© .DLL.
• Mac OS X, UNIX, Linux .LIB & .SO.
• Web service.
• WCF web service.
• HTTP request (with some limitations regarding input text size).

Unprecedented Performance…

Speed
• 500,000 to 4,000,000 words per second, depending on the analysis depth.

Storage
• Total size of the data dictionaries ranges from 20 to 30 megabytes.

Memory
• Core memory is about 80 megabytes.
• Typical memory per thread is about 2 megabytes.
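The figures above lend themselves to quick capacity estimates. The sketch below assumes that memory grows linearly with the number of threads and that the lower speed corresponds to the deepest analysis; neither assumption is stated in the slides, and the function names are illustrative.

```python
CORE_MEMORY_MB = 80                        # "about 80 Mega Bytes"
MEMORY_PER_THREAD_MB = 2                   # "about 2 Mega Bytes" per thread
WORDS_PER_SECOND = (500_000, 4_000_000)    # assumed: deepest to shallowest analysis


def estimated_memory_mb(threads: int) -> int:
    """Core memory plus a fixed share per thread (linear growth is assumed)."""
    return CORE_MEMORY_MB + MEMORY_PER_THREAD_MB * threads


def processing_time_range_s(words: int) -> tuple:
    """Return (fastest, slowest) seconds for a given word count."""
    slow, fast = WORDS_PER_SECOND
    return words / fast, words / slow
```

For instance, `estimated_memory_mb(64)` gives 208 megabytes, and a billion words would take between 250 and 2,000 seconds on a single stream at the quoted speeds.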

Economic feasibility at a glance…

[Chart: Cost per feed analysis. Horizontal axis: number of feeds, from 100,000 to 1,024,000,000. Vertical axis: cost per feed, from $0 to $0.35. Series: Manual cost, Standard, Professional, Ultimate, Open.]

$0.001 / feed @ 200 Million feeds.

Edition         Breakeven number of feeds
Standard        150K
Professional    300K
Ultimate        700K
Open license    3.5M

For both Fetch-Cloud© and Significs© site licensing
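The breakeven figures can be turned into a simple lookup. The sketch below uses only the volumes and the per-feed price quoted above; the function name is illustrative, and it assumes that "breakeven" means the feed volume at which an edition becomes cheaper than manual analysis, which the slides imply but do not spell out.

```python
# Breakeven volumes quoted above (number of feeds).
BREAKEVEN = {
    "Standard": 150_000,
    "Professional": 300_000,
    "Ultimate": 700_000,
    "Open license": 3_500_000,
}


def editions_past_breakeven(feeds: int) -> list:
    """Editions whose breakeven volume is at or below the expected feed count."""
    return [name for name, volume in BREAKEVEN.items() if feeds >= volume]


# The chart annotation: $0.001 per feed at 200 million feeds, as a total in dollars.
total_at_200m = round(0.001 * 200_000_000, 2)
```

For an expected volume of 500,000 feeds, the lookup returns the Standard and Professional editions.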

Thank you

Please feel free to contact us for any enquiry.

We shall be delighted to arrange a live demo at any time of your choice.

Info Arab®, Arabic Information Systems, Alaa S. Al-Agamawi & Co.

7 Mosadek Street, Dokki, Cairo 12312, Egypt.

Phone: +2 02 37 61 38 33 - Fax: +2 02 37 61 38 39 x115
email: [email protected]

www.info-arab.com www.info-arab-technologies.com

Catering for the Arabic language needs in the Digital era