UIG360 Brochure


Upload: daniel-sapir

Post on 06-Aug-2015


Page 1: UIG360 Brochure

Why is a unified information governance approach to managing unstructured data so important? Simply put, long-term growth and success can be neither achieved nor sustained without harnessing the business value (the content intelligence) contained in the vast repositories of unstructured data. In addition, existing approaches to managing unstructured information do not offer a holistic, unified solution; they address discrete areas and are too simplistic for the complexities of today’s information environment.

This is at the heart of the challenge facing private and governmental organizations. Today, information governance is a topic of interest both inside and outside IT. CIOs, CFOs, IT managers responsible for data and infrastructure management, security officers, GRC (Governance, Risk, and Compliance) officers and general counsel are struggling to find a comprehensive solution for managing the ever-growing volume of unstructured data. The realization that information governance has gone beyond just

Unified Information Governance 360 — UIG360°™

SECURITY HOSTING CLOUD SERVICES

PARAGON TOWERS 233 NEEDHAM STREET, NEWTON MA 02464 | p: 617-658-2030

WWW.HAYSTAC.COM

ANALYTICS SERVICES

Adopting a holistic approach means looking at unstructured information in an organic way and acknowledging that the entire information supply chain is an interrelated ecosystem that needs unified and comprehensive management. Addressing only a subset of the information flow will simply not work.

Haystac’s UIG360°™ is a Unified Information Governance solution that addresses enterprises' need for hierarchical document classification, unstructured data analytics and document retention management, all on a single, highly flexible platform delivered via a secured private cloud.

As the IG landscape continues to change, and the definition of “information governance” itself matures over time, organizations that take the time to establish a holistic, program-based approach to IG management will be in the best position to benefit from:

Effective risk management

Assigning a value to the risk being managed

Reduced risk and improved security and privacy

Page 2: UIG360 Brochure


UIG360°™ Core Capabilities

Effective Technology For Pro-active Classification

UIG360°™ Overall architecture for scanned image classification, Data Capture and Logical Document Boundary Determination (LDBD)

[Workflow diagram. Input: Document Exemplars. Stages: Image preprocessing and visual classification; Classification; Reporting. Steps shown: initial input data set review to detect image defects and possible feature sets for classification models; configure image pre-processing algorithm for data set; run image pre-processing; review results; identify document classes and types; define feature set and classification rules; train (seed) documents; run page classification; run visual classification and visual classification post-processing; define/refine rules of page collapsing into documents; review results; run reports. Each "review results" step returns to the start in case of unacceptable noise.]

Classification

Genre-based, supervised learning for image classification:

‘Visual words’: points of interest and layout-based features

Multi-class probabilistic Support Vector Machine

Support for multiple models

Soft Dictionary: fuzzy search of dictionary terms or phrases to classify into a pre-defined taxonomy

Image Processing

Image quality improvement algorithms

Skew angle and page orientation detection

Image segmentation: separation of image into graphics, photos, tables, text and ‘noisy data’

OCR post processing

Data Capture and Extraction

Fuzzy pattern matching framework with key phrase data extraction. Built-in capabilities include:

Ability to find merged and split words

Built-in expressions and dictionaries

Boolean and built-in functions

Domain-specific language with simple grammar: “invoice Amt.”, “>invoiceAamount”….

API for custom expressions
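To make the Soft Dictionary idea concrete, here is a minimal sketch of fuzzy dictionary matching against a small taxonomy. It is not UIG360's implementation: the taxonomy contents, the function names and the 0.85 similarity threshold are all assumptions chosen for illustration, and Python's standard `difflib` stands in for whatever fuzzy matcher the product uses.

```python
from difflib import SequenceMatcher

# Hypothetical taxonomy: category -> dictionary terms or phrases.
TAXONOMY = {
    "invoice": ["invoice number", "amount due", "bill to"],
    "contract": ["terms and conditions", "governing law", "effective date"],
}


def fuzzy_contains(text_words, phrase, threshold=0.85):
    """Return True if some window of words approximately matches the phrase."""
    n = len(phrase.split())
    for i in range(len(text_words) - n + 1):
        window = " ".join(text_words[i:i + n])
        if SequenceMatcher(None, window, phrase).ratio() >= threshold:
            return True
    return False


def classify(text, taxonomy=TAXONOMY, threshold=0.85):
    """Pick the category with the most fuzzy term hits, or None if no term matches."""
    words = text.lower().split()
    scores = {
        category: sum(fuzzy_contains(words, term, threshold) for term in terms)
        for category, terms in taxonomy.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


# OCR-style misspellings still land in the "invoice" category.
print(classify("Invoce numbr 1234 amout due on receipt"))
```

Counting term hits per category is the simplest possible scoring rule; a production system would more likely weight terms and combine this signal with the learned classifiers listed above.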

Location-based constraints

Uses hOCR format to retrieve text location information

Improved OCR segmentation into logical text blocks

Fuzzy in-text rules to find and match location of anchor expressions

Built-in location expressions: right, left, above, below, block..

Example:

Above (“invoice summary”, block> supplier contracts”

Right (“sub total”, below (“taxes”, money >taxAamout
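The location expressions above can be illustrated with a small sketch that reads word bounding boxes from hOCR markup and resolves "right of" and "below" relative to an anchor word. This is an illustration only, not the product's DSL: the sample markup, the helper names (`parse_hocr`, `right_of`, `below`) and the exact-match anchor lookup are assumptions, and the fuzzy anchor matching described in the brochure is left out for brevity. The only hOCR convention relied on is the `bbox x0 y0 x1 y1` entry in the `title` attribute of `ocrx_word` elements.

```python
import re
from html.parser import HTMLParser

BBOX_RE = re.compile(r"bbox (\d+) (\d+) (\d+) (\d+)")


class HocrWords(HTMLParser):
    """Collect (text, (x0, y0, x1, y1)) for every ocrx_word element."""

    def __init__(self):
        super().__init__()
        self.words = []
        self._bbox = None

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if "ocrx_word" in (attr.get("class") or ""):
            match = BBOX_RE.search(attr.get("title") or "")
            if match:
                self._bbox = tuple(int(g) for g in match.groups())

    def handle_data(self, data):
        if self._bbox and data.strip():
            self.words.append((data.strip(), self._bbox))

    def handle_endtag(self, tag):
        if tag == "span":
            self._bbox = None


def parse_hocr(markup):
    parser = HocrWords()
    parser.feed(markup)
    parser.close()
    return parser.words


def find(words, text):
    """Exact, case-insensitive anchor lookup (fuzzy matching omitted)."""
    for word, box in words:
        if word.lower() == text.lower():
            return box
    return None


def right_of(words, anchor):
    """Words to the right of the anchor that overlap it vertically."""
    box = find(words, anchor)
    if box is None:
        return []
    _, ay0, ax1, ay1 = box
    hits = [(x0, w) for w, (x0, y0, x1, y1) in words
            if x0 >= ax1 and y0 < ay1 and y1 > ay0]
    return [w for _, w in sorted(hits)]


def below(words, anchor):
    """Words under the anchor that overlap it horizontally."""
    box = find(words, anchor)
    if box is None:
        return []
    ax0, _, ax1, ay1 = box
    hits = [(y0, w) for w, (x0, y0, x1, y1) in words
            if y0 >= ay1 and x0 < ax1 and x1 > ax0]
    return [w for _, w in sorted(hits)]


# Hypothetical two-line hOCR fragment.
SAMPLE = """
<span class='ocr_line' title='bbox 10 10 300 30'>
  <span class='ocrx_word' title='bbox 10 10 60 30; x_wconf 96'>Taxes</span>
  <span class='ocrx_word' title='bbox 200 10 260 30; x_wconf 93'>12.50</span>
</span>
<span class='ocr_line' title='bbox 10 40 300 60'>
  <span class='ocrx_word' title='bbox 10 40 60 60; x_wconf 95'>Total</span>
  <span class='ocrx_word' title='bbox 200 40 270 60; x_wconf 91'>112.50</span>
</span>
"""

words = parse_hocr(SAMPLE)
print(right_of(words, "total"))
print(below(words, "taxes"))
```

Nesting such calls, for example taking the value to the right of whatever sits below an anchor, would mirror the composed expressions in the example above.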

Page 3: UIG360 Brochure

UIG360°™ is designed to support scalable (100s of TB) multi-class unstructured data classification based on discriminative Machine Learning (ML) algorithms. It supports a multitude of document formats (over 3,000) and a special feature set for classifying email threads. It also supports best practices in classifying emails based on attachments. It is designed to allow unstructured data across the network to be crawled, indexed and processed in accordance with machine learning classification and rule-based coding.

Analytics

Built-in custom reporting

Export of search results

Extractive summaries

Hierarchical clustering of search results

Faceted search by:

Categories

Captured data points

Document age

Extensions…..

Correlate facets and queries

Document Retention

Knowledge model development consists of defining the relevant document

Pre-defined libraries

Gap analysis: compare knowledge model to retention policy

Disposition eligibility for machine-classified results can be run at both summary and detail levels.

Anomaly detection: detecting documents in the data stream that do not belong to any of the classification models

Data Capture and Extraction

Fuzzy pattern matching framework with key phrase data extraction. Built-in capabilities include:

Ability to find merged and split words

Built-in expressions and dictionaries

Boolean and built-in functions

Domain-specific language with simple grammar

API for custom expressions
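As an illustration of the "merged and split words" capability, the sketch below finds a phrase even when OCR has dropped or inserted spaces. It is an assumption about how such matching could work, not a description of the product: the function name is hypothetical, and it tolerates only whitespace differences, not character-level OCR errors.

```python
import re


def find_ignoring_spaces(text, phrase):
    """Return (start, end) of phrase in text, ignoring word breaks, or None."""
    chars = [c for c in phrase if not c.isspace()]
    # Allow optional whitespace between every pair of characters.
    pattern = r"\s*".join(re.escape(c) for c in chars)
    match = re.search(pattern, text, re.IGNORECASE)
    return match.span() if match else None


# Merged words: "invoiceamount" still matches "invoice amount".
print(find_ignoring_spaces("Total invoiceamount: 100", "invoice amount"))
# Split words: "in voice amo unt" also matches.
print(find_ignoring_spaces("in voice amo unt due", "invoice amount"))
```

Combining this whitespace tolerance with an edit-distance check would bring it closer to the fuzzy pattern matching the brochure describes.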

UIG360°™ Overall workflow architecture for classification and data analysis

[Workflow diagram. Labels: Document Exemplars; Critical Attributes; Controlled Vocabulary; Taxonomies; Rules; Initiate Process; Process Data; System Files; Duplicates; Versions; Data Quality; Image Processing; Auto Extract Attributes; Apply Classification Strategy; Apply Classification Decision; Machine Learning; Manual; Iterate Strategy; Quality Control; Database; Reconcile to CMS.]


Page 4: UIG360 Brochure


Deploys unsupervised machine learning methods to associate documents with user-selected business topics/categories (themes)

Facilitates analytics by providing, for each document, its related topics and related documents and versions

Combines (clusters) documents that belong to the same business topic/category, and provides an extractive summary for a given cluster or a single document

Facilitates search by supporting facets such as business topics (themes), related categories, versions and document age, and provides in-context query completion based on generated key phrases for the entire topic/category

Intelligent ingestion supports single and batch document upload. It provides a view into the upload process, suggests possible categories (business topics) for the new document and identifies previous versions if they exist.
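One way to picture the "identifies previous versions" step during ingestion is near-duplicate detection over word shingles. The sketch below is an assumed approach for illustration only; the brochure does not say how UIG360 detects versions, and the function names, the three-word shingle size and the 0.4 threshold are all hypothetical.

```python
def shingles(text, k=3):
    """Set of overlapping k-word tuples from the text."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def find_previous_versions(new_text, repository, threshold=0.4):
    """Return ids of stored documents similar enough to be earlier versions."""
    new = shingles(new_text)
    scored = [(jaccard(new, shingles(text)), doc_id)
              for doc_id, text in repository.items()]
    return [doc_id for score, doc_id in sorted(scored, reverse=True)
            if score >= threshold]


# Hypothetical repository of already ingested documents.
repository = {
    "policy_v1": "records must be retained for seven years after the contract ends",
    "memo": "lunch will be served at noon in the main hall",
}
new_doc = "records must be retained for ten years after the contract ends"
print(find_previous_versions(new_doc, repository))
```

At the scale the brochure mentions, an all-pairs comparison like this would be replaced by an index-based technique such as MinHash, but the similarity measure would be the same idea.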


UIG360°™ Core Capabilities

Effective Technology For Content Intelligence

Categorization Cross Reference and Correlation

Content Upload