DRAFT

Positioning Data Discovery for Greater Impact

October 2014

Agenda

Department of Public Welfare Data Analytics Landscape

Positioning Endeca

Enablement Highlights and Outcomes

Future Roadmap

Questions & Discussion

Landscape

Department of Public Welfare

EDW - Landscape

Technology: Cognos 10.2, Informatica 9, Oracle 11G

• Office of Income Maintenance (OIM)

• Pennsylvania Insurance Department (PID)

EDW

DW

• Pennsylvania Department of Education (PDE)

DW

• Office of Medical Assistance Programs (OMAP)

• Office of Children, Youth and Families (OCYF)

• Office of Children, Youth and Families (OCYF)

• Office of Child Development & Early Learning (OCDEL)

• Office of Developmental Programs (ODP)
• Office of Long Term Living (OLTL)
• Office of Medical Assistance Programs (OMAP)
• Office of Mental Health and Substance Abuse Services (OMHSAS)

Technology: Cognos 7, Decision Stream, Oracle 10G

Technology: Cognos, Informatica, Oracle 10G

PIMS Bridge

(OCDEL-PDE)

Enterprise Incident Management
• ODP
• OLTL

Investment in Information Management

Stage 1:

Stage 2:

Stage 3:

Stage 4:

What might happen?

Static Reporting

Business Intelligence

Analytics

Advanced Analytics

Data Gathering

Strategic impact

What is available?

Pre-defined Reporting:
• Prompt reports
• Scheduled reporting

Ad hoc capabilities:
• Self service reports
• OLAP cubes

Monitoring KPIs:
• Dashboards
• Scorecards

Predictive Analytics:
• Incident prediction
• Financial forecasting
• Service effectiveness
• Fraud detection and prevention

Mobile Analytics:
• Alerts
• On-the-go metrics

Why is it happening?

What is happening?

Business Analytics Capabilities

Positioning Data Discovery

Key Drivers for Data Discovery

Challenges and Details

Data Tsunami & Unpredictability
• Critical data is being collected at an unprecedented scale from varied sources, driving up analytics complexity
• Data volumes and integration efforts are roadblocks to insights

Value Proposition
• The value of data grows sharply when it is linked with other data for correlations
• Collecting detailed recipient service measures makes it possible to quantify program impact

Greater Community Impact
• The goal is to provide a better quality of life to each person in a shorter timeframe
• Help propagate the design of high-impact programs for current and future recipients
• Drive actions for the prevention of child abuse
• Provide a 360-degree view of clients, families and providers to supplement the department's mission

Positioning Endeca

Oracle Endeca

Consulting

• In-memory architecture and innovative caching deliver extreme performance

• Powerful text analytics extracts key themes and sentiments

• Support for sentiment analysis in 10 languages, localization in 13, and search and self-service term extraction in 33+ enable truly global analytics

• Sophisticated data integration and ETL streamline access to enterprise sources, including Oracle Business Intelligence

• Agile, data-driven approach requires no up-front modeling, for fast time to value

Deep Text Analysis

Enterprise Data Discovery

In Memory Analytics

Robust Data Integration

Oracle Endeca

Self Service Discovery

• Easily create, configure, and securely share discovery applications within the context of enterprise governance and security

• Upload information from a wide array of self service sources including Excel, JSON, and any data source accessible via JDBC

• State-of-the-art search and guided navigation surface insights with a click

• Live data enrichment allows users to enhance analytics in the moment

Endeca is a complete solution for agile data discovery across the enterprise, empowering business user independence in balance with IT governance. Endeca offers fast, intuitive access to both traditional analytic data and non-traditional data, including external and unstructured information.

Fragmented Source Compilation
Endeca allows for rapid assimilation of data from multiple sources to garner an executive view across multiple data stores

Web Sentiment Analysis
Ability to set up web crawls for gathering data and to provision online sentiment analysis, which could potentially lead to drawing correlations with enterprise data

Self-Service Enablement
Program Office Business Analysts can upload diverse data for snapshot analysis with minimal dependence on IT for basic data setup and support

Transactional/Stage Data Discovery
Endeca allows applications to be created directly on source and stage data, which helps Program Office Business Analysts slice and dice information to uncover previously unrealized questions and complement enterprise reporting requirements

Unstructured/Semi-Structured Data Analysis
Endeca allows for the ingestion of unstructured and semi-structured data and provides analytics capabilities to uncover hidden trends and details

Value Proposition & Applications

Perceived Benefits

1. Fragmented Source Compilation

• Combining EDW, OCYF, and CY48 data allowed program offices to drill into the causes of longer investigation times and expose potential reasons for bottlenecks

• OIM compilation of demographics, census, and CQCCOM service information helped draw a holistic view of the recipients

2. Advanced Analytics

• Sentiment analysis of structured and unstructured data, including whitelist tagging and text extraction, alongside spreadsheet consumption and visualization

• Built-in mapping and advanced visualization engines like tag clouds and capabilities for negative refinements

3. Data Validations

• Provisioning access to view data captured by OCYF enabled a window into potential future enterprise reporting needs

• Access to previously unavailable SAMS, eCIS, and HCSIS transaction data

4. Delivery Cycles

• Typical delivery cycle for an Endeca project is 8-12 weeks with a 16-20 week update cycle based on end-user feedback for required enhancements

• 2-4 week cycles for applications built using self-service for a quick window into the data

Enablement

DPW Enablement

Objectives

The objectives targeted with the initial 25-user enablement:

Enterprise-wide Adoption
• Uncover the potential landscape for the application of Endeca within the department
• Determine use and adoption of Endeca and the concept of data discovery across program offices

Concept Positioning
• Build the utilization of the complete set of Endeca’s standard capabilities
• Blend its use within the existing Business Intelligence/Data Analytics landscape

Solution Scalability
• Determine factors to be considered during deployment within the enterprise for a significant user base
• Document governance for People, Process, and Technology considerations encompassing rollouts

FY 2013 – Dec

FY 2014 – Jan

FY 2014 – Feb

FY 2014 – Mar

FY 2014 – Apr

FY 2014 – May

FY 2014 – Jun

FY 2014 – Jul

FY 2014 – Aug

FY 2014 – Sept

FY 2014 – Oct

Implementation Timeline

Wave 1 (Initiation)
Basic install, capability demonstration and Self-Service Enablement

Wave 2
Configurations, assessments and initial attempts to build end-user content for program offices

Wave 3
Gain targeted adoption and consensus for an enterprise rollout

Wave 1 Lessons Learned

Phase 1 Phase 2

Executive Touch Points

Timelines & Targets


The development of the self service applications for the program areas resulted in common themes across the program offices.

Findings – Data/Application Rendition

• Allows for rendering previously unavailable data for mining and analysis

• Provides access to unstructured and fragmented data

• Allows for the ability to include traditional and non-traditional sources

• Gaps and limitations that warrant governance through maintenance cycles

Benefits
• Exposed fraudulent activity to drive cost savings
• Exposed issues with data quality and corresponding business analysis implications
• Showed previously unknown information and sentiments captured within comments
• Shortened build cycles of 2-4 weeks for demos/POCs
• Accelerated end user delivery of feedback and enhancements
• Ability to decide if POC should be developed into ongoing report
• 8-12 week production application delivery alongside total 16-20 week window for incorporating end-user driven enhancements

Key Findings and Benefits

Advanced visualizations like geo-spatial maps allowed for a simplified user-experience in uncovering insights

Advanced Visualization & Data Mashup

Data Mashups allowed for merging and drawing comparisons across internal & external data sources
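The mashup idea can be illustrated outside of Endeca with a minimal Python sketch. It assumes two hypothetical data sets keyed by county, one internal (recipient counts) and one external (census population); all names and figures are made up for illustration and are not department data.

```python
# Hypothetical internal data: service recipients per county (made-up values).
internal_recipients = {"County A": 1200, "County B": 450, "County C": 300}

# Hypothetical external data: census population per county (made-up values).
external_population = {"County A": 60000, "County B": 15000, "County D": 20000}


def mashup_rate_per_1000(internal, external):
    """Join the two sources on shared keys and return recipients per 1,000 residents."""
    shared = internal.keys() & external.keys()  # counties present in both sources
    return {k: round(1000 * internal[k] / external[k], 1) for k in sorted(shared)}


rates = mashup_rate_per_1000(internal_recipients, external_population)
print(rates)
```

Counties that appear in only one source drop out of the comparison, which is itself a useful signal of coverage gaps between internal and external data.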

Negative Refinement

Review of SNAP transactions for the month. It appears most transactions occur within our state. What happens if we remove PA and border states?

Negative Refinement (cont.)

Information appears that we may not have known: we see transactions occurring outside of PA and bordering states. This is an opportunity for further evaluation and discovery.
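The logic of a negative refinement can be sketched in a few lines of Python. This is not Endeca's interface, only an illustration of the idea: exclude the dominant values (here PA and the six states that border it) and summarize what remains. The transaction records and field names are hypothetical.

```python
from collections import Counter

# PA plus its six bordering states; these are the values to exclude.
PA_AND_BORDER_STATES = {"PA", "NY", "NJ", "DE", "MD", "WV", "OH"}


def negative_refinement(records, field, excluded):
    """Return only the records whose `field` value is NOT in `excluded`."""
    return [r for r in records if r[field] not in excluded]


# Made-up sample transactions, for illustration only.
transactions = [
    {"id": 1, "state": "PA"},
    {"id": 2, "state": "PA"},
    {"id": 3, "state": "OH"},
    {"id": 4, "state": "FL"},
    {"id": 5, "state": "PA"},
    {"id": 6, "state": "CA"},
]

remaining = negative_refinement(transactions, "state", PA_AND_BORDER_STATES)
by_state = Counter(r["state"] for r in remaining)
print(by_state)
```

Once the expected bulk is removed, the small remainder is easy to inspect record by record, which is where previously unknown patterns tend to show up.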

Capabilities for Tag Cloud highlights and Summarizations drive Advanced Analytics

Advanced Analytics

Ability to house vast amounts of data within domains enabled “big data” mining and exploration

Ability to quickly perform a ‘negative’ refinement: remove the dominant values to see what remains, which may reveal new unknowns.
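As a rough illustration of what sits behind a tag cloud highlight, the Python sketch below counts terms across free-text comments and ignores a blacklist of filler words. The comments, the blacklist, and the function name are hypothetical; Endeca's own text enrichment is more sophisticated than this simple term count.

```python
import re
from collections import Counter

# Hypothetical blacklist of filler terms to leave out of the tag cloud.
BLACKLIST = {"the", "a", "to", "of", "and", "was", "for", "is", "in", "with"}


def tag_weights(comments, top_n=5):
    """Count non-blacklisted terms across all comments and return the top_n."""
    counts = Counter()
    for text in comments:
        for term in re.findall(r"[a-z']+", text.lower()):
            if term not in BLACKLIST:
                counts[term] += 1
    return counts.most_common(top_n)


# Made-up comments, for illustration only.
comments = [
    "Provider was late to the visit",
    "Visit rescheduled, provider unavailable",
    "Recipient satisfied with the visit",
]

print(tag_weights(comments, top_n=2))
```

The term counts play the role of font sizes in a tag cloud: the more often a term appears, the more prominently it would be displayed.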

Big Data Mining

Ability to house roughly 100 million records within a single domain made it possible to mine otherwise unusable data, resulting in fraud prevention and summarized reporting


While the current applications were created by IT, there is an ability to transition development to program office users based on the vision of the rollout.

Future Vision

End-User/Program Office Driven Self-Service: 50%

BIS/IT Driven Self-Service: 50%

Program Office/ End-User Driven Self-Service (10% Utilization)

• Technical/Super users within program office currently driven to utilize capability

• Limited time/effort availability and tool or conceptual knowledge gaps
• Challenges with utilizing self-service capabilities

Considerations to increase adoption: Endeca training (train the trainer), identifying program office FTE developers, re-using content across applications

IT Supported Self-Service (90% Utilization)

• Conducted initial conversations with program offices for insights into challenges with data availability and analysis

• Built out drafts to highlight possibilities leveraging Endeca
• Follow-up sessions with program office stakeholders to finalize application layouts and drive long-term value
• Governance for environment stability and delivery of functionality

Considerations for decreasing involvement: limiting involvement to alleviating roadblocks, augmenting re-usable content (e.g. blacklists)

Current State & Future Vision


Long Term Concept Positioning

IT/BIS Supported Self-Service

Program Office/End-User Driven Self-Service

Future

Governed use of Self-Service for snapshot analysis

Automated Endeca Production Applications for Regular Use

Uncover use-cases/KPIs for Enterprise Reporting through Cognos

Collaborative Project Delivery

Future Roadmap


Future High-Level Roadmap

Timeline: Today, Tomorrow, Future

1. Production Pilot; 25 NUP

2. Expand Deployment to Program Areas for Self Service Apps

3. Deployed to All Program Offices; 100+ Users

4. Scale Users & Data Volumes; Expand Self Service Apps

4a. IT Provisioned Applications to Program Offices

5. Enterprise Wide Adoption


Current Configuration

Current (Test & Development) Configuration

Test/Dev Configuration

Studio Server

Endeca Server

Integration Suite Server + Text Enrichment & Sentiment Analysis

User count
- Up to 25 users

Server Configuration
- Up to 4 cores
- 8 GB RAM minimum, 16 GB+ recommended

Server Configuration
- Up to 8 cores
- 64 GB RAM minimum, 128 GB+ recommended


Server List

Enterprise Configuration

Server List

Studio Server

Endeca Server

Integrator Server

OVM

Exalytics Hardware Platform
- 40 total cores
- Hard partitioning allows you to only license what you need
- 2 TB of RAM
- 2.4 TB of Flash Disk

Server Configuration
- Up to 8 cores
- 64 GB RAM

Server Configuration
- Up to 8 cores
- 128 GB RAM

Server Configuration
- Up to 24 cores
- Up to 1.856 TB RAM

Option 1

Perceived Future State

Estimated Sizing

Server List

Pros
• Improved end user experience and productivity
• Efficiently leverage the power of Exalytics by licensing 100% of the server

Cons
• No room on Exalytics for future growth
• Single points of failure at the Studio / Endeca Server tiers

Potential Outcomes

Integration Suite Server + Text Enrichment & Sentiment Analysis

Endeca Server Cluster Node 1

Studio Server

User count
- Up to 150 users

User count
- Up to 150 users

Server Configuration
- 4-8 cores
- 8 GB RAM minimum, 64 GB+ recommended

Configuration
- Up to 24 cores
- Up to 2 TB of RAM
- 2.4 TB of Flash Disk

OVM

Endeca Server Cluster Node 2

Perceived Future State

Option 2

Estimated Sizing

Pros
• Clustered design removes single points of failure
• Enable Consistent, Stable, & Scalable Application
• Room to Grow on each server, supporting Future Growth
• Greater user adoption and experience
• High Availability for Business Continuity

Cons
• Clustered design makes CPU studio pricing for unlimited users less attractive

Enterprise Configuration

Potential Outcomes

Questions?
