End User Informatics
TRANSCRIPT
![Page 1: End User Informatics](https://reader035.vdocuments.us/reader035/viewer/2022062513/557ad033d8b42a2c0f8b501b/html5/thumbnails/1.jpg)
Informatics
Ambareesh Kulkarni
![Page 2: End User Informatics](https://reader035.vdocuments.us/reader035/viewer/2022062513/557ad033d8b42a2c0f8b501b/html5/thumbnails/2.jpg)
Informatics defined
• Informatics is the application of technology to bring Data, People and Systems together
• Bioinformatics is a very Complex representation of Simple data
• Cheminformatics is a very Simple representation of Complex data
![Page 3: End User Informatics](https://reader035.vdocuments.us/reader035/viewer/2022062513/557ad033d8b42a2c0f8b501b/html5/thumbnails/3.jpg)
Current State
![Page 4: End User Informatics](https://reader035.vdocuments.us/reader035/viewer/2022062513/557ad033d8b42a2c0f8b501b/html5/thumbnails/4.jpg)
Problem Statement….
“There's too much data and it's duplicated hundreds of times. The mistake companies make is that they start from the data they have. They need to ask what data do their users need and what are the questions they are asking. Understand the questions, how they can be answered and what kind of data is needed.”
Quote by CIO of Major Corporation
![Page 5: End User Informatics](https://reader035.vdocuments.us/reader035/viewer/2022062513/557ad033d8b42a2c0f8b501b/html5/thumbnails/5.jpg)
Integrated Solutions - Business Case: IDC White Paper
• Information Tasks
– Email – 14.5 hours a week
– Create documents – 13.3 hours a week
– Search – 9.5 hours a week
– Gather information for documents – 8.3 hours a week
– Find and organize documents – 6.8 hours a week
• Gartner: “Organizations spend an estimated $750 Billion annually seeking information necessary to do their job.”
![Page 6: End User Informatics](https://reader035.vdocuments.us/reader035/viewer/2022062513/557ad033d8b42a2c0f8b501b/html5/thumbnails/6.jpg)
Data Integration - Business Case: IDC White Paper
• Time Wasted (per year)
– Reformat information – $57 million per 10,000 users
– Not finding information – $53 million per 10,000 users
– Recreating content – $45 million per 10,000 users
![Page 7: End User Informatics](https://reader035.vdocuments.us/reader035/viewer/2022062513/557ad033d8b42a2c0f8b501b/html5/thumbnails/7.jpg)
Data Integration - Business Case: General ROI issues IDC White Paper
• Reduce development costs, cycle times
– Increase employee efficiency
– Less time looking, more time doing
• Enhance communication
– Capture and reuse knowledge
– Innovate better & faster
• Cost of not finding right information
– Business – lost money, opportunities
![Page 8: End User Informatics](https://reader035.vdocuments.us/reader035/viewer/2022062513/557ad033d8b42a2c0f8b501b/html5/thumbnails/8.jpg)
Key Takeaways
• Data Integration is not easy and represents ~80% of effort for a typical data integration project.
• Incompatible data are the largest, most expensive, and time-consuming portion of IT projects.
• Most data is in an unstructured format (Outlook, Word, PDF, images, etc.)
![Page 9: End User Informatics](https://reader035.vdocuments.us/reader035/viewer/2022062513/557ad033d8b42a2c0f8b501b/html5/thumbnails/9.jpg)
Evolution of data integration technologies
![Page 10: End User Informatics](https://reader035.vdocuments.us/reader035/viewer/2022062513/557ad033d8b42a2c0f8b501b/html5/thumbnails/10.jpg)
Evolution of Integration Architectures
Point to Point HUB + Spoke HUB + EII
![Page 11: End User Informatics](https://reader035.vdocuments.us/reader035/viewer/2022062513/557ad033d8b42a2c0f8b501b/html5/thumbnails/11.jpg)
Defining EII, EAI, ETL Data Integration

| EII | EAI |
| --- | --- |
| Enterprise Information Integration | Enterprise Application Integration |
| Reports from multiple apps/data sources | Transactions to multiple apps |
| e.g. Real-time access to product silos for customers, employees | e.g. Compound name change in one application propagated to other products |

| EII | ETL |
| --- | --- |
| Real-time | Batch |
| Extract, Transform, Report in real-time | Extract, Transform, Load; later report on data warehouse |
| e.g. report data from operational applications | e.g. build duplicate reporting data mart and/or redesign data warehouse |
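
The EII versus ETL distinction can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the in-memory SQLite databases stand in for operational applications, and the table name `orders`, the warehouse table `fact_orders`, and all function names are hypothetical, not part of any product mentioned in these slides.

```python
import sqlite3

def make_source(rows):
    """Create an in-memory stand-in for one operational application (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (compound TEXT, qty INTEGER)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
    return conn

def normalize(name):
    """Transform step shared by both approaches: trim and lower-case the compound name."""
    return name.strip().lower()

def etl_load(sources, warehouse):
    """ETL: batch-extract from every source, transform, and load a copy into the warehouse."""
    warehouse.execute("CREATE TABLE IF NOT EXISTS fact_orders (compound TEXT, qty INTEGER)")
    warehouse.execute("DELETE FROM fact_orders")  # full refresh on every batch run
    for src in sources:
        rows = src.execute("SELECT compound, qty FROM orders").fetchall()
        warehouse.executemany(
            "INSERT INTO fact_orders VALUES (?, ?)",
            [(normalize(name), qty) for name, qty in rows],
        )
    warehouse.commit()

def report_from_warehouse(warehouse):
    """Report later, on whatever the last batch load captured."""
    rows = warehouse.execute(
        "SELECT compound, SUM(qty) FROM fact_orders GROUP BY compound"
    ).fetchall()
    return dict(rows)

def report_federated(sources):
    """EII style: extract, transform, and report at request time, with no persisted copy."""
    totals = {}
    for src in sources:
        for name, qty in src.execute("SELECT compound, qty FROM orders").fetchall():
            key = normalize(name)
            totals[key] = totals.get(key, 0) + qty
    return totals

lab_a = make_source([("Aspirin", 5), ("caffeine ", 2)])
lab_b = make_source([("aspirin", 3)])
warehouse = sqlite3.connect(":memory:")

etl_load([lab_a, lab_b], warehouse)
lab_b.execute("INSERT INTO orders VALUES ('caffeine', 4)")  # arrives after the batch ran

batch_view = report_from_warehouse(warehouse)  # does not include the late row
live_view = report_federated([lab_a, lab_b])   # includes the late row
```

The batch report is only as fresh as the last load, while the federated report pays the extraction cost on every request. That is the trade-off the two columns of the table describe.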
![Page 12: End User Informatics](https://reader035.vdocuments.us/reader035/viewer/2022062513/557ad033d8b42a2c0f8b501b/html5/thumbnails/12.jpg)
Enterprise Application Requirements
Tools vs. Development Platform
Tools
Development Platform
![Page 13: End User Informatics](https://reader035.vdocuments.us/reader035/viewer/2022062513/557ad033d8b42a2c0f8b501b/html5/thumbnails/13.jpg)
What do end users really care about?
• The Internet has raised the bar for Informatics expectations
• Complex Query? Millions of Rows? Full table Scan?
• Users don’t really care. If they can view stock prices in real time, why not corporate data?
• In an ideal world, data analysis needs to be at the speed of thought.
• Bigger, better, faster, cheaper
![Page 14: End User Informatics](https://reader035.vdocuments.us/reader035/viewer/2022062513/557ad033d8b42a2c0f8b501b/html5/thumbnails/14.jpg)
Business users view
Data
Pipeline Pilot
Reports
![Page 15: End User Informatics](https://reader035.vdocuments.us/reader035/viewer/2022062513/557ad033d8b42a2c0f8b501b/html5/thumbnails/15.jpg)
IT perspective
![Page 16: End User Informatics](https://reader035.vdocuments.us/reader035/viewer/2022062513/557ad033d8b42a2c0f8b501b/html5/thumbnails/16.jpg)
Key Takeaways
Use Pipeline Pilot to:
• Provide an integrated view of data across multiple systems: flat files, data warehouses, data marts.
• Avoid “boiling the ocean”. Jump-start data integration efforts with Pipeline Pilot (PP) to quickly meet an important user requirement, and then decide if the data should be persisted in a data warehouse or data mart.
![Page 17: End User Informatics](https://reader035.vdocuments.us/reader035/viewer/2022062513/557ad033d8b42a2c0f8b501b/html5/thumbnails/17.jpg)
Action from Insight
Data is a New form of Energy
![Page 18: End User Informatics](https://reader035.vdocuments.us/reader035/viewer/2022062513/557ad033d8b42a2c0f8b501b/html5/thumbnails/18.jpg)
Why is data integration so important?
• Data in any organization is distributed in various disconnected and disparate systems
• There is always a need to combine most current data with historical values
• The success of the internet has created data sources outside the internal network
• Data has informational value only when combined with other & related data
![Page 19: End User Informatics](https://reader035.vdocuments.us/reader035/viewer/2022062513/557ad033d8b42a2c0f8b501b/html5/thumbnails/19.jpg)
WARNING SIGNS : Of Poor Data Integration
• Incomplete Data foundation
• Inability to consolidate data from multiple sources
• No single version of the truth
• Poor audit trail and data lineage
• Historical values not retained in a data warehouse or data mart
• Lack of integrated 360 deg view
• High cost of maintaining “one-time” in-house code
• Inability to comply with regulatory requirements
Presentations or discussions that are prefaced with statements like “most of our analysis would have been accurate, except for the missing data from….” or “Due to discovery of data not included in the last analysis, we are reversing our decision to……”
![Page 20: End User Informatics](https://reader035.vdocuments.us/reader035/viewer/2022062513/557ad033d8b42a2c0f8b501b/html5/thumbnails/20.jpg)
WARNING SIGNS : Of Poor Data Integration
Because a critical chemical is out of stock, a scientist must expedite the order and pay a premium price. When the chemical arrives, the scientist (or worse, her boss) discovers that another division had an excess quantity of the same chemical and was looking to sell it at a discount.
![Page 21: End User Informatics](https://reader035.vdocuments.us/reader035/viewer/2022062513/557ad033d8b42a2c0f8b501b/html5/thumbnails/21.jpg)
WARNING SIGNS : Of Poor Data Integration
Scientists argue because their analysis results differ, even though the data came from the same operational data source.
![Page 22: End User Informatics](https://reader035.vdocuments.us/reader035/viewer/2022062513/557ad033d8b42a2c0f8b501b/html5/thumbnails/22.jpg)
WARNING SIGNS : Of Poor Data Integration
A technician alerts his management team of scientists to a potential problem discovered while running a query against a database. The technician cannot, however, answer the follow-up question, “How long has the problem existed?”
![Page 23: End User Informatics](https://reader035.vdocuments.us/reader035/viewer/2022062513/557ad033d8b42a2c0f8b501b/html5/thumbnails/23.jpg)
WARNING SIGNS : Of Poor Data Integration
A scientist runs a report every week against a LIMS; however, to see a period-to-period comparison, the scientist maintains a spreadsheet, adding a new column every week and entering the data manually.
![Page 24: End User Informatics](https://reader035.vdocuments.us/reader035/viewer/2022062513/557ad033d8b42a2c0f8b501b/html5/thumbnails/24.jpg)
WARNING SIGNS : Of Poor Data Integration
A customer calls tech support to enquire about a pending case. While the customer support engineer has access to the case details, the engineer has no information on whether the customer is current on maintenance, how many end-users they are licensed for, or what options the customer has purchased.
![Page 25: End User Informatics](https://reader035.vdocuments.us/reader035/viewer/2022062513/557ad033d8b42a2c0f8b501b/html5/thumbnails/25.jpg)
WARNING SIGNS : Of Poor Data Integration
Minor change requests take weeks to implement; any modification has to be thoroughly tested for accuracy and integrity.
![Page 26: End User Informatics](https://reader035.vdocuments.us/reader035/viewer/2022062513/557ad033d8b42a2c0f8b501b/html5/thumbnails/26.jpg)
WARNING SIGNS : Of Poor Data Integration
CEO and CFO are uncomfortable signing off on the quarterly numbers as there is no way to trace the numbers back to the source systems.
![Page 27: End User Informatics](https://reader035.vdocuments.us/reader035/viewer/2022062513/557ad033d8b42a2c0f8b501b/html5/thumbnails/27.jpg)
Case Study (closer to home): Services Order Report
• Poor data quality
• Redundant information
• Duplicate entries
• Hard to read
• Huge amount of time required to clean it up
![Page 28: End User Informatics](https://reader035.vdocuments.us/reader035/viewer/2022062513/557ad033d8b42a2c0f8b501b/html5/thumbnails/28.jpg)
Information-sensitivity
Information Quality is a Direct Function of……
• Data Availability and Accessibility
• Data Quality
– DQ = Completeness X Validity
– E.g. Measure of Completeness = # of null values in a column
– E.g. Measure of Validity = “We have 4 regions, but there are 18 distinct values in the region column”
– Pitfall: Don’t take accountability for DQ on the source system
– Push accountability where it belongs, in the source system(s)
• Timeliness of Data, relevant to the questions being asked by the user
• SQL and programming accuracy
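
The DQ = Completeness X Validity measure can be made concrete with a small Python sketch. The slide does not say how each factor is scaled, so this example assumes completeness is the share of non-null entries in a column and validity is the share of non-null entries found in an allowed list. The function names, the four region names, and the sample column are all hypothetical.

```python
def _filled(values):
    """Entries that are not null; None and blank strings both count as null here."""
    return [v for v in values if v is not None and str(v).strip() != ""]

def completeness(values):
    """Assumed definition: fraction of entries in the column that are not null."""
    if not values:
        return 0.0
    return len(_filled(values)) / len(values)

def validity(values, allowed):
    """Assumed definition: fraction of non-null entries that are in the allowed set."""
    filled = _filled(values)
    if not filled:
        return 0.0
    return sum(1 for v in filled if v in allowed) / len(filled)

def data_quality(values, allowed):
    """DQ = Completeness x Validity, as stated on the slide."""
    return completeness(values) * validity(values, allowed)

def unexpected_values(values, allowed):
    """Distinct non-null values that are not in the allowed set, sorted for reporting."""
    return sorted(set(_filled(values)) - set(allowed))

# Hypothetical sample: four official regions, but the column holds more variants.
REGIONS = {"North", "South", "East", "West"}
region_column = ["North", "South", None, "N. America", "East", "west", "West", ""]

score = data_quality(region_column, REGIONS)
strays = unexpected_values(region_column, REGIONS)
```

Listing the stray values alongside the score makes it easier to push the correction back to the source system rather than patching it downstream.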
![Page 29: End User Informatics](https://reader035.vdocuments.us/reader035/viewer/2022062513/557ad033d8b42a2c0f8b501b/html5/thumbnails/29.jpg)
Case Study (closer to home): Internal Revenue Forecasting process
Orders QTD Pipeline Delivered Forecast
Run the Services Products and Orders report in RSVPP ……; export out the results and filter for product services (Column AM) and sum the Total Sale Price USD column
Run the Services Opportunities report in SFDC; export out the result……
Assuming Access is up to date………..; export to Excel; filter by product services and sum USD Amount column
Assuming Access is up to date run the Total Forecast report; export to Access; …………
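
The repeated "export, filter by product services, sum a USD column" step is the kind of manual work that can be scripted. The sketch below is an assumption-laden illustration, not the actual process: the column name `Product Type` and the sample rows are hypothetical, and only the `Total Sale Price USD` column name comes from the slide.

```python
import csv
import io
from decimal import Decimal

def sum_filtered(csv_text, filter_column, filter_value, amount_column):
    """Sum amount_column over rows whose filter_column matches filter_value (case-insensitive)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    total = Decimal("0")
    for row in reader:
        if row[filter_column].strip().lower() == filter_value.lower():
            # Exports often carry thousands separators; strip them before parsing.
            total += Decimal(row[amount_column].replace(",", ""))
    return total

# Hypothetical exported report; real exports would be read from a file instead.
export = """Product Type,Total Sale Price USD
Services,1200.50
Software,9000.00
services,"3,000.25"
"""

orders_qtd = sum_filtered(export, "Product Type", "Services", "Total Sale Price USD")
```

`Decimal` is used instead of `float` so that currency totals do not pick up binary rounding errors.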
![Page 30: End User Informatics](https://reader035.vdocuments.us/reader035/viewer/2022062513/557ad033d8b42a2c0f8b501b/html5/thumbnails/30.jpg)
Near real-time data access
![Page 31: End User Informatics](https://reader035.vdocuments.us/reader035/viewer/2022062513/557ad033d8b42a2c0f8b501b/html5/thumbnails/31.jpg)
Extract, Transformation & Load = Push big data
• Batch extract from transaction systems
• Bulk transformation
• Push load into data warehouse
Extract Load
Transformation
Data Warehouse
Real Time
![Page 32: End User Informatics](https://reader035.vdocuments.us/reader035/viewer/2022062513/557ad033d8b42a2c0f8b501b/html5/thumbnails/32.jpg)
Pipeline Pilot and Real time Data access
Data Access: Data Adapters
Data Transformation: Transform, Calculate, Security
Data sources: Relational, Flat Files, ERP, Legacy, EJB, XML
Information Access: Web Services, ODBC, JDBC
• Flexible Data Access capabilities
• Single access point to data
• Consumer sees only the end result
• Shared platform service
• Available to all technologies
• Reusable building blocks
• Targeted to specific needs
• Reduces costs and time to market
• Supports incremental development
![Page 33: End User Informatics](https://reader035.vdocuments.us/reader035/viewer/2022062513/557ad033d8b42a2c0f8b501b/html5/thumbnails/33.jpg)
Case Study: PI Historian
• PI Historian, a product provided by OSI, captures data in real time from the research test rigs
• Data capture in PI is triggered by events
• PP allows scientists to read the data from PI Historian as it becomes available and also combine it with other information (e.g. associate real-time test data with historical characteristics of a catalyst)
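
Combining readings as they arrive with historical reference data can be sketched generically. This is plain Python and does not use any PI Historian or Pipeline Pilot API. The field names, the catalyst identifiers, and the sample values are all hypothetical.

```python
def enrich(readings, catalyst_history):
    """Attach historical catalyst characteristics to each reading as it arrives.

    readings is any iterable (for example a live event feed); results are yielded
    one at a time, so nothing waits for the whole stream to finish.
    """
    for reading in readings:
        history = catalyst_history.get(reading["catalyst_id"], {})
        # Reading fields and history fields are merged into one flat record.
        yield {**reading, **history}

# Hypothetical reference data keyed by catalyst identifier.
catalyst_history = {"CAT-7": {"surface_area_m2_g": 210}}

# Hypothetical event feed from two test rigs.
readings = iter([
    {"rig": "R1", "catalyst_id": "CAT-7", "temp_c": 351.2},
    {"rig": "R2", "catalyst_id": "CAT-9", "temp_c": 298.0},
])

enriched = list(enrich(readings, catalyst_history))
```

A reading whose catalyst has no history passes through unchanged, so missing reference data does not block the real-time view.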
![Page 34: End User Informatics](https://reader035.vdocuments.us/reader035/viewer/2022062513/557ad033d8b42a2c0f8b501b/html5/thumbnails/34.jpg)
Data provisioning pros and cons
Options compared: OLTP Replication, Data Marts, Enterprise Data Warehouse, Pipeline Pilot
Criteria: Data Quality, Ease of enquiry, System Performance, History, Scalability, Speed to information
![Page 35: End User Informatics](https://reader035.vdocuments.us/reader035/viewer/2022062513/557ad033d8b42a2c0f8b501b/html5/thumbnails/35.jpg)
Data Integration
Total Cost of Ownership Really Matters
![Page 36: End User Informatics](https://reader035.vdocuments.us/reader035/viewer/2022062513/557ad033d8b42a2c0f8b501b/html5/thumbnails/36.jpg)
Evolution Of an Informatics System
1 “Just give me a list of compounds from the database, sorted by compound name”
![Page 37: End User Informatics](https://reader035.vdocuments.us/reader035/viewer/2022062513/557ad033d8b42a2c0f8b501b/html5/thumbnails/37.jpg)
Evolution Of an Informatics System
2 “We also need to see the related toxicology information and for the list to be grouped by compound”
![Page 38: End User Informatics](https://reader035.vdocuments.us/reader035/viewer/2022062513/557ad033d8b42a2c0f8b501b/html5/thumbnails/38.jpg)
Evolution Of an Informatics System
3 “We’d like to get a list of some of the related compound information, too, grouped by the first letter of the compound’s name.”
![Page 39: End User Informatics](https://reader035.vdocuments.us/reader035/viewer/2022062513/557ad033d8b42a2c0f8b501b/html5/thumbnails/39.jpg)
Evolution Of an Informatics System
4 “Actually, we’d like to be able to produce a completely separate report for compound and related toxicology information.”
![Page 40: End User Informatics](https://reader035.vdocuments.us/reader035/viewer/2022062513/557ad033d8b42a2c0f8b501b/html5/thumbnails/40.jpg)
Evolution Of an Informatics System
5 “We don’t like running the reports manually. Can they be scheduled?”
![Page 41: End User Informatics](https://reader035.vdocuments.us/reader035/viewer/2022062513/557ad033d8b42a2c0f8b501b/html5/thumbnails/41.jpg)
Evolution Of an Informatics System
6 “We have quite a few users using this system now and there’s some fairly sensitive data in there.”
![Page 42: End User Informatics](https://reader035.vdocuments.us/reader035/viewer/2022062513/557ad033d8b42a2c0f8b501b/html5/thumbnails/42.jpg)
Evolution Of an Informatics System
7 “We need to be able to drill down into more detail”
![Page 43: End User Informatics](https://reader035.vdocuments.us/reader035/viewer/2022062513/557ad033d8b42a2c0f8b501b/html5/thumbnails/43.jpg)
Evolution Of an Informatics System
8 “We need to track which users have used what Protocols”
![Page 44: End User Informatics](https://reader035.vdocuments.us/reader035/viewer/2022062513/557ad033d8b42a2c0f8b501b/html5/thumbnails/44.jpg)
Evolution Of an Informatics System
9 “We need to be able to easily search the information we need.”
![Page 45: End User Informatics](https://reader035.vdocuments.us/reader035/viewer/2022062513/557ad033d8b42a2c0f8b501b/html5/thumbnails/45.jpg)
Evolution Of an Informatics System
“We need these reports linked to our business process”
“We need to be able to approve or reject the reports”
“We need a single version of the truth”
“We don’t want to be waiting around for the results”
“We don’t want to be re-typing information from these reports into our other application”
“We need to be able to see the underlying detail”
“We need to print the reports out to take into meetings”
“We need the output as Excel”
“We need charts”
“We need to know who’s looked at the reports”
“We need a simple way to see the entire contents of the report”
“We need a report that looks like an existing flow chart”
![Page 46: End User Informatics](https://reader035.vdocuments.us/reader035/viewer/2022062513/557ad033d8b42a2c0f8b501b/html5/thumbnails/46.jpg)
Hidden Costs
• Organizations that believe that they can build a data integration solution at the fraction of cost of a COTS solution….
• Discover that any savings in up-front costs are very quickly outweighed, multiple times over, during the lifetime of the solution
• Typical effort to build a custom data integration solution can be upwards of 5000-5500 man days
• Some of the tasks that need to be undertaken to provide a functioning solution:
– Application Architecture
– Data cleansing & enrichment services
– Integration framework
– User Interface design
– Common field matching
– Security
– Batch processing capabilities
– Application Integration
– Audit & Logging capabilities
Build versus Buy Decision Criteria

| Data Integration Considerations | Build your own | Buy |
| --- | --- | --- |
| Initial Start-up cost | Lower | Higher |
| Ongoing Operating cost | Higher | Lower |
| Ongoing Support & Maintenance | In-house responsibility | Vendor |
| One time “quick and dirty” task | Consider | Maybe overkill unless one-time task becomes ongoing request |
| IT Staff requirements | Higher | Lower |
| IT Productivity | Detracts from | Contributes to |
| Data sources/data targets | Single/single | Multiple/multiple, Multiple/single, Single/multiple |
| Complex transformations | Limited: IT must write complex code | Comprehensive |
| Integration | Usually overlooked | Industry standards |
![Page 48: End User Informatics](https://reader035.vdocuments.us/reader035/viewer/2022062513/557ad033d8b42a2c0f8b501b/html5/thumbnails/48.jpg)
Industry Trends
End-user Informatics
![Page 49: End User Informatics](https://reader035.vdocuments.us/reader035/viewer/2022062513/557ad033d8b42a2c0f8b501b/html5/thumbnails/49.jpg)
Web 2.0
What’s Setting Expectations Today
![Page 50: End User Informatics](https://reader035.vdocuments.us/reader035/viewer/2022062513/557ad033d8b42a2c0f8b501b/html5/thumbnails/50.jpg)
Next-Generation Enabling Technologies & New User Demands Are Emerging
•Rich Internet Experience
•Web 2.0
•Portlet components
•XML and derivatives
•Dynamic, Ajax-based UI
SOA Infrastructure
Leverage existing systems and components
Standardization
Data-driven environment
Open APIs to customize apps
Personal Dashboards
Integrate data from multiple sources
Multi-account views
Cross-account planning
![Page 51: End User Informatics](https://reader035.vdocuments.us/reader035/viewer/2022062513/557ad033d8b42a2c0f8b501b/html5/thumbnails/51.jpg)
Web 2.0 features on our projects
![Page 52: End User Informatics](https://reader035.vdocuments.us/reader035/viewer/2022062513/557ad033d8b42a2c0f8b501b/html5/thumbnails/52.jpg)
Web 2.0 features on our projects
![Page 53: End User Informatics](https://reader035.vdocuments.us/reader035/viewer/2022062513/557ad033d8b42a2c0f8b501b/html5/thumbnails/53.jpg)
Advanced Reporting/Visualization Collection
![Page 54: End User Informatics](https://reader035.vdocuments.us/reader035/viewer/2022062513/557ad033d8b42a2c0f8b501b/html5/thumbnails/54.jpg)
Scientific Business Process Management and PP
• Fuse scientific and analytical data with process data
• Use Pipeline Pilot in automated process decisions
• Display reports and data at appropriate points in the process
• Use data to modify process execution
![Page 55: End User Informatics](https://reader035.vdocuments.us/reader035/viewer/2022062513/557ad033d8b42a2c0f8b501b/html5/thumbnails/55.jpg)
Consolidated Informatics Platform
Many Databases Many Tools
Spreadsheets Analytics
Scorecards
Dashboards
Self-service Reports
Data Mining
Portals
Web Reports
Web Reports
Current
Future
Many Databases
![Page 56: End User Informatics](https://reader035.vdocuments.us/reader035/viewer/2022062513/557ad033d8b42a2c0f8b501b/html5/thumbnails/56.jpg)
Key Takeaways
• Provide Accurate, Integrated & Seamless Informatics Solutions
• Reduce redundant and replicated databases
• Rationalize existing Reporting tools and technologies
• Build Agile, Flexible and Reusable solutions
• Empower the end-users: “Shift Right”
![Page 57: End User Informatics](https://reader035.vdocuments.us/reader035/viewer/2022062513/557ad033d8b42a2c0f8b501b/html5/thumbnails/57.jpg)
Shift Right