IBM Confidential
Enterprise Information Mashups: Integrating Information, Simply
Anant Jhingran
CTO, Information Management
IBM
Outline
Web 2.0 and Info 2.0
Example and the research problems we see
IBM efforts in this area
Creativity v. Control
1964: S/360 debuts
1971: First Intel Micro
1981: IBM PC
1994: Netscape Navigator
2000: Dot-com collapse
Information Technology Spend “had” been growing nicely
Actual Application Architecture for Consumer Electronics Company
[Diagram: a dense map of the company's interconnected applications, each a labeled box (for example E01-EDI, G02 - General Ledger, S04 - Sales Posting, I07 Purchase Order, POS). Legend: Mainframe, PC/NT apps, Unix apps, 3rd Party Interface. Grouped boxes: Other Apps - PC; Accts Rec Apps - PC; Inventory Control Apps - PC; Misc Accounting/Finance Apps - PC/NT. Interfaces to and from the Data Warehouse are not displayed on this diagram.]
Prepared by Michelle Mills
Over time, complexity got built into the IT systems
[Diagram: enterprise reference architecture. Labels: Presentation Services (portals, browsers, and/or devices); strategic, tactical, transaction, discovery, and master data applications; App Server; Process Services; Business Process Management; Transaction Application Services; Analytic Application Services; Information Integration Services (Federation, Streaming, Batch); Analytic Services; Master Data Services; Discovery Services; Content Services; Collaboration Services; Transaction Services; Metadata Services; Enterprise Service Bus; Event Processing; Business Rules; Business Monitoring. Stores: EDW, ECW, Legacy, Notes, OLTP systems, Metadata, and Master data Hubs (Product, Customer, Supplier, Location).]
And using Information as a Strategic Asset to build better Architectures
Open Innovation is Here to Stay, Exemplified by Web 2.0
But…
Web 2.0, outside and inside an enterprise, will succeed only with an Info 2.0 Mashup Fabric
Info 2.0
Enables the same separation of “data” and “logic” that revolutionized the use of databases in the ’80s.
Web 2.0
Enables the same separation of “information” and “process” that is now happening in Web 1.5
Within enterprises, it will…
Enable connections to information that does not make it into the enterprise IT architectures:
– Email
– Presentations and documents
– External data (Web)
– Spreadsheets
– Decision support datasets
– …
And enable it to be done “quickly”, as “assembly” rather than as “programming”
[Diagram: the same enterprise reference architecture as on the earlier slide, now shown alongside information sources that sit outside it: documents (doc), email, files, and the CMDB.]
How the Architecture could play out…
[Diagram: two columns, LOB Focus and IT Focus. Web 2.0 sits over Info 2.0. Situational Apps (SaaS model) and Process Server/ESB (software model) each run on an Info 2.0 Fabric, over a shared Information Integration layer. A stray "ppt" source label also appears.]
External Web
http://water.usgs.gov/waterwatch/
(Zipcode)
edc.usgs.gov/
Example
(Geocode = Latitude/Longitude)
http://www.dotd.louisiana.gov/
(HUC = Hydrological Unit Code)
http://florida.maps.anant/
Meet Pete, an insurance agent in Florida.
He sees a news report of a severe storm. What is the company’s risk?
He needs to forward a risk summary to executives.
Flood Risk Assessment Mashup
[Diagram: mashup flow. Steps: Mashup Search; Screen Scraping; Standardize / Standardization; Lineage; Report. Sources: policy XLS, www.floodlevels.com, water.usgs.gov, edc.usgs.gov, dotd.florida.gov.]
So how can Pete write his mashup simply?
[Chart axes: Simplicity, Accuracy]
Procedural Code
<?php
// Get policy holders in a Policy object array
$url = "file://policies/myclients.xsl";
$content = file_get_contents($url);
$policyArr = getPolicy($content);

// Find high risk zones
$url = "http://www.floodlevels.com";
$content = file_get_contents($url);

// Do screen scraping to extract high risk zones
$zoneArr = findRiskyZones($content);

// Initialize the return array
$riskArr = array();

// Find corresponding policy holders for each city
foreach ($policyArr as $policy) {
    if ($policy->amount < 250000) {
        continue;
    }

    // Standardize the address
    $policyZone = findZone($policy->address);

    // Check whether this policy is affected
    foreach ($zoneArr as $zone) {
        if ($zone == $policyZone) {
            // This policy carries a high risk.
            // Insert into high risk array
            $riskArr[] = $policy;
        }
    }
}

// Send email to manager for high risk policies
sendEmail("[email protected]", "High risk policies", $riskArr);
?>
So how can Pete write his mashup simply?
[Chart axes: Simplicity, Accuracy]
sendMail("[email protected]",
  <highRiskPolicies> {
    for $i in url("file://policies/myclients.xsl")
    for $j in url("http://www.floodlevels.com")
    where $i//amount > 250000 and
          $i//address in $j/zone
    return <policy> {$i} </policy>
  } </highRiskPolicies>);
Declarative Queries
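To make the contrast with the procedural version concrete, here is a minimal Python sketch of the same filter-and-join written as a single expression. It is only an illustration: the `Policy` type, the `high_risk_policies` function, and the sample data are hypothetical, and addresses are assumed to be already standardized to a zone.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    holder: str
    zone: str    # assumption: the address was already standardized to a zone
    amount: int


def high_risk_policies(policies, risky_zones, min_amount=250_000):
    """Return policies worth more than min_amount that lie in a risky zone."""
    risky = set(risky_zones)
    # One declarative expression replaces the nested loops of the PHP version.
    return [p for p in policies if p.amount > min_amount and p.zone in risky]


# Hypothetical sample data, not taken from the talk.
policies = [
    Policy("A", "33101", 300_000),
    Policy("B", "33101", 100_000),
    Policy("C", "34106", 500_000),
]
print([p.holder for p in high_risk_policies(policies, ["33101"])])  # prints ['A']
```

The scraping, standardization, and email steps are deliberately left out; the point is only that the selection logic shrinks to one condition.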
So how can Pete write his mashup simply?
[Chart axes: Simplicity, Accuracy]
GUIs, Spreadsheets, Wikis
So how can Pete write his mashup simply?
[Chart axes: Simplicity, Accuracy]
Flood risk for homes in myclients.xsl worth over 250000
Search
How do we get there?
Research Agenda
It is all about “simplicity”: do deep research and build deep technology, but make the job of the application writer much easier!
Much of our past research is applicable (including Information Manifold and its children), but new problems exist because of new target users.
The Info 2.0 Mashup Fabric needs to address these issues, over time
How to create such a Mashup?
– Finding what exists, specifying what he wants, and creating what is needed (expressiveness vs. ease of use; DWIS, "do what I say", vs. DWIM, "do what I mean")
How to integrate the information?
– What is the minimal level of semantics that the Information 2.0 layer needs to have, and has the world evolved to make it easier now?
How to deal with unstructured data?
How do Mashups evolve?
How does Pete find the floodlevels.com Mashup?
Pages on floodlevels.com are dynamically generated AJAX pages (produced by another mashup)
Pete may have typed “Flood Levels Louisiana” into a search engine
Similar to the deep Web search problem, but now we have to deal with joins and other mashup operations, or even workflow
Search has to understand the logic of the mashup
Web 2.0 magnifies the deep web search problem
How does Pete specify his Mashup?
Pete is an insurance agent, not an expert JavaScript or PHP/Java/Ruby/etc. programmer
How does Pete specify a screen scraper if needed?
How does Pete describe the Mashup flow?
– Current mashups are a hodge-podge of application and data access
– Similarity to ETL Flow
– Is the answer an XQuery-like language for mashups, or programming by example?
Web 2.0 needs simple methods to write mashups!
Can he create the Mashup by giving an example?
Could it have been even easier?
Could Pete’s mashup have been dynamically constructed when he searched for “flood levels for zipcodes 33101, 34106, etc.”?
– Test of Time Award: “Information Manifold” (Querying Heterogeneous Information Sources Using Source Descriptions, by A. Halevy, A. Rajaraman, and J. Ordille)
– automatically finding the right sources based on query
Extend Information Manifold to dynamically create Mashups!
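One way to picture "automatically finding the right sources based on query" is a toy planner that chains sources by what each one needs and what it returns. The sketch below is far simpler than the Information Manifold's reasoning over source descriptions and is not that algorithm. The `Source` type, the `plan_sources` function, the attribute names, and the `example` host names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Source:
    url: str
    key: str              # attribute the source must be given
    provides: frozenset   # attributes the source returns


def plan_sources(have, want, catalog):
    """Greedy forward chaining over source descriptions.

    Repeatedly pick the first source whose key is already known and that
    adds a new attribute, until every wanted attribute is covered.
    Returns the chosen sources in call order, or None if that is impossible.
    The greedy choice may include sources that turn out to be unnecessary.
    """
    known, wanted, plan = set(have), set(want), []
    progress = True
    while not wanted <= known and progress:
        progress = False
        for s in catalog:
            if s not in plan and s.key in known and not s.provides <= known:
                plan.append(s)
                known |= s.provides
                progress = True
                break
    return plan if wanted <= known else None


# Hypothetical catalog: zipcode -> geocode -> HUC -> flood level.
catalog = [
    Source("flood.example", "huc", frozenset({"flood_level"})),
    Source("geocode.example", "zipcode", frozenset({"geocode"})),
    Source("huc.example", "geocode", frozenset({"huc"})),
]
plan = plan_sources({"zipcode"}, {"flood_level"}, catalog)
print([s.url for s in plan])
```

Here a search that starts from zip codes and asks for flood levels yields a three-step chain, which is the kind of mashup Pete would otherwise have to assemble by hand.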
How does one simplify “semantics”?
Helped by:
– Microformats growing in popularity in the open community
– Standardization services increasingly available
– Master Data Management taking off in enterprises
Issues:
– Standardization is inherently uncertain. How is uncertainty handled?
– Quality of services differs. How to track the lineage of both data and integration services?
– Services vary in price. How to trade off price, quality, and time?
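The three issues above can be made concrete with a small sketch: choose a standardization service under price and time budgets, then keep the uncertainty and lineage attached to the result. This is one possible policy, not a recommendation from the talk. The `ServiceOffer` and `Standardized` types, both functions, and the sample offers are hypothetical, and using the service's advertised quality as the confidence of each result is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceOffer:
    name: str
    price: float      # cost per call, arbitrary units
    quality: float    # advertised accuracy in [0, 1]
    latency_s: float  # expected response time in seconds


@dataclass(frozen=True)
class Standardized:
    value: str
    confidence: float  # uncertainty travels with the value
    lineage: tuple     # (raw input, name of the service that produced value)


def pick_service(offers, max_price, max_latency_s):
    """Highest-quality offer within both budgets, or None if none fits."""
    feasible = [o for o in offers
                if o.price <= max_price and o.latency_s <= max_latency_s]
    return max(feasible, key=lambda o: o.quality, default=None)


def standardize(raw, offer, normalize):
    """Apply normalize and record where the result came from."""
    # Assumption: the offer's advertised quality stands in for confidence.
    return Standardized(normalize(raw), offer.quality, (raw, offer.name))


# Hypothetical offers.
offers = [
    ServiceOffer("cheap", 0.01, 0.80, 0.2),
    ServiceOffer("premium", 0.10, 0.97, 1.5),
    ServiceOffer("slow", 0.02, 0.90, 30.0),
]
chosen = pick_service(offers, max_price=0.05, max_latency_s=5.0)
result = standardize(" 123 main st ", chosen, lambda s: s.strip().upper())
print(chosen.name, result)
```

With a tight latency budget only the cheap service qualifies; relaxing the latency budget to 60 seconds would select the slower but more accurate one.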
Search shows us some ways
Issues in Unstructured Data
Everybody wants to run analytics on unstructured data, and create structured data, and then we are back in our favorite world. This poses two challenges:
– Analytics are hard and require some fundamentally new techniques.
– The extracted structured (meta-) data is inherently imprecise.
But unstructured query systems have evolved to address this!
[Diagram: grid of DATA (S, U) against QUERY/INTEGRATION (S, U), annotated with Analytics and Semantics.]
In another Web 2.0 sense, how does this co-exist with and augment social tagging?

Manual tagging, by professionals
  Cons: Costly; human resource intensive; cannot keep up
  Pros: Controlled vocabularies & standard taxonomies; higher quality
  Example: ?

Social tagging, by users
  Cons: Ambiguity; uncontrolled vocabulary; synonyms
  Pros: User driven; emergent folksonomies; serendipitous browsing
  Examples: Del.icio.us and Flickr

Automated tagging, by machine
  Cons: Requires training of models; lower quality than manual tagging
  Pros: Learns from professional & user tagging; lower human cost
  Example: Semantic tagging
[Chart: “Long tail” curve of Popularity against Digital item. Regions labeled: Consumer content; Deep archives, large personal collections; High-value content & enterprise data sources.]
Mashup Evolution
[Diagram: one axis runs between Best Effort, Ad Hoc and Mission Critical; the other between Limited Time, Immediate and Lots of Time. Plotted items: Mashups; SCA; Portals; New Initiatives; Proof of Concept; Line of Business; IT Dept; DataMart; DataWarehouse.]
Mashup Starter Kit – A Mashup Fabric for Intranet Applications being built @ IBM
[Diagram: Mashup Starter Kit. Inputs: Web Services, Web Pages, XML/Atom/RSS Feeds, from Enterprise IT Services and External Data Services. MAFIA stages: Ingestion (Screen Scraping, Web Services); Augmentation (Fusion, Union, Standardization, Lightweight Semantics); Transformation; Feed Generation; Presentation. An Atom/RSS Store holds the feeds. Outputs: XML/Atom/RSS Feed, HTML.]
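The stages named on this slide can be sketched as a small pipeline: ingest rows from two sources, standardize the join key, union the results, and generate a feed. This is a sketch under stated assumptions, not the Mashup Starter Kit's implementation. All four function names, the row fields, and the sample data are hypothetical, and the output omits the `id` and `updated` elements that a valid Atom feed requires.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"


def ingest(rows, source):
    """Ingestion: tag each raw row with the source it came from."""
    return [{**row, "source": source} for row in rows]


def standardize(entries):
    """Standardization: zip codes become 5-character strings."""
    return [{**e, "zip": str(e["zip"]).strip().zfill(5)} for e in entries]


def union(*feeds):
    """Union: concatenate entries from several feeds."""
    return [e for feed in feeds for e in feed]


def to_atom(entries, title):
    """Feed generation: serialize entries as an Atom-style XML string."""
    ET.register_namespace("", ATOM_NS)
    feed = ET.Element(f"{{{ATOM_NS}}}feed")
    ET.SubElement(feed, f"{{{ATOM_NS}}}title").text = title
    for e in entries:
        entry = ET.SubElement(feed, f"{{{ATOM_NS}}}entry")
        ET.SubElement(entry, f"{{{ATOM_NS}}}title").text = e["zip"]
        summary = ET.SubElement(entry, f"{{{ATOM_NS}}}summary")
        summary.text = f'{e["level"]} (from {e["source"]})'
    return ET.tostring(feed, encoding="unicode")


# Hypothetical inputs standing in for a scraped page and a web service.
scraped = ingest([{"zip": 33101, "level": "high"}], "web-page")
service = ingest([{"zip": " 4106", "level": "low"}], "web-service")
xml_text = to_atom(standardize(union(scraped, service)), "Flood levels")
print(xml_text)
```

Keeping the source name on every entry is a lightweight form of lineage: a consumer of the feed can still tell which ingestion path produced each item.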
As my Mom Used to say (perhaps still says!)
“How can you have any pudding if you don’t clean your feet?”
(Apologies to Pink Floyd, “The Wall”)
How do we unleash creativity, yet keep light control?
Transform
Create and Explore
Assemble & Use
Unleash this
Manage this
Web Content
Departmental Content
Personal Assets
Enterprise Information
Summary
Web-style architectures represent the next “sustainable” phase of IT spend
The database research community can make a big difference!
– Re-enable the separation of data and logic: Web 2.0 built on Info 2.0!
New research problems exist
– Ease of use and ad-hoc integration.
– Bringing together unstructured and (semi-)structured data
We at IBM are building such an Info 2.0 Fabric, targeting enterprise situational applications
One of the biggest battles will be creativity vs. control