data warehouses and nosql -...
TRANSCRIPT
DataWarehousesandNoSQLSharingAdministra6veInforma6on
CarmenBarandelaSo-wareEngineerCERN/GS‐AIS
October24‐28,2011JINR/CERNGridandManagementInforma6onSystems
Agenda• DataWarehousesinAdministra6veCompu6ng
• Recap:DataWarehousesTheory
• DataWarehousesandInforma6onSystemsinAIS– Founda6on,HRandFIInforma6onSystems
– ComplexDataExtrac6onProcesses– Pixel‐PerfectRepor6ng– Dashboards
• NoSQL– ...
CarmenBarandela:"DataWarehousesandNoSQL” 2
Agenda• DataWarehousesinAdministra6veCompu6ng
• Recap:DataWarehousesTheory
• DataWarehousesandInforma6onSystemsinAIS– Founda6on,HRandFIInforma6onSystems
– ComplexDataExtrac6onProcesses– Pixel‐PerfectRepor6ng– Dashboards
• NoSQL– ...
3CarmenBarandela:"DataWarehousesandNoSQL”
Ca.16,000People
4CarmenBarandela:"DataWarehousesandNoSQL”
Mankind’sLargestMachine
5CarmenBarandela:"DataWarehousesandNoSQL”
EnormousAmountofData
JanJanke:"DataWarehousesandAnaly6calDataProcessing..." 6
Administra6veCompu6ng• ProvidesmeanstoadministrateCERN• Enablesphysiciststofocusontheirwork• Allowsmanagementtomaketherightmoves
7CarmenBarandela:"DataWarehousesandNoSQL”
WhyDataWarehouses?• Heterogeneouscompu6nglandscape• VariousspecialisedOLTPsystems• Planningneeds• LegalRequirements
8
Supportadministra6vestaff Enforcesecurityandsafetyonsite Allowmanagementtomakedecisions
CarmenBarandela:"DataWarehousesandNoSQL”
Example:KeepFinancesUnderControl
• SpecialisedSystems– Accoun6ng,ERPforCERNstores– Externalcontractsmanagement– Payroll,treasurymanagement,…
9
Specialisedsmallusergroups
Dis6nctdatabases
Highavailabilityandperformance,real‐6medata
Systemsonlyaccessibletoauthorisedspecialists
CarmenBarandela:"DataWarehousesandNoSQL”
Example:KeepFinancesUnderControl
• GeneralFinancialInforma6onSystem– Singlesystem
– Accesstodatafrommul6plesources– Differentlevelsofcomplexity
10
Specialisedsmallusergroups
Dis6nctdatabases
Highavailabilityandperformance,real‐6medata
Systemsonlyaccessibletoauthorisedspecialists
CarmenBarandela:"DataWarehousesandNoSQL”
Example:KeepFinancesUnderControl
• GeneralFinancialInforma6onSystem– Singlesystem
– Accesstodatafrommul6plesources– Differentlevelsofcomplexity
11
UsersfromallareasofCERN
Singledatawarehouse
Highavailabilityandperformance,butnonecessityforreal‐6medata
Securityisextremelyimportant!SystemisaccessibleCERNwide.
CarmenBarandela:"DataWarehousesandNoSQL”
AIS’FinancialDataWarehouse
• Keepdatainsyncwithdataproviders• Mastercomplexdataextrac6onprocess
• Ensurehighqueryperformance
• Basefordetaileddataanalysis
12
Technologies:o ORACLERACdatabaseo JavaEnterprisewebapplica6onso In‐housedevelopedframeworkso Third‐partyBIandrepor6ngtools
CarmenBarandela:"DataWarehousesandNoSQL”
Agenda• DataWarehousesinAdministra6veCompu6ng
• Recap:DataWarehousesTheory
• DataWarehousesandInforma6onSystemsinAIS– Founda6on,HRandFIInforma6onSystems
– ComplexDataExtrac6onProcesses– Pixel‐PerfectRepor6ng– Dashboards
• DetailedDataWarehouseExample– ManagementDataLayer(MDL)
13CarmenBarandela:"DataWarehousesandNoSQL”
FindtheNeedleintheHay…
14CarmenBarandela:"DataWarehousesandNoSQL”
OLTPvsOLAP
OLTP OLAP
Datasource Opera6ons OLTP(consolidated)
Datapurpose Runthebusiness Repor6ng,analysis
Inserts,updates High Periodicbatchjobs
Querycomplexity Low High
DBdesign Normalized Star,snowflake
Availability Cri6cal Lesscri6cal
Target Opera6onalstaff Middle/higherMgmt.
15CarmenBarandela:"DataWarehousesandNoSQL”
OLTPvsOLAP
OLTP OLAP
Datasource Opera6ons OLTP(consolidated)
Datapurpose Runthebusiness Repor6ng,analysis
Inserts,updates High Periodicbatchjobs
Querycomplexity Low Depends…
DBdesign Normalized Snowflakeandothers
Availability Cri6cal MaybeverycriOcal
Target Opera6onalstaff Mgmt.+OperaOons
16
That’stheory!
Realworldisnotthat
easy…
CarmenBarandela:"DataWarehousesandNoSQL”
Normalisa6on(Codd/Boyce)
• 1NF– 1table=1rela6on,norepea6nggroupsorduplicaterows
• 2NF– Allnonprimeanributesdependonallparts(anributes)ofacompositekey
• 3NF– Allnonprimeanributesdependonlyonthe(whole)key
17CarmenBarandela:"DataWarehousesandNoSQL”
Normalisa6on(Codd/Boyce)
• 1NF– 1table=1rela6on,norepea6nggroupsorduplicaterows
• 2NF– Allnonprimeanributesdependonallparts(anributes)ofacompositekey
• 3NF– Allnonprimeanributesdependonlyonthe(whole)key)
18
Course Category Winner Origin
Monaco‘10 Formula1 M.Webber Australia
Japan‘10 Formula1 S.Venel Germany
Japan‘10 Rally S.Ogier France
Notin3NF,why?
CarmenBarandela:"DataWarehousesandNoSQL”
StarSchema
19
Source:hnp://www.execu6onmih.com/data‐warehouse/star‐snowflake‐schema.php(16/10/2010)
item_key
branch_key
loca6on_key
units_sold
dollars_sold
avg_sales
Measures
branch_keybranch_namebranch_type
Branch
SalesFactTable item_keyitem_namebrandtypesupplier_type
item
loca6onloca6on_keyStreetcitystate_or_provincecountry
CarmenBarandela:"DataWarehousesandNoSQL”
SnowflakeSchema
20
Source:hnp://www.execu6onmih.com/data‐warehouse/star‐snowflake‐schema.php(16/10/2010)
city_keycitystate_or_provincecountry
city
item_key
branch_key
loca6on_key
units_sold
dollars_sold
avg_sales
Measures
branch_keybranch_namebranch_type
Branch
SalesFactTableitem_keyitem_namebrandtypesupplier_key
item
loca6on
loca6on_keystreetcity_key
supplier_keySupplier_type
supplier
CarmenBarandela:"DataWarehousesandNoSQL”
FromOpera6onstoRepor6ng
21
Source:hnp://www.deakin.edu.au/ddw/what‐is.php(16/10/2010)
ERP
FI
HR
…
CarmenBarandela:"DataWarehousesandNoSQL”
Analysis
• DataMining• Drilldown– Finerdetailgranularity(e.g.addagroup‐bycolumn)
• Slice&dice– Playwiththedimensions• Combinedifferentdimensions• Remove/addadimension
• Analysefactchanges
22CarmenBarandela:"DataWarehousesandNoSQL”
Agenda• DataWarehousesinAdministra6veCompu6ng
• Recap:DataWarehousesTheory
• DataWarehousesandInforma6onSystemsinAIS– Founda6on,HRandFIInforma6onSystems
– ComplexDataExtrac6onProcesses– Pixel‐PerfectRepor6ng– Dashboards
• NoSQL– ...
23CarmenBarandela:"DataWarehousesandNoSQL”
CERN/AISBusinessMap
24CarmenBarandela:"DataWarehousesandNoSQL”
Founda6on• CommondatalayerforvariousAISservices• DatainterfacesforotherCERNservices• Commonapplica6ons(e.g.mgmt.ofroles)
25
HRInformaOonSystem(HRT)
FIInformaOonSystem(CET)
…moredomainspecificinformaOonsystems
OperaOvesystems
CarmenBarandela:"DataWarehousesandNoSQL”
VariousSpecialisedSystems
• ORACLEHR• CERNTrainingApplica6on• Safety&accesssystems• EDH(ElectronicDocumentHandling)• Accoun6ngApplica6on• ERPsystemforCERNstores• Contractfollow‐up• …
26CarmenBarandela:"DataWarehousesandNoSQL”
TechnicalEnvironment
• Sourcedatabases:– ORACLE10/11g– MicrosorExcel
• HR/FIInforma6onSystems:– ORACLE10/11g– JavaEnterprisewebapplica6ons– SAPBusinessObjectstoolfamily
27CarmenBarandela:"DataWarehousesandNoSQL”
DataExtrac6ons
• Nightlyscheduledbatchjobs• Extrac6onsorganisedinSQLscripts• Runbyself‐developed“batchrunner”– Controls• Orderofexecu6on(sequen6al,parallel)• Cri6cality• Logging• Problemescala6on(automa6cemails)
28CarmenBarandela:"DataWarehousesandNoSQL”
Defini6onofExtrac6onProcess(1)
29
Generaldefini6ons
CarmenBarandela:"DataWarehousesandNoSQL”
Defini6onofExtrac6onProcess(2)
30
Batches&commands
CarmenBarandela:"DataWarehousesandNoSQL”
ImportanceofMonitoring
31
NewhardwareforDEVdatabases(gain>1h)
CarmenBarandela:"DataWarehousesandNoSQL”
TurtleorLeopard?
32CarmenBarandela:"DataWarehousesandNoSQL”
ORACLEMaterialisedViews
• Pre‐aggregatedsummaries• Benefitfromqueryrewrite
33
Source:ORACLE10gDocumenta6on/DataWarehousingGuide
CarmenBarandela:"DataWarehousesandNoSQL”
ORACLEMaterialisedViews
• Pre‐aggregatedsummaries• Benefitfromqueryrewrite
34
Source:ORACLE10gDocumenta6on/DataWarehousingGuide
CarmenBarandela:"DataWarehousesandNoSQL”
Materialised(Summary)Views
• Don’tuseremotetablesifyouneedqueryrewrite
• Creatematerializedviewlogonallsourcetables
35CarmenBarandela:"DataWarehousesandNoSQL”
Materialised(Summary)Views
• Don’tuseremotetablesifyouneedqueryrewrite
• Creatematerializedviewlogonallsourcetables
36CarmenBarandela:"DataWarehousesandNoSQL”
Snapshots
• Usesnapshotstoefficientlyaccessremotetables– Syntax:CREATESNAPSHOT…AS[YourQuery]– Refreshop6ons:• FAST• COMPLETE
• FORCE
37CarmenBarandela:"DataWarehousesandNoSQL”
Snapshots
• Usesnapshotstoefficientlyaccessremotetables– Syntax:CREATESNAPSHOT…AS[YourQuery]– Refreshop6ons:• FAST• COMPLETE
• FORCE
38CarmenBarandela:"DataWarehousesandNoSQL”
PipelinedFunc6ons
• PL/SQLisdatasourceinsteadofatable• MayincreaseperformanceinenvironmentswithheavyPL/SQLuse
39CarmenBarandela:"DataWarehousesandNoSQL”
PipelinedFunc6ons
• PL/SQLisdatasourceinsteadofatable• MayincreaseperformanceinenvironmentswithheavyPL/SQLuse
40
CREATE OR REPLACE TYPE myTableFormat AS OBJECT( col_a NUMBER, col_b DATE, col_c VARCHAR2(25) ) /
CREATE OR REPLACE TYPE myTableType AS TABLE OF myTableFormat
/ CarmenBarandela:"DataWarehousesandNoSQL”
1
PipelinedFunc6ons
41
CREATE OR REPLACE FUNCTION myFunc RETURN myTableType PIPELINED IS
BEGIN
FOR i in 1 .. 5 LOOP PIPE ROW ( myTableFormat( i, SYSDATE+i, 'Row '||i ) ); END LOOP; RETURN;
END;
END;
/
2
CarmenBarandela:"DataWarehousesandNoSQL”
PipelinedFunc6ons
42
SELECT * FROM TABLE( myFunc() );
col_a col_b col_c --------- ---------- ---------- 1 27/10/2010 Row 1 2 28/10/2010 Row 2 3 29/10/2010 Row 3 4 30/10/2010 Row 4 5 31/10/2010 Row 5
3
Useapipelinedfunc.onifyourequireadatasourceotherthanatable!
CarmenBarandela:"DataWarehousesandNoSQL”
DatabaseDesign
• Starschemalike• Highlyde‐normalisedincl.duplica6onofdata• Usesingle‐anributekeyswhereverpossible• Performancemaners!– Becarefulwhenextrac6ngoverdatabaselinks– Certaintablesfromopera6onalsystemsarecopied– Dele6on&recrea6onofindexes– Usepar66ons– Manualcontrolofsta6s6cscollec6on– Op6mizingexecu6onplansvery6me‐consuming
43CarmenBarandela:"DataWarehousesandNoSQL”
Repor6ngApplica6onFramework
• Columnandorderingselec6on• Subreports• Variousoutputformats(e.g.HTML,PDF)
• Charts• Self‐servicerepor6ng• Automatedscheduledreportexecu6on
• Rowandcolumnbasedaccesscontrol
44CarmenBarandela:"DataWarehousesandNoSQL”
DataAccess
45
Name Unit Tel Salary Category
Meyer A 12345 $4,900 3
Schmidt B 23456 $6,400 1
Cook B 34567 $5,700 2
Whichrowsarevisibletome?UnitleaderofBonlyseespersonsfromUnitB.
Whichdata(columns)amIallowedtosee?AsasupervisorImaynotbeen6tledtoseethehealthinsurancecategory.Asafetyormedicalofficermaynotseethesalary,etc.
CarmenBarandela:"DataWarehousesandNoSQL”
UserInterface
46CarmenBarandela:"DataWarehousesandNoSQL”
PixelPerfectForms• UseofApacheFOPlibrary– Examples:
• Employment&traininganesta6ons• Swiss/Frenchcardapplica6onforms
• BusinessObjectsXIEnterprise– Directuse– IndirectuseviaBusinessObjectsJavaSDK– Examples:
• Salaryslips• Cars6ckers• Workorders
47CarmenBarandela:"DataWarehousesandNoSQL”
BusinessObjects
• CommercialtoolfamilyfromSAP• Advantages– Richrepor6ngpossibili6es(interac6veorviaSDK)– AppealingdashboardsusingXcelsius– Onlyafewusersneedtheknowledgetodesignreports
• Drawbacks– Two‐waydatastorage(filesystem&database)– Some6messtabilityproblems– Time‐intensiveadministra6onandmaintenance– Expensive
48CarmenBarandela:"DataWarehousesandNoSQL”
ManagementDashboards
49
DesignedlocallyusingMSOfficeandXcelsius.
DatacomesfromtheMDLdatawarehouse.
PublishedasFlashtotheBOServer.
CarmenBarandela:"DataWarehousesandNoSQL”
Remember:
• Highdatavolumes+analysis=datawarehouse
• OLTPvs.OLAP• Usethefacili6esthetoolprovides– Materializedviews,snapshots,pipelinedfunc6ons
• Keepthingsextensibleandsimple!
• Par66onsareveryhelpful
50CarmenBarandela:"DataWarehousesandNoSQL”
Agenda• DataWarehousesinAdministra6veCompu6ng
• Recap:DataWarehousesTheory
• DataWarehousesandInforma6onSystemsinAIS– Founda6on,HRandFIInforma6onSystems
– ComplexDataExtrac6onProcesses– Pixel‐PerfectRepor6ng– Dashboards
• NoSQL– ...
51CarmenBarandela:"DataWarehousesandNoSQL”
NoSQL
52CarmenBarandela:"DataWarehousesandNoSQL”
NotOnlySQL
WhyNoSQL?(1)Tradi6onalRDBMSsystems:‐Defini6onofarigidschema
‐GuaranteesACIDtransac6ons:
.Atomicity
.Consistency
.Isola6on
.Durability
CarmenBarandela:"DataWarehousesandNoSQL” 53
WhyNoSQL?(1)Tradi6onalRDBMSsystems:‐Defini6onofarigidschema
‐GuaranteesACIDproper6es:
.Atomicity
.Consistency
.Isola6on
.Durability
Scalability
CarmenBarandela:"DataWarehousesandNoSQL” 54
WhyNoSQL?(2)‐ Newwebappsdifferentneeds:.Scalability&elas6city(atlowcost!).Highavailability.Flexibleschemas
.Geographicdistribu6on
‐ Donotnecessarilyneed:.Strongconsistency/integrity.Complexqueries
CarmenBarandela:"DataWarehousesandNoSQL” 55
NoSQLusecases1. Massivedatavolumes.Massivelydistributedarchitecture.Google,Amazon,Yahoo,Facebook...
2. Extremequeryworkload.Impossibletoefficientlydojoinsatthatscalewith
anRDBMS
3. Schemaevolu6on.Schemaflexibilityisnottrivialatlargescale.Schemachangescanbegraduallyintroducedwith
NoSQL.
CarmenBarandela:"DataWarehousesandNoSQL” 56
NoSQLTaxonomy
• Keyvaluestores• Documentdatabases
• Column‐orienteddatabases
• Graphdatabases
CarmenBarandela:"DataWarehousesandNoSQL” 57
NoSQLTaxonomyKeyvaluestores
‐1key(uniqueid)‐>1value(binaryobject,blob)
‐TheDBdoesnotunderstandit:nodatamodel
‐Veryfast
‐Examples:AmazonDynamo,MemcacheDB
CarmenBarandela:"DataWarehousesandNoSQL” 58
NoSQLTaxonomyDocumentdatabases
‐ Key‐valuestore:thevalueisusuallystructured‐ Queryingwithmorethanakeyispossible
‐ Examples:AmazonSimpleDB,CouchDB...
CarmenBarandela:"DataWarehousesandNoSQL” 59
NoSQLTaxonomyColumn‐orienteddatabases
‐Distributed,persistentmul6dimensionalsortedmap.
‐Mapindexedby:rowkey,columnkeyanda6mestamp
‐Examples:GoogleBigTable,Cassandra(Facebook)
CarmenBarandela:"DataWarehousesandNoSQL” 60
Googlepaper:hnp://sta6c.googleusercontent.com/external_content/untrusted_dlcp/labs.google.com/en//papers/bigtable‐osdi06.pdf
NoSQLTaxonomyGraphdatabases
‐ Basedonthegraphtheory:‐Ver6ces:likeen66es
‐Edges:rela6onshipsbetweentheen66es
‐ Examples:Neo4j,FlockDB(Twiner)
CarmenBarandela:"DataWarehousesandNoSQL” 61
CAPTheorem
CarmenBarandela:"DataWarehousesandNoSQL” 62
Consistency Availability
ParOOonTolerance
CAPTheorem
CarmenBarandela:"DataWarehousesandNoSQL” 63
Consistency Availability
ParOOonTolerance
Only2canbesaOsfiedattheOme
NoConsistency?
CarmenBarandela:"DataWarehousesandNoSQL”
NoConsistency?
CarmenBarandela:"DataWarehousesandNoSQL”
EventualConsistency!!!
EventualConsistency
Thestoragesystemguaranteesthatifnonewupdatesaremadetotheobjecteventuallyallaccesseswillreturnthelastupdatedvalue.
CarmenBarandela:"DataWarehousesandNoSQL” 66
RDBMSvsNOSQL
• StrongConsistency• BigDatasets• Scalingispossible• SQL• Goodavailability• Consolidatedtech.
• EventualConsistency• HugeDatasets• Scalingiseasy• API• Highavailability• S6llimmaturetech.
CarmenBarandela:"DataWarehousesandNoSQL” 67
Спасибо!• Ques6ons?• MoreInfo– [email protected]
CarmenBarandela:"DataWarehousesandNoSQL”