Download - teradata tools
Teradata
Architecture, Technology, Scalabilty, Performance and Vision for Active Enterprise Data Warehousing
Dr. Barbara SchulmeisterTeradata – a Division of NCR
28. 6. 2005
Agenda
• History• Definitions• Hardware• Architecture• Fault Tolerance and High Availability• Coexistence• Operational System • Tools and Utilities• Data Distribution• SQL Parser• Active Data Warehouse• Scalability
Born to be parallel!
1984 1985 1986 1987 1988 1989 1990 1991 1992 1994...19931979...
Teradata Corp.
Founded
DBCModel 1:
First MPP System!
First 100GB
System!
“Product of the Year”– Forbes
First 500GB
System!
First 700GB
System!
DBC Model 3
“Fastest Growing Small
Company”– INC
Magazine
“Fastest Growing Electronic Company”
– Electronic Business
DBC Model 4
“Leader in Commercial
Parallel Processing”
– Gartner Group
First Terabyte System!
more
3+ TBSystem!
Teradata Timeline Overview
First Beta system shipped Christmas to Wells Fargo Bank
Joint Venture with NCR for next generation systems
Initial public offering on Wall Street
Teradata Timeline (II)
Over 500 Production
Data Warehouses Worldwide!
First Vendorto Publish1TB TPC-D
Benchmark!
Teradata V2 on
WorldMark 4300
DWI VLDB Best Practice Award w/ ATT BMD:
“Data Warehouse and
the Web”
DB Expo RealwareAward w/
Union Pacific: “Data Warehouse
Innovations”
Only Vendorto PublishMulti-user TPC-Ds!
“#1 in MPP”– IDC
Survey in Computer-
world
Teradata Version 2on NCR
3555 SMP
24TBData Warehousein Production!
Demonstrated World’s Largest Data Warehouse
Database at 11TB!
Teradata V2on WorldMark
5100SMP & MPP
100GBTPC-D
Benchmark Leader!
“...only NCR’s Teradata V2 RDBMS hasproven it can
scale…”– Gartner
Group
1996 1997 ......1995
more
Teradata Timeline (III)
Database Programming
and Design Award
IT Award of Excellence
V2R5 Teradata
• TDWI Solution Provider Best Practices in Data Warehousing
• TDWI Leadership in Data Warehousing Award
• DM Review World-Class Solution Award for business Intelligence
• IT Times Award• DM Review 100 Award• DM Review Readership Award• Intelligent Enterprise Real Ware Award
the commitment continues…
64 bit Teradata
Industry leading TPCH at 1TB and 3TB
Largest Data Ware-house system (176 node, 130 TB disk)
Industry leading TPC-D benchmark for all volumesTeradata V2
ported to Microsoft
Windows NTTeradata attains
99.98% availability
1999...1998 2000 2001 2002 2003
V2R6 Teradata
2004 2005
Linux
Alternative Approaches to Enterprise Analytics
Sources
Users
DW
Sources
Users
DW
Marts
Sources
Users
Marts
Sources
Users
Middleware
Data Mart Centric
Virtual,Distributed,Federated
Hub-and-Spoke DataWarehouse
EnterpriseData
Warehouse
• Requires corporate leadership and vision
• Business Enterprise view challenging
• Redundant data costs• High DBA and
operational costs• Data latency
• Only viable for low volume access
• Meta data issues• Network bandwidth and
join complexity issues• Workload typically placed
on workstation
• Business Enterprise view unavailable
• Redundant data costs• High ETL costs• High App costs• High DBA and
operational costs
Cons
• Single Enterprise “Business” View
• Data reusability• Consistency• Low Cost of Ownership
• Allows easier customization of user interfaces & reports
• No need for ETL• No need for separate
platform
• Easy to Build Organizationally
• Limit Scope• Easy to Build
Technically
Pros
Centralized Integrated Data With
Direct Access
Hub-and-Spoke Data Warehouse
Leave Data Where it Lies
Independent Data Marts
A Spectrum of Data Warehouse Architectures
Sources
Users
DW
Sources
Users
DW
Marts
Sources
Users
Marts
Sources
Users
Middleware
Data Mart Centric
Virtual,Distributed,Federated
Hub-and-SpokeData
Warehouse
EnterpriseData
Warehouse
Teradata’s Advocated Data Warehouse
Approach for 20 years, Since 1984!
The goal: Any question, on any data, at any time.
Most time consuming steps:
ll Full scan of big tablesFull scan of big tablesll Complexe joinsComplexe joinsll AggregationAggregationll Sorting Sorting
Frequency of steps OLTP or DSS
OLTP DSS
Diffentiating OLTP - DSS
NCR Server
• Provide customers with growth opportunities and investment protection> Coexistence is enabled
across five generations– NCR 5400E & 5400H
Servers– NCR 4980 & 5380 Servers– NCR 4950 & 5350 Servers– NCR 4900 & 5300 Servers– NCR 485X & 525X Servers
NCR Server Generations
485X&
525X
4900&
5300
4950&
5350
4980&
5380
5400E&
5400H
BYNET V2 / V3
NCR 5400 Server SMP
• 5400E > 1 - 4 nodes> BYNET V2> ESCON & FICON for 3 and
4 node configurations> Field Upgradeable to
5400H1
3
1
3
1
3
1
3
1
3
3rd Node
4th Node
Internal BYNET
switches
3GSM
1st Node1st Node2nd Node2nd Node
Ethernet Switches
Three UPS Modules
Up to 4 nodes within each cabinet
Server Management
NCR 5400 Server MPP
1
3
1
3
1
3
1
3
1
3
1
3
1
3
1
3
1
3
1
3
1
3
Ethernet Switches
BYNET V3 Switches
Five UPS Modules
Up to 10 nodes within each cabinet
Server Management
• Continued rapid adoption of latest Intel® Technology> Dual Intel Pentium Xeon EM 64T 3.6
GHz processors with Hyper-Threading(32-bit and 64-bit capability)
> 800 MHz front side bus• Industry Standard Form Factor
> Up to 10 nodes per cabinet> Integrated BYNET V3 (provides the
capability to physical separate systems between 300-600 meters)
> Integrated Server Management> N+1 UPS> Dual AC
• Multi-Generation Coexistence > Investment protection
Industry CPU Performance per Core
0
500
1000
1500
2000
2500
3000
2004 2005 2006 2007
Year
Itanium 2 1.6 Ghz130nm
Power 4+1.45Ghz130 nm
Ultrasparc 3130 nm1.6Ghz
Power 5~1.9Ghz130 nm
Xeon3.0Ghz 1M130nm
Xeon 3.6 Ghz 90nm
54000Xeon 2M L2 3.6 Ghz 90nm
Xeon2M L2>3.6 Ghz 90nm
Dual Core65 nm
Next Gen Arch. Dual Core65 nm
Multi Core45 nm
Itanium 29M130nm
Power 5+~2.5Ghz90 nm
Montecito90nm
TukwillaCommon Platform65nm
Rock90nm
Power 6~3Ghz65nm
Relative CPU Performance per Core
Symmetric Multi Threading (Hyper Threading)
Dual Core
Xeon
Itanium
Power
Sparc
Multicore, Multithreaded
Relative CPU Performance based on multi-threading and multi-core roadmap capabilities
www.spec.org: benchmarks SPECint2000 and SPECint_rate2000
Gartner Product Ranking 2004 ASEM
FUJITSU HP HP HP IBM NCR SUN
P
rim
e-p
ow
er
HP
9000
Inte
gri
ty
Pro
lian
t
pS
erie
s
Ter
adat
a
Su
nfi
re
PRODUCT 43 45 46 29 45 54 40
The Product category (which was called Technology in previous ASEM updates) focuses on the performance and reliability/availability aspects of each platform. In this category Teradata received a very strong 93.5% of total possible points and leads the IBM pSeries with 74.35% by 44 points or 19%.
Source Gartner 2004 ASEM Report
NCR Enterprise Storage 6842
• NCR Enterprise Storage 6842 Features> Two array modules per cabinet> 56, 73GB, 15K drives
– greater than 8 Terabytes of spinning disk per cabinet
> Dual Quad Fibre Channel Controllers per array for performance and availability
> Typical configuration is 4 NCR 5400 Server nodes per 3 –6842 arrays– 1.2 Terabytes of database space per node (RAID 1)
> Supports RAID 1 and RAID 5> Support for MP-RAS and Microsoft Windows Server 2003
environments
192
MP-RAS and Windows
RAID-1 Only
MPP: supports 2, 3, or 4 nodes per cabinet
73GB – 15K RPM
DMX 2000 M2
RAID -1 OnlyRAID Options
MP-RAS and WindowsOperating Environment
MPP: supports 1 or 2 nodes per cabinetTeradata Use
73GB – 15K RPMDisks
96Maximum Teradata disks
DMX 1000 M2EMC Model
EMC Symmetrix DMX
• Enterprise Fit• Storage Standardization• Extended storage life
through Redeployment
Assumption: Compute and Storage Balance
• A balanced configuration is one where the storage I/O subsystem for each compute node is configured with enough disk spindles, disk controllers, and connectivity so that the disk subsystem can satisfy the CPU demand from that node.
• A supersaturated configuration also can satisfy the CPU demand from that node although the extra I/O may be underutilized.> This is useful for investment protection on certain
upgrade paths.
• All system configurations discussed in this presentation are based on balanced or supersaturated compute nodes.
Node CPU and Storage I/O Balance
Qu
ery
Resp
on
se T
ime
Best Query Response Time
Effective NodeUtilization
# of Disk Drives/Storage Capacity
I/O
Ban
dw
idth
–M
B/
sec
OptimumNode/Storage
Balance and
Response Time
Query R
esponse Time
Band
width
95% Node Utilization
Industry wide, disk drive capacity is increasing at a faster rate than disk drive performance
Common Upgrades Applied
Query Response Time Increases
because you didn’t add more compute power to support
the additional raw data volume.
SYSTEM with CURRENT Nodes
Raw Data Volume
Qu
ery
Resp
on
se T
ime
GROW RAW DATA VOLUME Performance more than adequate: Add more data to all nodes
LINEAR GROWTHMaintain Query Performance with more nodes
Typical System Expansion
SYSTEM with Current Nodes
SYSTEM with More or Faster Nodes
Query Response Time Remains Constant
because you add proportionally more raw data volume as
compute power.
Scale out with Teradata by adding compute nodes,
interconnect, storage arrays, and disks.
aka “horizontal scalability”
Raw Data Volume
Qu
ery
Resp
on
se T
ime
Common Upgrades Applied
Query Response Time Decreases
because you didn’t add more raw data volume to
offset the increase in compute power.
“Scale vertically” with Teradata by increasing
compute power.
SYSTEM with Current Nodes
SYSTEM with More or Faster Nodes
Raw Data Volume
Qu
ery
Resp
on
se T
ime
GROW QUERY PERFORMANCE Raw Data Volume adequate: Upgrade to faster CPUs
SYSTEM with Current Nodes
SYSTEM with More or Faster Nodes
Scale to Target query performance and data volume by increasing compute power
and adding storage.
Adjust Query Performance and Data Volume to match service level agreement
Raw Data Volume
Qu
ery
Resp
on
se T
ime
Combo: Upgrade Nodes and Increase Storage Per Node
Improve query performance and adjust data volume by
reducing storage per node and adding more nodes.
SYSTEM with Current Nodes
SYSTEM with More Nodes
Improve Query Performance and Adjust Data Volume to match service level agreement
Raw Data Volume
Qu
ery
Resp
on
se T
ime
Scaling by Reconfiguration and Expansion
CPU(s)
CPU(s)
CPU(s)
CPU(s)
CPU(s)
CPU(s)
CPU(s)
CPU(s)
Cache
Cache
Cache
Cache
Cache
Cache
Cache
Cache
Memory
Memory
Memory
Memory
Memory
Memory
Memory
Memory
Disk Storage
Disk Storage
Disk Storage
Disk Storage
Disk Storage
Disk Storage
Disk Storage
Disk Storage
BY
NE
T F
ab
rics
Teradata Shared-Nothing MPP• Designed for Slope of 1 Linear Scaling • Optimized for very high data rates to/from disk• Excellent performance and efficiency for data
warehousing
Architecture Determines Scalability
Interconnect used only for database messages, no I/O or memory traffic
All memory accesses are local
CPU uses independent direct I/O path to Disk
Teradata MPP Architecture
• Nodes> Incrementally scalable to
1024 nodes> Windows or Unix
• Storage> Independent I/O> Scales per node
• BYNET Interconnect> Fully scalable bandwidth
• Connectivity> Fully scalable> Channel – ESCON/FICON> LAN, WAN
• Server Management> One console to view
the entire system
SMP Node1 SMP Node2 SMP Node3 SMP Node4
Server Management
CPU1 CPU2 CPU1 CPU2 CPU1 CPU2 CPU1 CPU2
Dual BYNET Interconnects
Memory Memory Memory Memory
Teradata Node SW Architecture (SMP) 4-Node MPP Clique
Perfectly Tuned Nodes Working in Parallel for Scalability and Availability
Node Software ArchitectureD
isk A
rray
VP
RO
CS
BYNET
VPROCsAMP & PE
VPROCsAMP & PE
VPROCsAMP & PE
VPROCsAMP & PE
CommunicationInterfaces
Access Module
ProcessorVPROCS
Parsing EngineVirtual
Processors(VPROCS)
UNIX, Windows 2000
Parallel Database Extensions (PDE)
PE2
PE1 AMP1 AMP5
AMP2 AMP6
AMP3 AMP7
AMP4 AMP8
Channel Gateway
LAN Gateway
PEs recieve the queries and figure out the query planAMPs interact with the disk arrays and process the data
Parser
PE VProc
Optimizer
DispatcherSessionControl
Relational Database Management
File System / Data Management
VAMP
Specifically designed for data warehousing workloads
The Scalable BYNET Interconnect
Multiple SimultaneousPoint-to-Point Messaging
BroadcastMessaging
Node
Node
Node
Node
Node
Node
Node
Node
Node
The Teradata Optimizer
chooses between Point-to-Pointand Broadcast
Messaging to select the most effective communication.
• Bandwidth scales linearly to 1,024 nodes• Redundant, fault tolerant network• Guaranteed message delivery
Node
Traditional Configuration
X
Built-In Integrated Fail Over
• Teradata provides built-in node failover.> Cost effective> Easy to deploy
• Work migrates to the remaining nodes in the cliques. • System performance degradation up to 33%.
Node Node Node Node Node Node Node Node
Disk Array Disk Array Disk Array Disk Array Disk Array Disk Array
86% SystemPerformance
Continuity
Large Cliques
• Double the number of nodes in a clique up to 8.• Work distributed across a greater number of nodes.• Minimize system performance impacts – may not be
noticeable to end-users.
XFibre Channel
Switches
Node Node Node Hot Standby
Disk Array Disk Array Disk Array
100% System Performance
Continuity
Hot Standby Nodes
• Work re-directed to a Hot Standby Node.• No system performance impacts.• Teradata restart can be postponed to a maintenance
window.
X
Node Node Node Node Node NodeNode Hot Standby
Disk Array Disk Array Disk Array Disk Array Disk Array Disk Array
100% System Performance
Continuity
Large Clique + Hot Standby Node
• Same performance benefits of Hot Standby node.• Reduced costs for larger system implementations.
Fibre Channel Switches
X
BYNET
Server Nodes
DiskArray Subsystem
Clique
High Availability
Case Hardware Teradata
Power FailureUPS (redundant),
Dual ACNode Failure VPROC Migration (VAMP, PE)Bynet failure Redundant BYNET
Disc failureRAID-1/-5/-S in Disc
SubsystemMore than one Disc Failure Fallback-Option
Clique Failure Fallback-Option
Coexistence Considerations
• VAMPs manage the same amount of data• Coexistence enables the faster nodes to be realized by
running more VAMPs per node
1x 1.5x 2.0xPerformance Factor
Generation xVAMPS
AMP AMP AMPAMP
AMP AMP AMPAMP
Generation xVAMPS
AMP AMP AMPAMP
AMP AMP AMPAMP
AMP AMP
AMP AMP
Generation xVAMPS
AMP AMP AMPAMP
AMP AMP AMPAMP
AMP AMP AMPAMP
AMP AMP AMPAMP
SMP Node1 SMP Node2 SMP Node3 SMP Node4
AMP AMP AMP AMP AMP AMP AMP AMP AMP AMP AMP AMP
System Expansion with Teradata Coexistence
• The utilization of multiple generations of hardware within a single Teradata MPP system
SMP Node1 SMP Node2 SMP Node3 SMP Node4
Server Management
AMP AMP AMP AMP AMP AMP AMP AMP AMP AMP AMP AMP
Dual BYNET Interconnects
5380 - 9 AMPs per node 5400 - 12 AMPs per node
11/2000Original Footprint
8 Node 5250
Customer Example (72 Nodes, 4 Generations)
Generation “A”
1/2001 Expansion 1
12 Node 5250
11/2001 Expansion 28 Node 5255
6/2002 Expansion 312 Node 5300
Generation “B”
6/2003 Expansion 416 Node 5350
Generation “C”
6/2004 Expansion 5
16 Node 5380
Generation “D”
2005Future 5400Expansion
Generation “E”
Database and Operating System
• Database> Teradata V2R6> Support one Release Back V2R5.1.X (Current Exception in Place V2R5.0.3)
• Unix> MP-RAS 3.03 required for Teradata Database on 5400> MP-RAS 3.02 still supported on previous server generations
• Microsoft Windows> Microsoft Windows Server 2003 recommended for new and expanding 5400 motions> Microsoft Windows 2000 supported in 2Q 2005
485x/525x 4900/5300 4950/5350 4980/5380 5400
V2R5.0.3, V2R5.1.X and V2R6 on MP-RAS 3.02
V2R6 on WS03 (2Q 2005)
V2R5.0.3, V2R5.1.X, V2R6 on MP-RAS 3.03
V2R5.0.3, V2R5.1.X, V2R6 on W2K (2Q 2005)
• Analytic extensions such as Extended Windows Functions & Multiple Aggregate Distincts
• Random Stratified Sampling
• Join Elimination• Extended Transitive
Closure• Partial Group By • Early Group By • Derived Table Rewrite• Very Large SQL
Tactical & Event-DrivenDecision Making• Partial Covering Join Index• Global Index• Sparse Index• Join Index Extensions• ODS Workload Optimization• Stored Procedures Enhancements
Single Version ofthe Truth
• Cylinder Read • Partitioned Tables (PPI)• Value List Compression• 2000 Columns, 64 Columns per Index • Identity Column
•Index Wizard•Statistics Wizard & Collect Statistics Improvements
•Query Log•Extreme Workload Management & Administration
•Roles and Profiles•SQL Assistant/Web Edition
•Availability•Performance Dashboard & Reporting
Trusted,Integrated Environment
Strategic Decision Making• Extended Grouping• Inner Join Optimization• Eliminate Unnecessary
Outer Joins• Hash Joins• UDFs for Complex Analysis
and Unstructured Data
V2R5.0 Features
• Security enhancements (Encryption)• DBQL enhancements • Database Object Level Use Count• ROLES enhancements• Priority Scheduler enhancements• TDQM enhancements• No Auto Restart After Disk-Array Power
Failure• Cancel Rollback• Incompatible Package Warning• Disk I/O Integrity Check
• Enhancements to Triggers• Extra FK-PK Joins in Join Index• UDFs for XML Processing etc.
• Enhancement to Identity Column• UTF16 Support• PPI Dynamic Partition Elimination• Large Objects (LOBs)
+ V2R5.1 Features
• Continuous Update Performance & Manageability
• Faster Join Index Update• Join Update Performance• Bulk Update Performance• Teradata Warehouse Builder Full Functionality
& Platform Support
• Continuous Update Performance & Manageability
• Faster Join Index Update• Join Update Performance• Bulk Update Performance• Teradata Warehouse Builder Full Functionality
& Platform Support
Data FreshnessData Freshness
•UDFs for Data Transformation and Scoring
V2R6.0 Feature List
• Remove 1MB limit on plan cache size• Increase response buffer to 1MB• Table header expansion• Improve Random AMP Sampling• Top N Row Operation• Recursive Queries
Tactical & Event-Driven DecisionMaking• Improve Primary Index Operations\• Improved IN-list processing• External Stored Procedures• Trigger calling a Stored Procedure• Stored Procedure Internals Enhancements• Queue tables
Single View of Your Business
• Stored Procedure LOB support• External Table Function• Partition level BAR• Eliminate indexed row IDs (PPI)• PPI Join performance improvement• DBS Information consolidation
Trusted,Integrated Environment
Strategic Decision Making
• Teradata Dynamic Workload Management
• Extensible User Authentication• Directory Integration• Global deadlock logging• Faster Rollbacks
Data FreshnessData Freshness
• Replication Services • Array support • Priority Scheduler enhancements• Reduce restart time
Teradata Utility PakTeradata AdministratorTeradata SQL AssistantTeradata SQL Assistant/Web Edition BTEQODBCJDBCCLIOLE DB Provider
Teradata Tools & Utilities (1)
Teradata Database
Any Query,Any Time
.com
Database ManagementTeradata ManagerTeradata Dynamic Query Manager Teradata System Emulation ToolTeradata Visual ExplainTeradata Index WizardTeradata Statistics Wizard
MetadataTeradata Meta Data Services
Load/UnloadTeradata Warehouse Builder FastLoad, MultiLoad & FastExportTeradata TPumpAccess Modules
Mainframe ConnectivityMainframe Channel ConnectTS/API, CICS, HUTCNS & IMS/DC
Teradata Warehouse
ParallelIn
ParallelOut
Technical Differentiator: Database Utilities
Teradata Utilities Are Robust and Mature
• Teradata utilities are fully parallel.• Teradata utilities have checkpoint restart capability.• Data loads directly from the source into the database.
> No manual data partitioning.> No file splitting.> No intermediary file transfers.> No separate data conversion step.
Teradata Tools & Utilities (2)
• Teradata Data Profiler• Teradata CRM• Teradata Warehouse Miner• Teradata Demand Chain Management• Teradata Supply Chain Management• .....
(LDM = Logical Data Model)
• Financial Solution LDM• Retail LDM• Communication LDM• Insurance/Healthcare LDM• Manufacturing LDM• Government LDM• Media and Entertainment LDM• Travel/Transportation LDM
Two Basic Software Architecture ModelsTask Centric and Data Centric
Uniform and shared accessto all platform resources(disk, etc) is REQUIRED
Exclusive access to a subset of resources
Data
Data Data
Data
Parallel Unit Parallel Unit
ParallelOptimizerTask
Request
DATA DATA
Task
Request
SharedMemory
Request Request
Data Centric Software: Teradata Virtual AMP
• Each virtual AMP has rows from every table• Each virtual AMP works independently on its rows• Goal: Database rows are equally distributed across
multiple tables
16151413121110987654321
282726252423222120191817
Table A Table B Table C
2521171395
26221814106
27231915117
28242016128
1 2 3 4
P
DM
P
DM
P
DM
P
DM
AMP 1 AMP 2 AMP 3 AMP 4
Tables AMP - Balanced collection of three abstracted platform resources
P
DMMemory Disk
Processor
Data distribution by Primary Index
Node1 Node2 Node3 Node4 Node1 Node2 Node3 Node4 Node1 Node2 Node3 Node4 Node1 Node2 Node3 Node4
Dual BYNET Interconnects
Row Hash (32 bits)
Primary Index value for a row
Hashing Algorithm
Hash Map
Current configuration PrimaryCurrent configuration FallbackReconfiguration PrimaryReconfiguration Fallback
Destination Selection Word(DSW) – first 16 bits
Teradata Hashing
7225
32 bit Row Hash Remaining16 bitsDSW #
Hashing Algorithm
0000 0000 0001 1010 1100 0111 0101 1011
AMP 1 AMP 2 AMP x
7225 2 4/13 O
AMP 3
V V V V0 0 1 A
(Hexadecimal)
000 01 02 03 04 05 06 07 08 01 02 03 04 05 001 02 03 04 05 06 07 08 01 02 03 04 05 001 02 03 04 05 06 07 08 01 02 03 04 05 001 02 03 04 05 06 07 08 01 02 03 04 05 001 02 03 04 05 06 07 08 01 02 03 04 05 001 02 03 04 05 06 07 08 01 02 03 04 05 0
HASH MAP
0 1 2 3 4 5 6 7 8 9 A B C D E F
001002003004005
Bucket Number
SELECT * FROM ORDERWHERE order_number = 7225;
Table ORDERO r d e r
N u m b e rC u s to m e r
N u m b e rO r d e rD a te
O r d e rS t a t u s
P K
U P I
7 3 2 57 3 2 47 4 1 57 1 0 37 2 2 57 3 8 47 4 0 27 1 8 87 2 0 2
231121312
4 / 1 34 / 1 34 / 1 34 / 1 04 / 1 54 / 1 24 / 1 64 / 1 34 / 0 9
OOCOCCCCC
Primary Index Choice Criteria
ACCESSMaximize one_AMP operations: choose the column most frequently
used for access
DISTRIBUTIONOptimize parallel processing: choose a column that provides good
distribution
VolatilityReduce maintenance resource overhead (I/O): choose a column
with stable data values
Data distribution by Primary Index - 2
48 bit table ID 32 bit row hash value Index value
Node1 Node2 Node3 Node4 Node1 Node2 Node3 Node4 Node1 Node2 Node3 Node4 Node1 Node2 Node3 Node4
Dual BYNET Interconnects
SQL requestParser algorithm
Logical block identifier
SQL Parser Overview
Request Parcel
AMP Steps
Statistics
Data Parcel
Generator
GNCApply
Resolver
Optimizer
Syntaxer
Security
Cached?
DDDBase, AccRightsTVM, TVFieldsIndexes
Costs
Serial stepsParallel stepsIndividual and common steps (MSR)Additional:Triggers, check constraints, references, foreign keys, join indexescollected statistics or dynamic sampling
Statistics Summary
Collect statistics on• all non-unique indexes• UPI of any table with less than x rows per AMP
(dependent on available number of AMPs)• All indexes of a join index• any non-indexed column used for join constraints• indexes of global temporary tables
Collected statistics are not automatically updated by the system
Refresh statistics when 5-10% of the table rows have changed
Database Workload Continuum
EnterpriseData
Warehouse
Strategic Decision RepositoriesTransactional Repositories
OLTPi OLTPnOLTP1 • • •ODS1
Tactical Decision Repositories
• • •ODS2
• • •
Workload Continuum
Transactional (OLTP)
• User Profiles• Customers• Clerks
• Services:• Transactions• Bookkeeping
• Access Profile:• Frequent updates• Occasional lookup
• Data:• Current “state” data• Limited history• Narrow Scope
Strategic (EDW)
• User Profiles• Back Office Services• Management• Trading Partners
• Services:• Strategic decisions• Analytics (e.g. scoring)
• Access Profile:• Bulk Inserts – Some Updates• Frequent complex analytics
• Data Model:• Periodic “state” data• Deep history• Enterprise integrated view
Tactical (ODS)
• User Profiles• Front Line Services• Customers - Indirectly
• Services:• Lookups• Tactical decisions• Analytics (e.g. scoring)
• Access Profile:• Continuous updates• Frequent lookups
• Data Model:• Current “state” data• Recent history• Integrated business areas
Database Workload Continuum
ActiveData
Warehouse
Tactical and Strategic Decision RepositoriesTransactional Repositories
OLTPi OLTPnOLTP1
Workload Continuum
Transactional (OLTP)
• User Profiles• Customers• Clerks
• Services:• Transactions• Bookkeeping
• Access Profile:• Frequent updates• Occasional lookup
• Data:• Current “state” data• Limited history• Narrow Scope
Strategic (EDW)
• User Profiles• Back Office Services• Management• Trading Partners
• Services:• Strategic decisions• Analytics (e.g. scoring)
• Access Profile:• Bulk Inserts – Some Updates• Frequent complex analytics
• Data Model:• Periodic “state” data• Deep history• Enterprise integrated view
Tactical (ODS)
• User Profiles• Front Line Services• Customers - Indirectly
• Services:• Lookups• Tactical decisions• Analytics (e.g. scoring)
• Access Profile:• Continuous updates• Frequent lookups
• Data Model:• Current “state” data• Recent history• Integrated business areas
Data Sophistication
Continuous update and time-sensitive queries
become important
OPERATIONALIZINGWHAT IS happening?
Event-based triggering takes hold
ACTIVATINGMAKE it happen!
Primarily batch andsome ad hoc reports
Increase in ad hoc analysis
ANALYZINGWHY
did it happen?
REPORTINGWHAT
happened?Analytical modeling
grows
PREDICTINGWHAT WILL
happen?
Batch
Ad Hoc
Analytics
Continuous Update/Short Queries
Event-Based Triggering
Data Warehouse Needs Will Evolve
• Query complexity grows• Workload mixture grows• Data volume grows• Schema complexity grows• Depth of history grows• Number of users grows • Expectations grow
Single View of the Business – Better, Faster Decisions – Drive Business Growth
Wo
rklo
ad
Co
mp
lexit
y
SimultaneousWorkloads:
Strategic, tactical, loading
Increasing depth and breadth of
users and queries
Increasing depth and breadth of data
Data Sophistication
Continuous update and time-sensitive queries
become important
OPERATIONALIZINGWHAT IS happening?
Event-based triggering takes hold
ACTIVATINGMAKE it happen!
Primarily batch andsome ad hoc reports
Increase in ad hoc analysis
ANALYZINGWHY
did it happen?
REPORTINGWHAT
happened?Analytical modeling
grows
PREDICTINGWHAT WILL
happen?
Batch
Ad Hoc
Analytics
Continuous Update/Short Queries
Event-Based Triggering
Data Warehouse Needs Will Evolve
• Query complexity grows• Workload mixture grows• Data volume grows• Schema complexity grows• Depth of history grows• Number of users grows • Expectations grow
Single View of the Business – Better, Faster Decisions – Drive Business Growth
Wo
rklo
ad
Co
mp
lexit
y
Measure
Chasm from static to dynamic decision-
making
Optimize
Execute
Automate
Understand
Database Requirement:Data Warehouse
Foundation must handle multi-dimensional growth!
Data Sophistication
Continuous update and time-sensitive queries
become important
OPERATIONALIZINGWHAT IS happening?
Event-based triggering takes hold
ACTIVATINGMAKE it happen!
Primarily batch andsome ad hoc reports
Increase in ad hoc analysis
ANALYZINGWHY
did it happen?
REPORTINGWHAT
happened?Analytical modeling
grows
PREDICTINGWHAT WILL
happen?
Batch
Ad Hoc
Analytics
Continuous Update/Short Queries
Event-Based Triggering
Data Warehouse Needs Will Evolve
• Query complexity grows• Workload mixture grows• Data volume grows• Schema complexity grows• Depth of history grows• Number of users grows • Expectations grow
Single View of the Business – Better, Faster Decisions – Drive Business Growth
Wo
rklo
ad
Co
mp
lexit
y
The Multi-Temperature Warehouse
• Customers desire deep historical data in the warehouse.> The access frequency or average temperature of data
varies.– HOT, WARM, COOL, dormant
> Seamless management required.
• Teradata systems can address this need through a combination of technologies, such as:> Partitioned primary index (PPI).> Multi-value compression.> Priority scheduler.
Priority Scheduler
Three Tiers of Workload Management
ADW
Database Query Log
Teradata DynamicQuery Manager
Pre-Execution
Post-Execution
•Control what and how much is allowed to begin execution
•Manage the level of resources allocated to different prioritiesof executing work
(prioritized queues)
•Analyse query performance and behavior after completion
Teradata Dynamic Query Manager
Indexes
• PI (UPI and NUPI)• SI (USI and NUSI)• Join Index
single table indexmulti table indexaggregated indexsparse index (where clause used)partial covering global
• Materialized Views (join index)
An Integrated, Centralized Data Warehouse Solution Database Must Scale in Every Dimension
Data Volume(Raw, User Data)
SchemaSophistication
QueryFreedom
QueryComplexity
DataFreshness
Query Data Volume
QueryConcurrency
MixedWorkload
An Integrated, Centralized Data Warehouse Solution Database Must Scale in Every Dimension
Data Volume(Raw, User Data)
SchemaSophistication
QueryFreedom
QueryComplexity
DataFreshness
Query Data Volume
QueryConcurrency
MixedWorkload
TheTeradata
Difference
Data Volume(Raw, User Data)
Customers Need to Evaluate “Real Life”
Workloads
SchemaSophistication
QueryFreedom
QueryComplexity
DataFreshness
Query Data Volume
The Teradata Difference“Multi-dimensional Scalability”
QueryConcurrency
MixedWorkload
Good Example?TPC-H
Benchmark
Companies generating >80% of the industry revenue utilize Teradata Data Warehousing
Teradata Experiencein the Communications Industry
Some of Teradata’s Retail Customers Worldwide
65% of Top Global Retailers
60% of Top Most Admired
Global Companies
80% of Top Global Telco Firms
70% of Top Global Airlines
50% of the Top Transportation
Logistic Firms
Teradata is Well-Positioned in the Top Global 3000 Industries
• Leading industries> Banking > Government> Insurance & Healthcare> Manufacturing > Retail> Telecommunications> Transportation Logistics> Travel
• World class customer list> More than 750 customers> Over 1200 installations
• Global presence > Over 100 countries
FORTUNE Global Rankings, April 2005
Industry Leaders Use Teradata
• Leading industries> Banking > Government> Insurance & Healthcare> Manufacturing > Retail> Telecommunications> Transportation Logistics> Travel
• World class customer list> More than 750 customers> Over 1200 installations
• Global presence > Over 100 countries
Teradata Global 400 Customers
32% ofFinancial Services
Industry
54% of Retailers
50% ofTransportation
Industry
50% of Telco Industry
19% of Manufacturers
Data Volume(Raw, User Data)
SchemaSophistication
QueryFreedom
QueryComplexity
DataFreshness
Query Data Volume
QueryConcurrency
MixedWorkload
TheTeradata
Difference
www.teradata.com