TRANSCRIPT

Windows Scalability: Technology, Terminology, Trends
Jim Gray, Distinguished Engineer, Research, Microsoft Corporation
Outline
Progress: an overview
Scale-Up technology trends: CPUs, memory, disks, networking
Scale-Out terminology: clones, racks/packs, farms, geoplex
Progress
Other speakers in this track will tell you Windows is #1, how they did it, and how you too can do it.
Stepping back: huge progress in the last 5 years, 10x to 100x improvements. Windows now has competitive high-end hardware (32x SMP, 64-bit addressing, 30 GBps bus bandwidth, ...), and the software has evolved to match (32x SMP, 256 GB RAM, 10 TB DB).
In the next 5 years, expect 10x to 100x improvements.
The Recurring Theme
Windows improved 50% to 500%. Q: Why? A: Measure, Analyze, Improve (a continuous cycle).
Self Tuning
Tradeoffs: buy memory locality and bandwidth with CPU (compress, pack, cluster); trade memory for IO (caches).
Speedups: introduce a fast path for the common case; repack for a smaller I-cache footprint.
Scalability: remove or improve locks; cool hotspots (cache/disk); examine spins and timeouts; use affinity/locality to improve caching.
Scaleable Systems
Scale UP: grow by adding components to a single system.
Scale OUT: grow by adding more systems.
Outline
Progress: an overview
ScaleUp nodes: technology trends: CPUs, memory, disks, networking
ScaleOut terminology: clones, racks/packs, farms, geoplex
What's REALLY New – Windows Scale Up
64-bit and TB-size main memory
SMP on chip: everything's SMP
32...256-way SMP: locality/affinity matters
TB-size disks
High-speed LANs
iSCSI and NAS competition
64 bit – Why bother?
1966 Moore's law: 4x more RAM every 3 years, i.e., 1 bit of addressing every 18 months.
36 years later: 2 x 36/3 = 24 more bits. Not exactly right, but...
32 bits is not enough for servers; 32 bits gives no headroom for clients.
So time is running out (it has run out).
Good news: Itanium™ and Hammer™ are maturing, and so is the base software (OS, drivers, DB, Web, ...).
Windows & SQL @ 256 GB today!
Who needs 64-bit addressing? You! You need 64-bit addressing!
"640K ought to be enough for anybody." - Bill Gates, 1981
But that was 21 years ago, i.e., 2 x 21/3 = 14 bits ago.
20 bits + 14 bits = 34 bits, so...
"16 GB ought to be enough for anybody." - Jim Gray, 2002
34 bits > 31 bits, so 34 bits means a 64-bit machine.
YOU need 64-bit addressing!
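The bit arithmetic above can be checked in a few lines (a sketch, assuming 2 address bits per 3 years as the slide quotes Moore's law):

```python
# Moore's law as quoted on the slide: 4x RAM (2 address bits) every 3 years.
BITS_PER_YEAR = 2 / 3

extra_bits = round(BITS_PER_YEAR * (2002 - 1981))  # years since the 640K remark
total_bits = 20 + extra_bits                       # 640K needs ~20 address bits

print(extra_bits, total_bits, 2**total_bits // 2**30)  # 14 34 16  (i.e. 16 GB)
```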
64 bit – why bother?
Memory-intensive calculations: you can trade memory for IO and processing.
Example: data analysis and clustering at JHU. In memory, CPU time is ~N log N, N ~ 100M. Split across M disk chunks, time is ~M², and the job must run many times.
Now running on HP Itanium, Windows .NET Server 2003, SQL Server.
[Figure, courtesy of Alex Szalay & Adrian Pope of Johns Hopkins University: CPU time in hours (day, week, month, year, decade gridlines) and memory in GB vs. number of galaxies in millions (0 to 100), with curves labeled 1, 4, 32, 256.]
Amdahl's Balanced System Laws
1 MIPS needs 4 MB of RAM and 20 IO/s.
At 1 billion instructions per second: need 4 GB per cpu and 50 disks per cpu!
64 cpus ... 3,000 disks.
(Diagram: a 1-bips cpu with 4 GB RAM and 50 disks, delivering 10,000 IOps and 7.5 TB.)
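The per-CPU figures on the slide are easy to check (a sketch; the ~150 GB and ~200 IOps per-disk rates are assumptions taken from the disk-trends slides later in the talk):

```python
# Figures from the talk: a 1-bips CPU is paired with 50 disks.
DISKS_PER_CPU = 50
GB_PER_DISK   = 150   # assumed per-disk capacity
IOPS_PER_DISK = 200   # assumed per-disk random IO rate

cpus  = 64
disks = cpus * DISKS_PER_CPU
print(disks)                               # 3200, the slide's "~3,000 disks"
print(DISKS_PER_CPU * IOPS_PER_DISK)       # 10000 IOps per CPU
print(DISKS_PER_CPU * GB_PER_DISK / 1000)  # 7.5 TB per CPU
```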
The 5 Minute Rule – Trade RAM for Disk Arms
If data is re-referenced every 5 minutes, it is cheaper to cache it in RAM than to fetch it from disk.
One disk access per second costs ~$50, which also buys ~50 MB of RAM; so caching pays for any item under ~50 MB referenced every second, or under ~50 KB referenced every 1,000 seconds.
Each app has a memory "knee": up to the knee, more memory helps a lot.
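The break-even interval follows directly from those prices (a sketch; the dollar figures are the slide's):

```python
# $50 buys one disk access per second, or ~50 MB of RAM (slide's prices).
RAM_KB_PER_ACCESS_PER_SEC = 50 * 1024

def break_even_seconds(item_kb):
    """Re-reference interval at which caching an item in RAM costs the
    same as buying the disk-arm capacity to fetch it each time."""
    return RAM_KB_PER_ACCESS_PER_SEC / item_kb

print(break_even_seconds(50))         # 1024.0 s: a 50 KB item, ~1,000 seconds
print(break_even_seconds(50 * 1024))  # 1.0 s: a 50 MB item, every second
```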
[Figure: Three TPC benchmarks, transactions per second (0 to 100,000), for 4x1.6 GHz IA32 + 8 GB, 4x1.6 GHz IA32 + 32 GB, and 4x1 GHz Itanium 2 + 48 GB. GBs help a LOT, even if the cpu clock is slower.]
64 bit Reduces IO, Saves Disks
Large memory reduces IO. 64-bit simplifies code. Processors can be faster (wider word). RAM is cheap (4 GB ~ $1k to $20k). You can trade RAM for disk IO and get better response time.
Example: TPC-C, 4x1 GHz Itanium 2 vs. 4x1.6 GHz IA32: 40 extra GB gives 60% extra throughput.
(Chart configurations: 4x1.6 GHz IA32 + 8 GB; 4x1.6 GHz IA32 + 32 GB; 4x1 GHz IA64 + 48 GB.)
AMD Hammer™ Coming Soon
AMD Hammer™ is 64-bit capable. 2003: millions of Hammer™ CPUs will ship. 2004: most AMD CPUs will be 64-bit.
4 GB of RAM is less than $1,000 today, and will be less than $500 in 2004. Desktops (Hammer™) and servers (Opteron™). You do the math...
Who will demand 64-bit capable software?
A 1 TB Main Memory
Amdahl's law: 1 MIPS per MB; now more like 1:5, so ~20 x 10 GHz cpus need 1 TB of RAM.
1 TB of RAM is ~$250k to $2M today, ~$25k to $200k in 5 years.
128 million pages: takes a LONG time to fill, and a LONG time to refill.
Needs new algorithms; needs parallel processing. Which leads us to...
The Memory Hierarchy: SMP and NUMA
If the cpu is always waiting for memory: predict memory requests and prefetch. Done.
If the cpu is still always waiting for memory: multi-program it (multiple hardware threads per cpu). Hyper-Threading means everything is SMP: 2 threads now, more later. Also multiple cpus per chip.
If your program is single-threaded, you waste half the cpu and memory bandwidth, and eventually 80%.
App builders need to plan for threads.
Hyper-Threading: SMP on chip.
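Planning for threads mostly means structuring work so a pool can spread it across the logical CPUs the OS exposes. A minimal sketch (the work function and chunk sizes are hypothetical):

```python
import os
from concurrent.futures import ThreadPoolExecutor

def process(chunk):
    """Hypothetical unit of work: chunks are independent, so they can run in parallel."""
    return sum(x * x for x in chunk)

chunks = [range(i, i + 10_000) for i in range(0, 80_000, 10_000)]

# On an SMP or Hyper-Threaded box, os.cpu_count() reports logical CPUs.
with ThreadPoolExecutor(max_workers=os.cpu_count()) as pool:
    results = list(pool.map(process, chunks))

print(len(results))  # 8
```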
The Memory Hierarchy
Locality REALLY matters: the CPU runs at 2 GHz, but RAM behaves more like 5 MHz.
RAM is no longer random access. Organizing the code gives 3x (or more). Organizing the data gives 3x (or more).

Level      Latency (clocks)   Size
Registers  1                  1 KB
L1         2                  32 KB
L2         10                 256 KB
L3         30                 4 MB
Near RAM   100                16 GB
Far RAM    300                64 GB
[Diagram: on chip, registers feed the arithmetic logical unit through the L1 I-cache and D-cache, backed by the L2 cache; off chip, the bus connects to RAM, other cpus, remote cache, remote RAM, and disk/network.]
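The "organizing the data gives 3x" claim is about making access order match memory layout. The effect is dramatic in C; even from Python a trace of it is visible (a sketch with arbitrary sizes; timings vary by machine, so none are asserted):

```python
import time

N = 1000
matrix = [[1] * N for _ in range(N)]   # N rows; each row is laid out together

def row_major():
    # Visits elements in the order the rows are laid out: good locality.
    return sum(matrix[i][j] for i in range(N) for j in range(N))

def col_major():
    # Strides across all rows for each column: poor locality.
    return sum(matrix[i][j] for j in range(N) for i in range(N))

for f in (row_major, col_major):
    t0 = time.perf_counter()
    total = f()
    print(f.__name__, total, f"{time.perf_counter() - t0:.3f}s")
```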
Scaleup Systems
Non-Uniform Memory Architecture (NUMA): coherent, but remote memory is even slower than local memory.
All cells see a common memory: slow local main memory, slower remote main memory.
Scale up by adding cells.
Planning for 64 cpus, 1 TB of RAM
The interconnect, service processor, and partition management are vendor specific. Several vendors are doing this, on Itanium and Hammer.
[Diagram: four cells, each with 4 cpus, memory, and an I/O chipset, joined by a crossbar/switch system interconnect; a partition manager with a config DB and redundant service processors oversees the cells.]
Changed Ratios Matter
If everything changes by 2x, then nothing changes. So it is the different rates of change that matter.
Improving FAST: CPU speed; memory and disk size; network bandwidth.
Slowly changing: speed of light; people costs; memory bandwidth; WAN prices.
What's REALLY New
64-bit and TB-size main memory
SMP on chip: everything's SMP
32...256-way SMP: locality/affinity matters
TB-size disks (we are here)
High-speed LANs
iSCSI and NAS competition
Disks Are Becoming Tapes
Capacity: 150 GB now, 300 GB this year, 1 TB by 2007.
Bandwidth: 40 MBps now, 150 MBps by 2007.
Read time: 2 hours sequential, 2 days random now; 4 hours sequential, 12 days random by 2007.
(Now: 150 GB at 150 IO/s and 40 MBps. 2007: 1 TB at 200 IO/s and 150 MBps.)
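The read-time figures follow from capacity, bandwidth, and seek rate. A sketch (the 8 KB page per random IO is an assumption; the slide's own numbers use somewhat more conservative effective rates):

```python
def read_times(capacity_gb, mbps, iops, page_kb=8):
    """Hours to read a disk sequentially, and days to read it randomly."""
    seq_hours   = capacity_gb * 1024 / mbps / 3600
    random_days = capacity_gb * 1024 * 1024 / page_kb / iops / 86400
    return seq_hours, random_days

# 150 GB disk, 40 MBps, 150 IO/s: ~1.1 h sequential, ~1.5 days random.
print(read_times(150, 40, 150))
# 1 TB disk, 150 MBps, 200 IO/s: ~1.9 h sequential, ~7.6 days random.
print(read_times(1000, 150, 200))
```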
Disks Are Becoming Tapes: Consequences
Use most disk capacity for archiving: Copy-on-Write (COW) file system in Windows .NET Server 2003; RAID10 saves arms and costs space (OK!); backup to disk.
Pretend it is a 100 GB disk plus a 1 TB disk: keep the hot 10% of the data on the fastest part of the disk, and the cold 90% on the colder part.
Organize computations to read/write disks sequentially in large blocks.
Networking: Great Hardware & Software
WANs @ 5 GBps (one lambda = 40 Gbps). Gbps Ethernet is common (~100 MBps). Offload gives ~2 Hz/byte (about two clocks per byte moved), and will improve with RDMA and zero-copy. 10 Gbps mainstream by 2004.
Faster I/O: 1 GB/s today (measured), 10 GB/s under development. SATA (serial ATA): 150 MBps per device.
Wiring is going serial and getting FAST! Gbps Ethernet and SATA are built into chips.
RAID controllers: inexpensive and fast. 1U storage bricks @ 2-10 TB, SAN or NAS (iSCSI or CIFS/DAFS).
(Diagram: Ethernet links at 100 MBps/link; 8x SATA at 150 MBps/link.)
NAS – SAN Horse Race
Storage hardware: $1k/TB/year. Storage management: $10k to $300k/TB/year.
So, as with server consolidation: storage consolidation.
Two styles: NAS (Network Attached Storage), a file server; SAN (System Area Network), a disk server.
Windows supports both models. We believe NAS is more manageable. Windows is a great NAS server.
What's REALLY New – Windows Scale Up
64-bit and TB-size main memory
SMP on chip: everything's SMP
32...256-way SMP: locality/affinity matters
TB-size disks
High-speed LANs
iSCSI and NAS competition
Take Aways / Call to Action
Threads: plan for SMPs (threads), 32 cpus and (far) beyond.
Locality: use affinity, cache, disk, ...
64-bit: plan for VERY large memory.
Sequential IO and disk-as-tape: plan for huge disks (with spare space).
Low-overhead networking: the LAN is converging on Ethernet, SATA, ...?
Windows .NET Server 2003 and its successors will manage petabyte stores.
Outline
Progress: an overview
ScaleUp nodes: technology trends: CPUs, memory, disks, networking
ScaleOut terminology: clones, racks/packs, farms, geoplex
Scaleable Systems
ScaleUP: grow by adding components to a single system.
ScaleOut: grow by adding more systems.
ScaleUP and Scale OUT: Everyone Does Both
Choices: the size of a brick; clones or partitions; the size of a pack; whose software?
ScaleUp and ScaleOut both have a large software component.
$1M/slice: IBM S390? Sun E10,000?
$100K/slice: Wintel 8x++
$10K/slice: Wintel 4x
$1K/slice: Wintel 1x
Clones: Availability + Scalability
Some applications are read-mostly, with low consistency requirements and modest storage (less than 1 TB).
Examples: HTML web servers (IP sprayer/sieve + replication); LDAP servers (replication via gossip).
Replicate the app at all nodes (clones). Load balance: spray & sieve or route requests across nodes. Grow by adding clones. Fault tolerance: stop sending to a failed clone.
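The spray/route/grow/fail-over loop can be sketched in a few lines (a toy; real sprayers like WLBS work at the IP level, and the clone names here are hypothetical):

```python
import itertools

class Sprayer:
    """Minimal round-robin 'spray' load balancer over a set of clones."""
    def __init__(self, clones):
        self.clones = list(clones)
        self.up = set(self.clones)
        self._rr = itertools.cycle(self.clones)

    def mark_down(self, clone):
        # Fault tolerance: stop sending to that clone.
        self.up.discard(clone)

    def add_clone(self, clone):
        # Grow by adding clones.
        self.clones.append(clone)
        self.up.add(clone)
        self._rr = itertools.cycle(self.clones)

    def route(self):
        # Each clone appears once per cycle, so len(clones) draws suffice.
        for _ in range(len(self.clones)):
            c = next(self._rr)
            if c in self.up:
                return c
        raise RuntimeError("no clones up")

lb = Sprayer(["web1", "web2", "web3"])
lb.mark_down("web2")
print([lb.route() for _ in range(4)])  # web2 is skipped
```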
Two Clone Geometries
Shared-Nothing: exact replicas. Shared-Disk: state stored in the server.
(Diagram: shared-nothing clones vs. shared-disk clones.)
If clones have any state, make it disposable. Manage clones by reboot; failing that, replace them. One person can manage thousands of clones.
Clone Requirements
Automatic replication (if they have any state) of applications (and system software) and data.
Automatic request routing: spray or sieve.
Management: Who is up? Update management and propagation. Application monitoring.
Clones are very easy to manage. Rule of thumb: 100s of clones per admin.
Partitions for Scalability
Clones are not appropriate for some apps: stateful apps and high update rates do not replicate well.
Examples: email, databases, read/write file servers, cache managers, chat.
Partition state among servers. Partitioning must be transparent to the client, with online split and merge of partitions.
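One common way to get client-transparent partitioning is to hash keys to fixed slots and keep a slot-to-server table; splitting or migrating a partition then just remaps slots. A sketch (server names and slot count are hypothetical):

```python
import hashlib

class Partitioner:
    """Route each key (e.g. a mailbox name) to its partition's server."""
    def __init__(self, servers, slots=64):
        self.slots = slots
        # Initially spread the slots round-robin over the servers.
        self.table = {s: servers[s % len(servers)] for s in range(slots)}

    def server_for(self, key):
        slot = int(hashlib.md5(key.encode()).hexdigest(), 16) % self.slots
        return self.table[slot]

    def move(self, slot, new_server):
        # Online split/merge/migration = remapping slots; clients never see it.
        self.table[slot] = new_server

p = Partitioner(["mail1", "mail2", "mail3"])
print(p.server_for("alice@example.com"))
```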
Packs for Availability
Each partition may fail (independently of the others). Partitions migrate to a new node via fail-over, in seconds.
Pack: the nodes supporting a partition. Examples: VMS Cluster, Tandem, SP2 HACMP, IBM Sysplex™, WinNT MSCS (Wolfpack).
Partitions typically grow in packs. Active-Active: all nodes provide service. Active-Passive: the hot standby is idle.
Cluster-in-a-box is now a commodity.
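An active-passive pack can be sketched as a pair where the partition fails over to the standby when the active node dies (a toy; node names and the ConnectionError signal are hypothetical):

```python
class Node:
    def __init__(self, name, up=True):
        self.name, self.up = name, up

    def handle(self, request):
        if not self.up:
            raise ConnectionError(self.name)
        return f"{self.name}: {request}"

class Pack:
    """Active-passive pack: on failure, the standby takes over the partition."""
    def __init__(self, active, standby):
        self.active, self.standby = active, standby

    def serve(self, request):
        try:
            return self.active.handle(request)
        except ConnectionError:
            # Fail-over: swap roles and retry on the (formerly idle) standby.
            self.active, self.standby = self.standby, self.active
            return self.active.handle(request)

pack = Pack(Node("nodeA", up=False), Node("nodeB"))
print(pack.serve("read mailbox"))  # nodeB: read mailbox
```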
Partitions and Packs
(Diagram: partitions give scalability; packed partitions give scalability + availability.)
Parts + Packs Requirements
Automatic partitioning (in DBMS, mail, files, ...): location transparency; partition split/merge; growth without limits (100 x 10 TB); application-centric request routing.
A simple fail-over model: partition migration is transparent; an MSCS-like model for services.
Management: automatic partition management (split/merge); Who is up? Application monitoring.
GeoPlex: Farm Pairs
Two farms (or more). State (your mailbox, bank account) is stored at both farms; changes at one are sent to the other. When one farm fails, the other provides service.
Masks hardware/software faults, operations tasks (reorganize, upgrade, move), and environmental faults (power failure, earthquake, fire).
Fail-Over & Load Balancing
Route the request to the right farm (a farm can be a clone or a partition). At the farm, route the request to the right service. At the service, route the request to any clone, or to the correct partition. Route around failures.
[Availability ladder (toward 99.999%): well-managed nodes mask some hardware failures; well-managed packs & clones also mask hardware failures, operations tasks (e.g. software upgrades), and some software failures; a well-managed GeoPlex also masks site failures (power, network, fire, move, ...) and some operations failures.]
Cluster Scale-Out Scenarios
(Diagram, "The FARM: clones and packs of partitions": web clients are load-balanced across cloned front ends (firewall, sprayer, web server) and cloned/packed file servers; web file store A replicates to web file store B; SQL temp state plus a SQL database in packed partitions 1, 2, and 3 give database transparency.)
Some Examples
TerraServer: 6 IIS clone front ends (WLBS); a 3-partition, 4-pack backend (3 active, 1 passive); partitioned by theme and geography (longitude); 1/3 of a sys admin.
Hotmail: 1,000 IIS clones for HTTP login; 3,400 IIS clones for the HTTP front door; +1,000 clones for the ad rotator, in/outbound mail, ...; a 115-partition backend (partitioned by mailbox); Cisco LocalDirector for load balancing; 50 sys admins.
Google (Inktomi is similar but smaller): a 700-clone spider; a 300-clone indexer; a 5-node geoplex (full replica); 1,000 clones per farm for search; 100 clones per farm for HTTP; 10 sys admins.
Summary
Terminology for scaleability: Farms of servers. Clones: identical replicas, for scaleability + availability. Partitions: scaleability. Packs: partition availability via fail-over. GeoPlex: disaster tolerance.
Architectural Blueprint for Large eSites: http://msdn.microsoft.com/library/en-us/dndna/html/dnablueprint.asp
Scalability Terminology: Farms, Clones, Partitions, and Packs: ftp://ftp.research.microsoft.com/pub/tr/tr-99-85.doc
(Taxonomy: a Farm holds Clones (shared-nothing or shared-disk) and Partitions; Partitions are served by Packs (shared-nothing, active-active or active-passive); two or more farms form a GeoPlex.)
Call to Action
Plan for 64-bit addressing everywhere: it is in your future.
Use threads: SMP is in your future. Carefully: avoid locks, use locality/affinity.
Think of disks as tape: sequential vs. random access; online archive.
Windows now has ScaleUp and ScaleOut. Think in terms of geoplexes and farms.
© 2002 Microsoft Corporation. All rights reserved. This presentation is for informational purposes only. MICROSOFT MAKES NO WARRANTIES, EXPRESS OR IMPLIED, IN THIS SUMMARY.