Scalability of the Microsoft Cluster Service
Werner Vogels, Dan Dumitriu, Ashutosh Agrawal,
Teck Chia, Katherine Guo
Reliable Distributed Systems Group
Dept. of Computer Science, Cornell University
Agenda
• Research Goals
• Intro into MS Cluster Service
• Practical Scalability
• Evaluation of MSCS components
• Conclusions
• What’s Cookin’?
Disclaimer
• The tests have taken MSCS far beyond the goals set in its design.
• Any limitations are due to pushing the technology to extremes and are not present in the commercial systems.
Research Goals
General: Reliable Distributed Systems
Specific Cluster Research:
– Efficient Distributed Management
– Low Overhead Scalability
– Cluster Collections
– Cluster Aware Programming Tools (Quintet)
Research into Scalable Clusters
• Today’s practice
– Parallel Computing on 512++ nodes
– High-Availability up to 16 nodes
• Distribution and Fault Management are very scale sensitive.
– Failure Management
– Node Membership
– Cluster-Wide Consistency
The Reality of Scalable Clusters
[Figure: clusters of SMP systems; axes: SMP Processors and Cluster Nodes; configurations marked "64 way" and "256 way". For example, 16 nodes of 16-processor SMP systems = 256 CPUs.]
Microsoft.com
Mandatory Reading
“In Search of Clusters: The Ongoing Battle in Lowly Parallel Computing”
Gregory Pfister, second edition, Prentice Hall
Windows NT Clusters
What is clustering to Microsoft?
• Group of independent systems that appear as a single system
• Managed as a single system
• Common namespace
• Services are “cluster-wide”
• Ability to tolerate component failures
• Components can be added transparently to users
• Existing client connectivity is not affected by clustered applications
Windows NT Clusters
Development goals
• Extend Windows NT to seamlessly include cluster features
• Ship high-availability features for Windows NT first
– Support key applications without modification
– Failover support for base Windows NT hardware, services, and applications
– Available API for ISV products
• Develop scalability product later
MSCS Features
• Shared nothing
– Simplified hardware configuration
• Remoteable tools
• Windows NT manageability enhancements
– Never take a “cluster” down: rolling upgrade
• Microsoft® BackOffice™ product support
• 3rd Party Support: SAP, Oracle
Non-Features Of MSCS
• Not lock-step/fault-tolerant
• Not able to “move” running applications
– MSCS restarts applications that are failed over to other cluster members
• Not able to recover shared state between client and server (e.g., file position)
– All client/server transactions should be atomic
– Standard client/server development rules still apply
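The last non-feature has a direct consequence for client design. The following is a minimal sketch, assuming a toy in-memory file server (`FileServer`, `read_next`, `read_at` and `failover` are hypothetical names, not an MSCS API): because MSCS restarts the application rather than moving it, a file position kept on the server is lost, while a request that carries its own absolute offset can simply be retried after failover.

```python
class FileServer:
    """Toy server holding one file; a restart (failover) loses all session state."""

    def __init__(self, data: bytes) -> None:
        self.data = data      # durable: survives failover (e.g. on a shared disk)
        self.cursors = {}     # volatile: per-client file positions

    def read_next(self, client: str, n: int) -> bytes:
        # Stateful API: the server remembers where each client is.
        pos = self.cursors.get(client, 0)
        self.cursors[client] = pos + n
        return self.data[pos:pos + n]

    def read_at(self, offset: int, n: int) -> bytes:
        # Stateless API: each request is self-contained and idempotent.
        return self.data[offset:offset + n]


def failover(server: FileServer) -> FileServer:
    # The application is restarted on another node: durable data survives,
    # in-memory session state (the cursors) does not.
    return FileServer(server.data)


disk = b"abcdefghij"

srv = FileServer(disk)
srv.read_next("c1", 4)               # b"abcd"
srv = failover(srv)
stateful = srv.read_next("c1", 4)    # cursor lost: b"abcd" again, not b"efgh"

srv = FileServer(disk)
offset = len(srv.read_at(0, 4))      # the client tracks its own position
srv = failover(srv)
stateless = srv.read_at(offset, 4)   # b"efgh", as if no failover had happened
print(stateful, stateless)
```

The design choice is the usual one for failover clusters: keep every request self-describing so that a retry against the restarted server is safe.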
MSCS Cluster
[Figure: two-server cluster diagram; labels: Client PCs, Server A, Server B, Disk cabinet A, Disk cabinet B, Heartbeat, Cluster management.]
Scaling Distributed Systems 101
• Reduce algorithmic dependency on the number of nodes.
• Traditional Solutions:
– Reduce Synchronous Behavior
– Reduce System Complexity
• Radical Solutions:
– Epidemic (gossip, probabilistic) techniques
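To make "epidemic" concrete, here is a minimal simulation sketch of push-style gossip dissemination. It is an illustration of the general technique, not code from MSCS or from the authors; the function name and parameters are hypothetical. One node starts with an update, and in every round each informed node forwards it to one peer chosen uniformly at random.

```python
import random
from typing import Tuple


def push_gossip_rounds(n: int, seed: int = 0, max_rounds: int = 1000) -> Tuple[int, int]:
    """Simulate push gossip of one update among n nodes.

    Node 0 starts with the update. Each round, every informed node sends it
    to one uniformly random other node. Returns (rounds, informed_count).
    """
    rng = random.Random(seed)
    informed = {0}
    rounds = 0
    while len(informed) < n and rounds < max_rounds:
        targets = set()
        for node in informed:
            peer = rng.randrange(n - 1)
            if peer >= node:      # skip self: map 0..n-2 onto the other n-1 nodes
                peer += 1
            targets.add(peer)
        informed |= targets
        rounds += 1
    return rounds, len(informed)


for n in (32, 256, 2048):
    rounds, informed = push_gossip_rounds(n, seed=1)
    print(f"n={n:5d}: {informed} nodes informed after {rounds} rounds")
```

The number of rounds grows roughly logarithmically with n rather than linearly, and no node ever waits on a specific peer, which is what makes such techniques attractive for reducing the algorithmic dependency on cluster size.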
Scaling MSCS?
• Why do we care? (Tools, Tools, Tools)
• Do the Distributed Algorithms scale?
• Are there bottlenecks in the implementation?
• Is it a good basis for Cluster Aware Support?
Cornell Test Cluster
• 32-node MSCS Cluster
• Modified MSCS code
• Nodes: 300 MHz PII and 200 MHz P6 (128 MB memory)
• 100 Mbit/s Switched Ethernet
• Test environment
– Unloaded systems
– Loaded systems with I/O-intensive apps
MSCS 1.X Architecture
[Figure: component diagram; labels: Cluster administrator, Cluster.Exe, MSCLUS.DLL, Cluster API DLL, Cluster API stub, Database Manager, Membership Manager, Global Update Manager, Failover Manager, Event Processor, Node Manager, Resource Manager, Object Manager, Log Manager, Checkpoint Manager, Reliable Cluster Transport + Heartbeat, Resource monitors, Resource API (Res COM, Res API), Physical resource DLL, Logical resource DLL, Application resource DLLs, Network.]
Components under Investigation
• Failure Detection
• Node Membership
– Join operation
– Reconfiguration after failure
• Consistent Distributed State Management
Failure Detection
• Heartbeat broadcast
– over all interfaces
– period: 1.2 seconds
• Interface suspicion after 3 misses
• Node Suspicion after 6 misses (7.2 seconds)
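The suspicion rules above can be sketched as per-peer miss counters. This is a hedged sketch, not MSCS code: `PeerState` and its methods are hypothetical, and the slide does not say how the interface and node counters interact, so we assume the node-level counter advances only in periods where no interface delivered a heartbeat.

```python
from dataclasses import dataclass, field

HEARTBEAT_PERIOD_S = 1.2      # heartbeat period from the slide
INTERFACE_SUSPECT_MISSES = 3  # interface suspected after 3 missed heartbeats
NODE_SUSPECT_MISSES = 6       # node suspected after 6 misses (6 x 1.2 s = 7.2 s)


@dataclass
class PeerState:
    """Miss counters one node keeps about a single peer (hypothetical structure)."""
    interfaces: list
    iface_misses: dict = field(init=False)
    node_misses: int = 0

    def __post_init__(self) -> None:
        self.iface_misses = {iface: 0 for iface in self.interfaces}

    def tick(self, heard_on: set) -> None:
        """Call once per heartbeat period with the interfaces that delivered a heartbeat."""
        for iface in self.interfaces:
            if iface in heard_on:
                self.iface_misses[iface] = 0
            else:
                self.iface_misses[iface] += 1
        # Assumption: the node counter advances only when NO interface delivered a heartbeat.
        self.node_misses = 0 if heard_on else self.node_misses + 1

    def suspected_interfaces(self) -> list:
        return [i for i, m in self.iface_misses.items() if m >= INTERFACE_SUSPECT_MISSES]

    def node_suspected(self) -> bool:
        return self.node_misses >= NODE_SUSPECT_MISSES


peer = PeerState(["public", "private"])
for _ in range(3):                      # private network fails, public still works
    peer.tick({"public"})
print(peer.suspected_interfaces(), peer.node_suspected())   # ['private'] False
for _ in range(NODE_SUSPECT_MISSES):    # the whole node goes silent
    peer.tick(set())
print(peer.node_suspected())                                 # True
```

Separating the two thresholds lets a single failed network be reported without declaring the node dead, while a silent node is suspected after 6 periods, i.e. 7.2 seconds.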
Membership Join
• 6-phase operation
– discovery
– lock
– enable network
– petition
– database sync
– unlock
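The phase list brackets the join between a lock and an unlock. A minimal driver sketch, assuming that structure: the phase names come from the slide, but the one-line meanings in the comments, `JoinPhase`, `run_join` and the handler callables are our hypothetical illustration, not MSCS internals. The point it shows is that once the lock is taken, it is released even if a later phase fails.

```python
from enum import Enum


class JoinPhase(Enum):
    # Phase names from the slide; the one-line meanings are assumptions.
    DISCOVERY = 1       # locate the current cluster members
    LOCK = 2            # serialize this join against other membership changes
    ENABLE_NETWORK = 3  # bring up cluster network connectivity to the joiner
    PETITION = 4        # ask the members to admit the joiner
    DATABASE_SYNC = 5   # bring the joiner's cluster database up to date
    UNLOCK = 6          # release the join lock


def run_join(handlers: dict) -> list:
    """Run the join phases in order (hypothetical driver, not MSCS code).

    `handlers` maps every JoinPhase to a zero-argument callable that raises
    to abort the join. Once LOCK succeeded, UNLOCK runs even if a later phase
    fails, so an aborted join does not leave the cluster locked.
    """
    completed = []
    locked = False
    try:
        for phase in list(JoinPhase)[:-1]:   # DISCOVERY through DATABASE_SYNC
            handlers[phase]()
            completed.append(phase)
            if phase is JoinPhase.LOCK:
                locked = True
    finally:
        if locked:
            handlers[JoinPhase.UNLOCK]()
            completed.append(JoinPhase.UNLOCK)
    return completed


ok = {phase: (lambda: None) for phase in JoinPhase}
print([p.name for p in run_join(ok)])
# ['DISCOVERY', 'LOCK', 'ENABLE_NETWORK', 'PETITION', 'DATABASE_SYNC', 'UNLOCK']
```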
Membership Regroup
• 5-phase, fully distributed
– Activate
– Closing
– Pruning
– Cleanup phase one
– Cleanup phase two
Global Update I
• Atomic / Total Order
– Organize nodes in a ring
– Acquire lock
– Transmit to each node in order
– Release lock
• Handles a number of failure scenarios
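The four steps above can be modelled in a few lines. This is a toy sketch under stated assumptions: `GlobalUpdateRing` is a hypothetical name, a local `threading.Lock` stands in for the cluster-wide lock, appending to a per-node list stands in for the per-node RPC, and the failure handling the slide mentions is deliberately left out. It shows why the scheme yields a total order: only one update is in flight at a time, and every node receives it in the same ring order.

```python
import threading


class GlobalUpdateRing:
    """Toy model of lock + in-order delivery around a ring (hypothetical names)."""

    def __init__(self, node_ids):
        self.ring = sorted(node_ids)                   # fixed ring order
        self.logs = {node: [] for node in self.ring}   # updates applied per node
        self._lock = threading.Lock()                  # stands in for the cluster-wide lock
        self._seq = 0

    def global_update(self, sender, payload):
        with self._lock:                               # acquire lock
            self._seq += 1
            update = (self._seq, sender, payload)
            for node in self.ring:                     # transmit to each node in order
                self.logs[node].append(update)         # an RPC per node in MSCS
        return update                                  # lock released when the block exits


gup = GlobalUpdateRing(range(4))


def sender(node_id):
    for i in range(100):
        gup.global_update(node_id, i)


threads = [threading.Thread(target=sender, args=(n,)) for n in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(all(log == gup.logs[0] for log in gup.logs.values()))   # True: same order everywhere
```

The same property is also the cost: every sender in the cluster is serialized behind one lock that is held for a full pass around the ring.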
Global Update
• Developed for sparse updates of OS structures
• Implemented in MSCS using repeated RPC
• Collapses under load
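A rough cost model suggests why the global update collapses. Assume (this is not stated on the slides) that the sender holds the global lock while it issues one RPC to each of the other n - 1 nodes in turn, each taking time t_RPC:

```latex
T_{\text{update}} \approx (n-1)\,t_{\text{RPC}},
\qquad
U_{\max} \approx \frac{1}{T_{\text{update}}} = \frac{1}{(n-1)\,t_{\text{RPC}}}
```

With a purely hypothetical t_RPC = 1 ms and n = 32, each update holds the lock for about 31 ms, capping the whole cluster at roughly 32 updates per second. Additional senders only lengthen the queue for the lock, and the cap falls further as nodes are added.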
Conclusions
• Can the current Algorithms scale?
– FD & Regroup: Yes
– GUP: 10-16 nodes
• Are there bottlenecks in the implementation?
– FD & Regroup: Repeated p2p in
– Join & GUP: RPC Trains
• Is it a good basis for cluster-aware support?
– NO
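Repeated point-to-point messaging is costly because its message count grows quadratically. Assuming (our reading of the slides) that each node sends its heartbeat as a separate unicast to each of the other n - 1 nodes, once per period P = 1.2 s on each interface:

```latex
M_{\text{p2p}} = n(n-1) \ \text{messages per period per interface},
\qquad
M_{\text{p2p}}\big|_{n=32} = 32 \cdot 31 = 992
```

At P = 1.2 s that is about 827 messages per second per interface on the 32-node test cluster, whereas a true broadcast or multicast heartbeat would need only n = 32 sends per period.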
Rat Pack Clusters
A Quick Glance in the Kitchen
• Tested on 200++ nodes
• Mixed Nuts: NT & Unix
• Provides Cluster Events
• Epidemic FD & Membership
• Probabilistic Communication Tools
• Sub-Clusters for Limited-Scalability Operations
Rat Pack Clusters
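The slide only says "Epidemic FD & Membership". One well-known epidemic failure detector is gossip-style heartbeating, sketched below as an illustration; we do not claim it is exactly what Rat Pack uses, and `GossipNode`, `simulate` and all parameter values are hypothetical. Each node increments its own counter every round and pushes its whole counter table to a random peer; receivers keep the larger counter, and a member whose counter has not grown for more than `t_fail` rounds is suspected.

```python
import random


class GossipNode:
    """One member of a gossip-style heartbeat failure detector (illustrative)."""

    def __init__(self, node_id, n):
        self.id = node_id
        self.heartbeats = [0] * n    # highest heartbeat counter seen for each member
        self.last_change = [0] * n   # round in which that counter last increased
        self.alive = True

    def beat(self, now):
        self.heartbeats[self.id] += 1
        self.last_change[self.id] = now

    def merge(self, table, now):
        # Epidemic merge: keep the larger counter and remember when it grew.
        for j, hb in enumerate(table):
            if hb > self.heartbeats[j]:
                self.heartbeats[j] = hb
                self.last_change[j] = now

    def suspects(self, now, t_fail):
        return {j for j, t in enumerate(self.last_change)
                if j != self.id and now - t > t_fail}


def simulate(n=8, rounds=60, crash_id=3, crash_at=20, t_fail=20, seed=7):
    rng = random.Random(seed)
    nodes = [GossipNode(i, n) for i in range(n)]
    for now in range(1, rounds + 1):
        if now == crash_at:
            nodes[crash_id].alive = False      # crash: stops beating and gossiping
        live = [node for node in nodes if node.alive]
        for node in live:
            node.beat(now)
        for node in live:
            peer = rng.choice([p for p in range(n) if p != node.id])
            if nodes[peer].alive:              # gossip sent to a crashed node is lost
                nodes[peer].merge(list(node.heartbeats), now)
    return {node.id: node.suspects(rounds, t_fail) for node in nodes if node.alive}


print(simulate())   # surviving node id -> set of members it suspects
```

No node monitors any specific peer, so per-node load stays constant as the cluster grows; the price is that detection is probabilistic and `t_fail` must be large enough relative to the gossip spread time to keep false suspicions rare.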
Be Courageous, Do A Demo
Any Questions?