Running a Megasite on Microsoft Technologies
DESCRIPTION
MySpace and Microsoft.com are two of the most-visited Web sites on the planet. Come to this session to hear about lessons learned using Microsoft technologies to run Web applications on a massive scale. Representatives from Microsoft.com talk about lessons learned using an all-Microsoft datacenter. Representatives from MySpace talk about the realities of using Microsoft technologies in a scalable, federated environment using SQL Server 2005, .NET 2.0 and IIS 6 on Windows Server 2003 64-bit editions. This session features an open Q&A with a panel of technical managers and engineers from MySpace and Microsoft.com.
TRANSCRIPT
Running A Megasite On Microsoft Technologies
Casey Jacobs, Director of Engineering, Microsoft.com
Aber Whitcomb, CTO, MySpace.com
Chris St.Amand, Sr. System Engineer, Microsoft.com
Jim Benedetto, VP of Technology, MySpace.com
NGW046
Agenda
Introduction – Quick Facts
MySpace.com – Growing Up
Upcoming Technology Enablers
Open Panel Discussion
Introduction
Brief History Of Microsoft.com
Microsoft combines Web platform, ops, and content teams
Standardization effort begins, consolidation of hosted systems
Focus on MSCOM Network Programming and campaign-to-Web integration
Single MSCOM group formed: brand, content, site standards, privacy, brand compliance
Microsoft launches www.microsoft.com
Information & support publishing; hosting
Enable an innovative customer experience online & in-product
Product Info, Support, Dev / ITPro Experience, Customer Intelligence, Profile Mgmt & Enterprise Downloads
1995: 30k users / day
2001: 4M UUsers / day
2003: 6.5M UUsers / day
2006: 17.1M UUsers / day
Microsoft.com Quick Facts
Infrastructure and Application Footprint
  5 Internet Data Centers & 3 CDN Partnerships
  110 Web Sites, 1000’s of Apps and 2138 Databases
  80+ Gigabit/sec Bandwidth
Solutions at High Scale
  www.Microsoft.com
    13M UUsers/Day & 70M Page Views/Day
    10K Req/Sec, 300K CC Conn’s on 80 Servers
    350 Vroots, 190 IIS Web Apps & 12 App Pools
  Microsoft Update
    250M UScans/Day, 12K ASP.NET Req/Sec, 1.1M Concurrent
    28.2 Billion Downloads for CY 2005
    Egress – MS, Akamai & Savvis (30-80+ Gbit/Sec)
MySpace Company Overview
Launched Sept, 2003
Latest as of February 2006
  64+ MM Registered Users
  38 MM UUsers & 2.3M Concurrent
  260K New Registered Users/Day
  23 Billion Page* Views/Month
Demographics
  50.2% Female / 49.8% Male
  Primary Age Demo: 14-34
Site Trends
  260K New Users/Day
  430M Total Images
  Millions of Songs Streamed/Day
  1000’s of New MP3’s/Day
  20 Million Comments Posted
Media Metrix February 2006 Audience Rankings
Source: comScore Media Metrix, February 2006
Site      Internet Rank   Pageviews in ‘000s
Yahoo     #1              29,508
MySpace   #2              23,566
MSN       #3              14,695
Ebay      #4              9,632
Google    #5              7,329
Hotmail   #6              6,812
MySpace.com Quick Facts
Infrastructure and Application Footprint
3 Internet Data Centers
Server Breakdown
2682 Web and 650 Database Servers2682 Web and 650 Database Servers90 Cache Servers 16gb RAM90 Cache Servers 16gb RAM650 Dart servers650 Dart servers60 DB Servers60 DB Servers150 Media servers150 Media servers
3000 disks in SAN architecture
Egress Management
  17,000 mb/s bandwidth
  15,000 mb/s on CDN
MySpace.com
Growing up in the Internet World
0 Users: The beginning
Two tiered architecture
  Single Database
  Load balanced web servers
Great for rapid development
Less complexity means faster time to market and lower operational costs
Works for small to medium sized websites, not big ones
500k Users: A single database is not enough
Max out a single database
Split reads and writes across separate databases
Use transactional replication so multiple databases can service reads
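The read/write split described on this slide can be sketched as a small router. This is a minimal illustration, not MySpace's implementation: the class name, the server names, and the rule that only statements starting with SELECT count as reads are all assumptions.

```python
import itertools


class ReadWriteRouter:
    """Send writes to one primary and spread reads over replicated copies."""

    def __init__(self, primary, replicas):
        if not replicas:
            raise ValueError("need at least one read replica")
        self.primary = primary
        # Round-robin over the read-only copies fed by replication.
        self._replicas = itertools.cycle(replicas)

    def pick(self, sql):
        # Assumption: anything that is not a SELECT is treated as a write.
        is_read = sql.lstrip().upper().startswith("SELECT")
        return next(self._replicas) if is_read else self.primary


# Hypothetical server names, for illustration only.
router = ReadWriteRouter("db-write", ["db-read-1", "db-read-2"])
first_read = router.pick("SELECT name FROM users WHERE id = 7")
second_read = router.pick("select name from users where id = 8")
write = router.pick("UPDATE users SET name = 'x' WHERE id = 7")
```

Replicas lag the primary slightly, so a read issued right after a write may still see the old row. That is one reason this approach does not fit every workload.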
1 Million Users: Vertical partitioning
Transactional replication doesn’t work for all workloads and data types
Use a combination of vertical partitioning and replication
2 Million Users: SAN
Start to reconsider SCSI arrays for the long term
  SCSI arrays have good performance but reliability issues
  SANs provide better performance, uptime, and redundancy
  Move to a CLARiiON and enjoy these benefits
3 Million Users: Horizontal partitioning
Vertical partitions see performance problems
Decide we need to re-architect the database
Horizontal partitioning is the answer but is difficult to do while in production
Horizontal Partitioning
All features reside on a single database server
Data is partitioned by user ID
Some data cannot be partitioned, especially on a social networking site
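Partitioning by user ID can be sketched as a range lookup. This is a sketch under stated assumptions: the range boundaries and database names below are hypothetical, and the slides do not say how the real lookup was implemented.

```python
import bisect

# Hypothetical layout: each database holds one contiguous range of user IDs.
# RANGE_STARTS[i] is the first user ID stored on SHARDS[i].
RANGE_STARTS = [1, 1_000_001, 2_000_001]
SHARDS = ["userdb01", "userdb02", "userdb03"]


def shard_for(user_id):
    """Return the database that owns this user's rows."""
    if user_id < RANGE_STARTS[0]:
        raise ValueError(f"invalid user id: {user_id}")
    # Find the last range whose start is <= user_id.
    index = bisect.bisect_right(RANGE_STARTS, user_id) - 1
    return SHARDS[index]
```

Every database carries the same schema, so the application only has to choose a server before running a query. Data that spans user ranges cannot be routed this way and needs a separate home.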
5 Million Users: Network bottlenecks
Various areas of the network become saturated
Gig uplinks are maxed out
  Switch to Autonomous network and BGP
  Get multiple gig links and 10G links
Load balancer is maxed out
  “Must load balance the load balancers”
  Use DNS
7 Million Users: Site dependencies
Separating features on the front end isolates potential bottlenecks
Using subdomains is the easiest way
10 Million Users: Scalable storage
Trying to partition storage on the backend is time consuming and inefficient
Maxing out SANs is very costly
We realize scalable storage is key
15 Million Users: DBs versus caching
Databases still having perf issues
  Databases are expensive
  Have a lot of transactional overhead
Caching tier
  High speed cache is perfect for reads
  LRU algorithm is self managing
  Drastically reduces database load
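A read-through cache with LRU eviction can be sketched in a few lines. This is an illustrative sketch, not the MySpace cache tier: the class name, the `load` callback standing in for a database read, and the tiny capacity are assumptions.

```python
from collections import OrderedDict


class LRUCache:
    """Read-through cache that evicts the least recently used entry."""

    def __init__(self, capacity, load):
        self.capacity = capacity
        self.load = load  # called on a miss, e.g. a database read
        self._items = OrderedDict()

    def get(self, key):
        if key in self._items:
            # Hit: mark the entry as most recently used.
            self._items.move_to_end(key)
            return self._items[key]
        # Miss: fetch from the backing store and remember the result.
        value = self.load(key)
        self._items[key] = value
        if len(self._items) > self.capacity:
            # Drop the entry that has gone unused the longest.
            self._items.popitem(last=False)
        return value


db_reads = []  # records every trip to the (simulated) database


def read_profile(user_id):
    db_reads.append(user_id)
    return {"id": user_id}


cache = LRUCache(capacity=2, load=read_profile)
for user_id in (1, 2, 1, 3, 2):
    cache.get(user_id)
```

Hot keys stay resident and cold keys age out without any manual tuning, which is what makes the policy self managing. Repeated reads of a hot key never reach the database.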
MySpace: Where we are today
Upcoming Technology Enablers
What’s Next for Microsoft.com and MySpace.com?
SQL Server 2005: Product technology enablers
Peer-To-Peer Replication
  System & Data Center Autonomy
  Zero “perceived” Application Downtime from Consumers
  Eliminates Single Point of Failure for R/W Databases
Mirroring (SP1)
  Targeting Replacement of Log Shipping Fail-Over pairs
  3 Systems in TAP Program (Technet, Learning & Genuine)
  Reduced Failover Downtime
    Log Shipping: 5-15min Avg
    Mirroring < 1min (planned)
Table Partitioning
  Reduced Storage Costs
  Scale Up at Lower Costs
[Diagram: Database Mirroring. Labels: Data Center A, Data Center B, Web Cluster 1, Web Cluster 2, Principal, Mirror, Sync / Async Transactions]
[Diagram: Peer-To-Peer Replication. Labels: Data Center A, ICPSQL.PHX.GBL, SQL A, SQL B, SQL C, SQL D, NLB VIP 1, NLB VIP 2]
MySpace: Scaling SQL Server
V1: Single Instance – < 1 Million Users
  Single SQL Server Instance Supports All Users and Features
V2: Single Instance Replicating to Read Only Full Copies – < 2 Million Users
  Single server handles all write transactions, read transactions spread across multiple transactional replication copies
V3: Vertical Partitioning – < 4 Million Users
  Each Feature/Page of the site on its own SQL Server
V4: Horizontal Partitioning – < 8 Million Users
  All features/pages brought back to single database schema
  Standard schema across all databases
  User ranges partitioned across databases
V5: Horizontally Partitioned Core with Replicated Content, Vertically Partitioned Features Databases, “Shared Content” Databases – > 8 Million Users
  Primary MySpace schema exists across large farm of servers
  Small amounts of content replicated to all horizontally partitioned servers to allow for features spanning all user ranges
V6: Migration to SQL Server 2005 – > 26 Million Users
SQL Server 2005: 64 bit
Memory pressure under the 4GB 32-bit limit
  Servers loaded with 32Gigs of RAM
  <4 Gig addressable to the memory pools we were stressing
Manifestations
  Connection Timeouts
  Servers going “dark”, requiring restart
  Rejected Connections
Problem eliminated on 64bit arch
  Connection/Sort memory pools now able to address all 32Gigs of RAM
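The arithmetic behind the 4GB limit is quick to check. This is only a back-of-the-envelope sketch: a 32-bit pointer spans 2**32 bytes, and the 32 GB figure is the per-server RAM quoted on the slide. The variable names are illustrative.

```python
GIB = 2 ** 30  # bytes in one gibibyte

# A 32-bit pointer can name 2**32 distinct byte addresses.
addressable_32bit_gib = 2 ** 32 // GIB
installed_gib = 32  # RAM per server, from the slide

share = addressable_32bit_gib / installed_gib
print(f"{addressable_32bit_gib} GiB of {installed_gib} GiB directly addressable ({share:.1%})")
```

On those servers, the pools under pressure could reach at most one eighth of the installed memory until the move to 64-bit.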
Virtualizing Storage
What is it?
  Software layer between your disks & hosts
Advantages
  Provisioning is very simple, makes capacity planning more predictable
  Much better performance
  Can easily add more capacity to a LUN
What do we use?
  3PAR
  14 week bake off
Longhorn And IIS 7.0: Product technology enablers
UNC Content Store
  Simplified Content Mgmt
  Reduced Disk Footprint
File Replication (DC to DC)
  Latent/Long links improved 80X (10Mbps vs 850Mbps)
  Enabler of Geo-Hosting Options
Centralized IIS Configs
  Copy “Host-Host” capability
  Eliminate complex scripting of meta-base & configs
Dynamic Content Compression
  Further reduced Egress
  Improved Web Perf Delivery
[Diagram: UNC Content Store. Labels: Data Center A, Data Center B, Web Cluster, UNC, File Store, Backup, DFS Replication]
[Diagram: Content Replication between Data Center A and Data Center B. High Bandwidth File Replication for Content Sync, Peer-to-Peer, Log Shipping]
IIS 7.0: Failed Request Tracing
Geo-Targeting Solutions: Demographic management
Objective: Enable targeted release of apps and content
Avoid demographic support spikes and further align to marketing campaigns
Sensitivity to time/frequency of customer online experiences
Improve ability to reach last 30% of client population
[Diagram: Akamai Edgesuite applies release policies by audience. Audiences shown: US users (NYC, LA, DC), Taiwanese users, Polish users, all other users. Policies shown: suppress WGA release; release WGA at 8% per day; release WGA at 2% per day; release WGA at 5% per day. Broadband users: easy to reach, regulate as needed. Narrowband users: hard to reach, NEVER regulate.]
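A percentage-per-day release policy like the one in the geo-targeting diagram can be sketched with a stable hash bucket per client. Everything specific here is an assumption for illustration: the region codes, the pairing of regions with rates, the cumulative reading of "% per day", and the function names. The slides do not describe how the policy was actually evaluated.

```python
import hashlib

# Hypothetical policy table: percent of clients released per day, by region.
# A rate of 0 models a "suppress" policy.
POLICY = {"US": 0, "TW": 8, "PL": 2}
DEFAULT_RATE = 5  # assumed rate for all other users


def bucket(client_id):
    """Map a client to a stable bucket in the range 0..99."""
    digest = hashlib.sha256(client_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % 100


def is_released(client_id, region, day):
    """True if this client is inside the rollout on the given day (day 1 = first day)."""
    rate = POLICY.get(region, DEFAULT_RATE)
    # The released share grows by `rate` points per day, capped at 100%.
    return bucket(client_id) < min(100, rate * day)
```

Because each client's bucket never changes, a client that has received the release stays released on later days, and raising or lowering a region's rate only changes how fast the remaining buckets fill.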
Open Panel Discussion
© 2006 Microsoft Corporation. All rights reserved. This presentation is for informational purposes only. Microsoft makes no warranties, express or implied, in this summary.