building highly scalable java applications on windows azure - javaone s313978
DESCRIPTION
Presentation delivered at JavaOne 2010. Talks about how to use Java to build highly scalable and reliable applications on Windows Azure.TRANSCRIPT
Building Highly Scalable Java Applications on Windows AzureDavid [email protected]/dachou
Agenda
• Overview of Windows Azure
• Java How-to
• Architecting for Scale
• What’s Next
> Introduction
What is Windows Azure?
• A cloud computing platform (as-a-service)
– on-demand application platform capabilities
– geo-distributed Microsoft data centers
– automated, model-driven services provisioning and management
• You manage code, data, content, policies, service models, etc.– not servers (unless you want to)
• We manage the platform– application containers and services, distributed storage systems
– service lifecycle, data replication and synchronization
– server operating system, patching, monitoring, management
– physical infrastructure, virtualization networking
– security
– “fabric controller” (automated, distributed service management system)
> Azure Overview
How this may be interesting to you
• Not managing and interacting with server OS– less work for you
– don’t have to care it is “Windows Server” (you can if you want to)
– but have to live with some limits and constraints
• Some level of control– process isolation (runs inside your own VM/guest OS)
– service and data geo-location
– allocated capacity, scale on-demand
– full spectrum of application architectures and programming models
• You can run Java!– plus PHP, Python, Ruby, MySQL, memcached, etc.
– and eventually anything that runs on Windows
> Azure Overview
Anatomy of a Windows Azure instance
Guest VMGuest VMGuest VMHost VMMaintenance OS,Hardware-optimized hypervisor
> Azure Overview > Anatomy of a Windows Azure instance
The Fabric Controller communicates with every server within the Fabric. It manages Windows Azure, monitors every application, decides where new applications should run – optimizing hardware utilization.
Storage – distributed storage systems that are highly consistent, reliable, and scalable.
Compute – instance types: Web Role & Worker Role. Windows Azure applications are built with web role instances, worker role instances, or a combination of both.
Each instance runs on its own VM (virtual machine) and local transient storage; replicated as needed
HTTP/HTTPS
Java and Windows Azure
• Provide your JVM– any version or flavor that runs on Windows
• Provide your code– no programming constraints (e.g., whitelisting libraries, execution time limit,
multi-threading, etc.)
– use existing frameworks
– use your preferred tools (Eclipse, emacs, etc.)
• File-based deployment– no OS-level installation (conceptually extracting a tar/zip with run.bat)
• Windows Azure “Worker Role” sandbox– standard user (non-admin privileges; “full trust” environment)
– native code execution (via launching sub-processes)
– service end points (behind VIPs and load balancers)
> Java How-To
Some boot-strapping in C#
• Kick-off process in WorkerRole.run()– get environment info (assigned end point ports, file locations)
– set up local storage (if needed; for configuration, temp files, etc.)
– configure diagnostics (Windows Server logging subsystem for monitoring)
– launch sub-process(es) to run executable (launch the JVM)
• Additional hooks (optional)
– Manage role lifecycle
– Handle dynamic configuration changes
• Free tools– Visual Studio Express
– Windows Azure Tools for Visual Studio
– Windows Azure SDK
> Java How-To > Boot-strapping
Running Tomcat in Windows Azure
Service Instance
Service Instance
Worker Role
RoleEntry Point
Sub-Process
JVM
Tomcat
server.xmlCatalina
Fabric Controller
Load Balancer
TableStorage
BlobStorage
Queue
ServiceBus
Access Control
SQL Database
new Process()
bind port(x)
htt
p:/
/in
stan
ce:x
htt
p:/
/in
stan
ce:y
listen port(x)
http://app:80
getruntimeinfo
index.jsp
> Java How-To > Tomcat
• Boot-strapping code in WorkerRole.run()
• Service end point(s) in ServiceDefinition.csdef
Running Jetty in Windows Azure
> Java How-To > Jetty
string response = ""; try { System.IO.StreamReader sr; string port = RoleEnvironment.CurrentRoleInstance.InstanceEndpoints["HttpIn"].IPEndpoint.Port.ToString(); string roleRoot = Environment.GetEnvironmentVariable("RoleRoot"); string jettyHome = roleRoot + @"\approot\app\jetty7"; string jreHome = roleRoot + @"\approot\app\jre6"; Process proc = new Process(); proc.StartInfo.UseShellExecute = false; proc.StartInfo.RedirectStandardOutput = true; proc.StartInfo.FileName = String.Format("\"{0}\\bin\\java.exe\"", jreHome); proc.StartInfo.Arguments = String.Format("-Djetty.port={0} -Djetty.home=\"{1}\" -jar \"{1}\\start.jar\"", port, jettyHome); proc.EnableRaisingEvents = false; proc.Start(); sr = proc.StandardOutput; response = sr.ReadToEnd(); } catch (Exception ex) { response = ex.Message; Trace.TraceError(response); }
<Endpoints> <InputEndpoint name="HttpIn" port="80" protocol="tcp" /> </Endpoints>
Current constraints
Platform– Dynamic networking
• <your app>.cloudapp.net• no naked domain• CNAME re-direct from custom
domain• sending traffic to loopback
addresses not allowed and cannot open arbitrary ports
– No OS-level access
– Non-persistent local file system• allocate local storage directory• read-only: Windows directory,
machine configuration files, service configuration files
– Available registry resources• read-only: HKEY_CLASSES_ROOT,
HKEY_LOCAL_MACHINE, HKEY_USERS, HKEY_CURRENT_CONFIG
• full access: HKEY_CURRENT_USER
> Java How-To > Limitations
Java– Sandboxed networking
• NIO (java.nio) not supported• engine and host-level clustering• JNDI, JMS, JMX, RMI, etc.• need to configure networking
– Non-persistent local file system• logging, configuration, etc.
– REST-based APIs to services• Table Storage – schema-less
(noSQL)• Blob Storage – large files
(<200GB block blobs; <1TB page blobs)
• Queues• Service Bus• Access Control
Web Applications massive scale infrastructure
burst & overflow capacity
temporary, ad-hoc sites
Service Applications composite applications
mobile/client connected services
Web API’s
Hybrid Applications component services
distributed processing
distributed data
external storage
Media Applications CGI rendering
content transcoding
media streaming
Information Sharing reference data
common data repositories
knowledge discovery & management
Collaborative Processes multi-enterprise integration
B2B & e-commerce
supply chain management
health & life sciences
domain-specific services
> Azure Overview > Ideal Scenarios
What’s this good for?
Facebook (2009)
• +200B pageviews /month
• >3.9T feed actions /day
• +300M active users
• >1B chat mesgs /day
• 100M search queries /day
• >6B minutes spent /day (ranked #2 on Internet)
• +20B photos, +2B/month growth
• 600,000 photos served /sec
• 25TB log data /day processed thru Scribe
• 120M queries /sec on memcache
> Architecting for Scale
Size matters
Twitter (2009)
• 600 requests /sec
• avg 200-300 connections /sec; peak at 800
• MySQL handles 2,400 requests /sec
• 30+ processes for handling odd jobs
• process a request in 200 milliseconds in Rails
• average time spent in the database is 50-100 milliseconds
• +16 GB of memcached
Google (2007)
• +20 petabytes of data processed /day by +100K MapReduce jobs
• 1 petabyte sort took ~6 hours on ~4K servers replicated onto ~48K disks
• +200 GFS clusters, each at 1-5K nodes, handling +5 petabytes of storage
• ~40 GB /sec aggregate read/write throughput across the cluster
• +500 servers for each search query < 500ms
• >1B views / day on Youtube (2009)
Myspace (2007)
• 115B pageviews /month
• 5M concurrent users @ peak
• +3B images, mp3, videos
• +10M new images/day
• 160 Gbit/sec peak bandwidth
Flickr (2007)
• +4B queries /day
• +2B photos served
• ~35M photos in squid cache
• ~2M photos in squid’s RAM
• 38k req/sec to memcached (12M objects)
• 2 PB raw storage
• +400K photos added /day
app server
• Common characteristics– synchronous processes
– sequential units of work
– tight coupling
– stateful
– pessimistic concurrency
– clustering for HA
– vertical scaling
Traditional scale-up architecture
> Architecting for Scale > Vertical Scaling
app serverweb data store
units of work
web data store
app server
app server
Traditional scale-up architecture
> Architecting for Scale > Vertical Scaling
web
data storeweb
• To scale, get bigger servers– expensive
– has scaling limits
– inefficient use of resources
data storeapp server
Traditional scale-up architecture
> Architecting for Scale > Vertical Scaling
app serverweb
web
• When problems occur– bigger failure impact
Traditional scale-up architecture
> Architecting for Scale > Vertical Scaling
app serverweb
data storeweb
• When problems occur– bigger failure impact
– more complex recovery
Use more pieces, not bigger pieces
LEGO 10179 Ultimate Collector's Millennium Falcon• 33 x 22 x 8.3 inches (L/W/H)• 5,195 pieces
LEGO 7778 Midi-scale Millennium Falcon• 9.3 x 6.7 x 3.2 inches (L/W/H) • 356 pieces
> Architecting for Scale > Horizontal scaling
app server
Scale-out architecture
> Architecting for Scale > Horizontal scaling
app serverweb data store
• Common characteristics– small logical units of work
– loosely-coupled processes
– stateless
– event-driven design
– optimistic concurrency
– partitioned data
– redundancy fault-tolerance
– re-try-based recoverabilityweb data store
app server
app server
app server
app server
app server
Scale-out architecture
> Architecting for Scale > Horizontal scaling
app serverweb data store
web
web
web data store
web
web
• To scale, add more servers– not bigger servers
data store
data store
data store
data store
app server
app server
app server
app server
app server
app server
Scale-out architecture
> Architecting for Scale > Horizontal scaling
web data store
web
web
web data store
web
web
• When problems occur– smaller failure impact
– higher perceived availability
data store
data store
data store
data store
app server
app server
app server
Scale-out architecture
> Architecting for Scale > Horizontal scaling
app serverweb data store
web
web app server
web data store
web
web
• When problems occur– smaller failure impact
– higher perceived availability
– simpler recovery
data store
data store
data store
data store
app server
app server
• Scalable performance at extreme scale– asynchronous processes
– parallelization
– smaller footprint
– optimized resource usage
– reduced response time
– improved throughput
app server
app server
Scale-out architecture + distributed computing
> Architecting for Scale > Horizontal scaling
app serverweb data store
web
web app server
web data store
web
web
data store
data store
data store
data store
parallel tasks
async tasks
perceived response time
app server
app server
• When problems occur– smaller units of work
– decoupling shields impact
app server
app server
Scale-out architecture + distributed computing
> Architecting for Scale > Horizontal scaling
app serverweb data store
web
web app server
web data store
web
web
data store
data store
data store
data store
app server
• When problems occur– smaller units of work
– decoupling shields impact
– even simpler recovery
app server
app server
Scale-out architecture + distributed computing
> Architecting for Scale > Horizontal scaling
app serverweb data store
web
web app server
web data store
web
web
data store
data store
data store
data store
Live Journal (from Brad Fitzpatrick, then Founder at Live Journal, 2007)
> Architecting for Scale > Cloud Architecture Patterns
Partitioned Data
DistributedCache
Web Frontend
Distributed Storage
Apps & Services
Flickr (from Cal Henderson, then Director of Engineering at Yahoo, 2007)
> Architecting for Scale > Cloud Architecture Patterns
Partitioned Data DistributedCache
Web Frontend
Distributed Storage
Apps & Services
SlideShare (from John Boutelle, CTO at Slideshare, 2008)
> Architecting for Scale > Cloud Architecture Patterns
Partitioned Data
Distributed Cache
WebFrontend
Distributed Storage
Apps &Services
Twitter (from John Adams, Ops Engineer at Twitter, 2010)
> Architecting for Scale > Cloud Architecture Patterns
PartitionedData
DistributedCache
WebFrontend
DistributedStorage
Apps &Services
Queues
AsyncProcesses
2010 stats (Source: http://www.facebook.com/press/info.php?statistics)
– People• +500M active users• 50% of active users log on in any given
day• people spend +700B minutes /month
– Activity on Facebook• +900M objects that people interact with• +30B pieces of content shared /month
– Global Reach• +70 translations available on the site• ~70% of users outside the US• +300K users helped translate the site
through the translations application
– Platform• +1M developers from +180 countries• +70% of users engage with
applications /month• +550K active applications• +1M websites have integrated with
Facebook Platform • +150M people engage with Facebook on
external websites /month
> Architecting for Scale > Cloud Architecture Patterns
Facebook(from Jeff Rothschild, VP Technology at Facebook, 2009)
PartitionedData
DistributedCache
WebFrontend
DistributedStorage
Apps &Services
ParallelProcesses
AsyncProcesses
Fundamental concepts
> Architecting for Scale
• Vertical scaling still works
Fundamental concepts
> Architecting for Scale
• Horizontal scaling for cloud computing
• Small pieces, loosely coupled
• Distributed computing best practices– asynchronous processes (event-driven design)
– parallelization
– idempotent operations (handle duplicity)
– de-normalized, partitioned data (sharding)
– shared nothing architecture
– optimistic concurrency
– fault-tolerance by redundancy and replication
– etc.
Asynchronous processes & parallelization
> Architecting for Scale > Fundamental Concepts
Defer work as late as possible– return to user as quickly as
possible
– event-driven design (instead of request-driven)
Cloud computing friendly– distributes work to more servers
(divide & conquer)
– smaller resource usage/footprint
– smaller failure surface
– decouples process dependencies
Windows Azure platform services
– Queue Service
– AppFabric Service Bus
– inter-node communicationWorker Role
Web Role
Queues
Service BusWeb Role
Web Role
Web Role
Worker Role
Worker Role
Worker Role
Partitioned data
> Architecting for Scale > Fundamental Concepts
Shared nothing architecture– transaction locality (partition
based on an entity that is the “atomic” target of majority of transactional processing)
– loosened referential integrity (avoid distributed transactions across shard and entity boundaries)
– design for dynamic redistribution and growth of data (elasticity)
Cloud computing friendly– divide & conquer
– size growth with virtually no limits
– smaller failure surface
Windows Azure platform services
– Table Storage Service
– SQL Azure
Web Role
QueuesWeb Role
Web Role
Worker Role
Relational Database
Relational Database
Relational Database
Web Role
read
write
Idempotent operations
> Architecting for Scale > Fundamental Concepts
Repeatable processes– allow duplicates (additive)
– allow re-tries (overwrite)
– reject duplicates (optimistic locking)
– stateless design
Cloud computing friendly– resiliency
Windows Azure platform services
– Queue Service
– AppFabric Service Bus
Worker Role
Service Bus Worker Role
Worker Role
At most two of these properties for any shared-data system
“Towards Robust Distributed Systems”, Dr. Eric A. Brewer, UC Berkeley
C A
P
Consistency + Availability • High data integrity• Single site, cluster database, LDAP, xFS file
system, etc.• 2-phase commit, data replication, etc.
C A
P
Consistency + Partition • Distributed database, distributed locking, etc.• Pessimistic locking, minority partition
unavailable, etc.
C A
P
Availability + Partition • High scalability• Distributed cache, DNS, etc.• Optimistic locking, expiration/leases, etc.
CAP (Consistency, Availability, Partition) Theorem
> Architecting for Scale > Fundamental Concepts
Hybrid architectures
> Architecting for Scale > Fundamental Concepts
Scale-out (horizontal)– BASE: Basically Available, Soft
state, Eventually consistent
– focus on “commit”
– conservative (pessimistic)
– shared nothing
– favor extreme size
– e.g., user requests, data collection & processing, etc.
Scale-up (vertical)– ACID: Atomicity, Consistency,
Isolation, Durability
– availability first; best effort
– aggressive (optimistic)
– transactional
– favor accuracy/consistency
– e.g., BI & analytics, financial processing, etc.
Most distributed systems employ both approaches
© 2010 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries.The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.
Thank you!
David [email protected]/dachou
• Characteristics– On-demand self-service
– Broad network access
– Resource pooling
– Rapid elasticity
– Measured service
• Service models– Software as a service
– Platform as a service
– Infrastructure as a service
• Deployment models– Private cloud
– Community cloud
– Public cloud
– Hybrid cloud
Cloud computing
Source: The NIST Definition of Cloud Computing, Version 15, 2009.10.07, Peter Mell and Tim Grance http://csrc.nist.gov/groups/SNS/cloud-computing/cloud-def-v15.doc
“Cloud computing is a model for
enabling convenient, on-demand
network access to a shared pool of
configurable computing resources
(e.g., networks, servers, storage,
applications, and services) that can be
rapidly provisioned and released
with minimal management effort or
service provider interaction. This cloud
model promotes availability and is
composed of five essential
characteristics, three service models,
and four deployment models.”
> Introduction
YourOwnData
Center
SomeoneElse’sData
Center
Host (software, database, etc.)
Use (services, information, etc.)
Build (applications, data, etc.)
Private Cloud
Public Cloud
Infrastructure (as-a-service)
Software (as-a-service)
Platform (as-a-service)
Dedicated
Hybrid Cloud
CommunityServ
ice D
eliv
ery
Mod
els
Cloud Deployment Models
(On-Premise)
Infrastructure
(as a Service)
Platform
(as a Service)
Storage
Servers
Networking
O/S
Middleware
Virtualization
Data
Applications
Runtime
Storage
Servers
Networking
O/S
Middleware
Virtualization
Data
Applications
Runtime
You m
anag
e
Man
ag
ed b
y v
en
dor
Man
ag
ed b
y v
en
dor
You m
anag
e
You m
anag
e
Storage
Servers
Networking
O/S
Middleware
Virtualization
Applications
Runtime
Data
Software
(as a Service)
Man
ag
ed b
y v
en
dor
Storage
Servers
Networking
O/S
Middleware
Virtualization
Applications
Runtime
Data
Service delivery models
> Introduction
Globally Distributed Data Centers
Quincy, WA Chicago, IL San Antonio, TX Dublin, Ireland Generation 4 DCs
User
Private Cloud
Public Cloud Services
Table StorageService
Blob StorageService
QueueService
ASP.NET(Web Role)
ASP.NET(Web Role)
ASP.NET(Web Role)
ASP.NET(Web Role)
ASP.NET(Web Role)
ASP.NET(Web Role)
ASP.NET(Web Role)
ASP.NET(Web Role)
ASP.NET(Web Role)
Web Svc(Web Role)
ASP.NET(Web Role)
ASP.NET(Web Role)
ASP.NET(Web Role)
ASP.NET(Web Role)
Jobs(Worker
Role)
SilverlightApplication
Web Browser
MobileBrowser
WPFApplication
Service Bus
Access Control Service
WorkflowService
UserData
ApplicationData
Reference Data
Cloud Web Application
Enterprise Data
Enterprise Web Svc
Enterprise Application
DataService
StorageService
IdentityService
ApplicationService
Enterprise Identity
User
Private Cloud
Public Services
Table StorageService
Blob StorageService
QueueService
ASP.NET(Web Role)
ASP.NET(Web Role)
ASP.NET(Web Role)
ASP.NET(Web Role)
ASP.NET(Web Role)
ASP.NET(Web Role)
ASP.NET(Web Role)
ASP.NET(Web Role)
ASP.NET(Web Role)
Web Svc(Web Role)
ASP.NET(Web Role)
ASP.NET(Web Role)
ASP.NET(Web Role)
ASP.NET(Web Role)
Jobs(Worker
Role)
SilverlightApplication
Web Browser
MobileBrowser
WPFApplication
Service Bus
Access Control Service
WorkflowService
UserData
Application Data
Reference Data
Composite Services Application
Enterprise Data
Enterprise Web Svc
Enterprise Application
DataService
StorageService
IdentityService
ApplicationService
Enterprise Identity
User
Private Cloud
Public Services
Table StorageService
Blob StorageService
QueueService
ASP.NET(Web Role)
ASP.NET(Web Role)
ASP.NET(Web Role)
ASP.NET(Web Role)
ASP.NET(Web Role)
ASP.NET(Web Role)
ASP.NET(Web Role)
ASP.NET(Web Role)
ASP.NET(Web Role)
Web Svc(Web Role)
ASP.NET(Web Role)
ASP.NET(Web Role)
ASP.NET(Web Role)
ASP.NET(Web Role)
Jobs(Worker
Role)
SilverlightApplication
Web Browser
MobileBrowser
WPFApplication
Service Bus
Access Control Service
WorkflowService
UserData
Application Data
Reference Data
Cloud Agent Application
Enterprise Data
Enterprise Web Svc
Enterprise Application
DataService
StorageService
IdentityService
ApplicationService
Enterprise Identity
User
Private Cloud
Public Services
Table StorageService
Blob StorageService
QueueService
ASP.NET(Web Role)
ASP.NET(Web Role)
ASP.NET(Web Role)
ASP.NET(Web Role)
ASP.NET(Web Role)
ASP.NET(Web Role)
ASP.NET(Web Role)
ASP.NET(Web Role)
ASP.NET(Web Role)
Web Svc(Web Role)
ASP.NET(Web Role)
ASP.NET(Web Role)
ASP.NET(Web Role)
ASP.NET(Web Role)
Jobs(Worker
Role)
SilverlightApplication
Web Browser
MobileBrowser
WPFApplication
Service Bus
Access Control Service
WorkflowService
UserData
Application Data
Reference Data
B2B Integration Application
Enterprise Data
Enterprise Web Svc
Enterprise Application
DataService
StorageService
IdentityService
ApplicationService
Enterprise Identity
User
Private Cloud
Public Services
Table StorageService
Blob StorageService
QueueService
ASP.NET(Web Role)
ASP.NET(Web Role)
ASP.NET(Web Role)
ASP.NET(Web Role)
ASP.NET(Web Role)
ASP.NET(Web Role)
ASP.NET(Web Role)
ASP.NET(Web Role)
ASP.NET(Web Role)
Web Svc(Web Role)
ASP.NET(Web Role)
ASP.NET(Web Role)
ASP.NET(Web Role)
ASP.NET(Web Role)
Jobs(Worker
Role)
SilverlightApplication
Web Browser
MobileBrowser
WPFApplication
Service Bus
Access Control Service
WorkflowService
UserData
Application Data
Reference Data
Grid / Parallel Computing Application
Enterprise Data
Enterprise Web Svc
Enterprise Application
DataService
StorageService
IdentityService
ApplicationService
Enterprise Identity
User
Private Cloud
Public Services
Table StorageService
Blob StorageService
QueueService
ASP.NET(Web Role)
ASP.NET(Web Role)
ASP.NET(Web Role)
ASP.NET(Web Role)
ASP.NET(Web Role)
ASP.NET(Web Role)
ASP.NET(Web Role)
ASP.NET(Web Role)
ASP.NET(Web Role)
Web Svc(Web Role)
ASP.NET(Web Role)
ASP.NET(Web Role)
ASP.NET(Web Role)
ASP.NET(Web Role)
Jobs(Worker
Role)
SilverlightApplication
Web Browser
MobileBrowser
WPFApplication
Service Bus
Access Control Service
WorkflowService
UserData
Application Data
Reference Data
Hybrid Enterprise Application
Enterprise Data
Enterprise Web Svc
Enterprise Application
DataService
StorageService
IdentityService
ApplicationService
Enterprise Identity