breaking the clustering limits @ alphacsp javaedge 2007
TRANSCRIPT
Copyright AlphaCSP Israel 2007 – The JavaEdge Seminar 2
Breaking the Breaking the Clustering LimitsClustering Limits
Baruch SadogurskyConsultant, AlphaCSP
Copyright AlphaCSP Israel 2007 – The JavaEdge Seminar 3
Cluster at NASACluster at NASA
Copyright AlphaCSP Israel 2007 – The JavaEdge Seminar 4
AgendaAgenda
• Clustering Definition– Why Clustering?
• Evolution of Clustering in Java• Grids• Implementations• Other Solutions
Copyright AlphaCSP Israel 2007 – The JavaEdge Seminar 5
Clustering DefinitionClustering Definition
• Group of tightly coupled computers • Work together closely• Viewed as single computer• Commonly connected through fast
LANs
Copyright AlphaCSP Israel 2007 – The JavaEdge Seminar 6
MotivationMotivation
• Deployed to improve– Scalability & load-balancing
• Throughput (e.g. hits per second)
– Fail-over• Availability (e.g. 99.999%)
– Resource virtualization
• Much more cost-effective than single computers
Copyright AlphaCSP Israel 2007 – The JavaEdge Seminar 7
Why Clustering?Why Clustering?
•Why not single machine?
–Moore’s Law is dead
•Why not adding CPUs?–Threads, locks and context switches are expensive
•Why not via DB?–DB access sloooow–Single point of failure
•Cluster DB?
Copyright AlphaCSP Israel 2007 – The JavaEdge Seminar 8
Accessing DataAccessing Data
•According to the Long Tail theory, 20% of objects used 80% of the time•We need distributed access to those 20%
Copyright AlphaCSP Israel 2007 – The JavaEdge Seminar 9
Evolution of Clustering in JavaEvolution of Clustering in Java
• In the beginning there where dinosaurs application servers and J2EE programming model
• Clustering aspect never made it to the Java EE spec
• Proprietary solutions
Copyright AlphaCSP Israel 2007 – The JavaEdge Seminar 10
Classical ClusteringClassical Clustering
• Replicate the state between the nodes– Provides stateful beans scalability – Provides entity beans caching– Provides HTTP session replication
• Balance the load– Smart Java client– HTTP load-balancer
• Central node manages the cluster topology– Slow detection of topology changes– New coordinator elected by voting (slow)
Copyright AlphaCSP Israel 2007 – The JavaEdge Seminar 11
Coordinating the ClusterCoordinating the Cluster
• According to the Eight Fallacies of Distributed Computing:– The network is reliable– Topology doesn't change
• According to real life– Communication fails– Nodes leave and join
• Coordinator election in case of failure is expensive
Copyright AlphaCSP Israel 2007 – The JavaEdge Seminar 12
Scary, scary clusteringScary, scary clustering
• “Avoid broken mirrors, Friday the 13th, multithreading and clustered stateful applications”
• Poor implementations gave clustering a bad name
Copyright AlphaCSP Israel 2007 – The JavaEdge Seminar 13
Clustered Caches DrawbacksClustered Caches Drawbacks
• Copying all the data across cluster can’t provide linear scalability– More nodes you have, more copying
occurs
• Topology communication slows the cluster down
• Cache needs eviction policy to deal with stale data
Copyright AlphaCSP Israel 2007 – The JavaEdge Seminar 14
Clustered Caches DrawbacksClustered Caches Drawbacks
• Operates only on simple and serializable types
• Mutated objects have to be returned to the cache
• Coarse-grained (whole object is replicated)
• Can’t handle object graphs– Serialization issue
Copyright AlphaCSP Israel 2007 – The JavaEdge Seminar 15
Evolution of Clustering in JavaEvolution of Clustering in Java
• Spring, JBoss micro-container, Pico container and others brought the POJO to enterprise world
• The rise of the POJO standardized the clustering services
• Clustering market is on fire
Copyright AlphaCSP Israel 2007 – The JavaEdge Seminar 16
From Cache to Grid ComputingFrom Cache to Grid Computing
• “Caches” are out, “Grids” are in…• So what is “Grid Computing”?• There is no technology called "Grid
Computing“
Copyright AlphaCSP Israel 2007 – The JavaEdge Seminar 17
From Cache to Grid ComputingFrom Cache to Grid Computing
• Definition of set of distributed computing use cases that have certain technical aspects in common– Data Grids– Computational Grids– On-Demand Grids
• First two are relevant for Java Enterprise applications clustering– On-Demand Grid is about leasing computing
time
Copyright AlphaCSP Israel 2007 – The JavaEdge Seminar 18
Grid TypesGrid Types
Copyright AlphaCSP Israel 2007 – The JavaEdge Seminar 19
Data GridsData Grids
Holds bazillion terabytes
...
Each one holds only zillion terabytes
Copyright AlphaCSP Israel 2007 – The JavaEdge Seminar 20
Data GridsData Grids
• Split lots of data to subsets of data• Each node gets only subset of data it
currently needs• Combine results from the different
nodes• Also natural fail-over
– State replication
Copyright AlphaCSP Israel 2007 – The JavaEdge Seminar 21
Computational GridsComputational Grids
Counts i++ bazillion times
...
Each one counts i++ only zillion times
Copyright AlphaCSP Israel 2007 – The JavaEdge Seminar 22
Computational GridsComputational Grids
• Split long task into multiple sub-tasks• Execute each sub-task in parallel on
a separate computer• Combine results from the sub-tasks
Copyright AlphaCSP Israel 2007 – The JavaEdge Seminar 23
Functional Languages and GridsFunctional Languages and Grids
• Functional languages considered the best tool for grid programming
• Full statelessness• Isolated functions
– Get all the needed data via parameters
• Scala compiles to JVM bytecode– www.scala-lang.org
Copyright AlphaCSP Israel 2007 – The JavaEdge Seminar 24
Master/WorkerMaster/Worker
Copyright AlphaCSP Israel 2007 – The JavaEdge Seminar 25
Map/ReduceMap/Reduce
Read part of the data
Data source
Process and map the data
Read the mapped dataReduce the
data
ResultsWrite results
Copyright AlphaCSP Israel 2007 – The JavaEdge Seminar 26
Map/Reduce ExampleMap/Reduce Example
• Input for mapping:– <data, “two witches watch two watches; which witch watch which watch?”>
• Map output (and reduce input):– <two, 1>– <witch, 1>– <watch, 1>– <two, 2>– <watch, 2>– <which, 1>– <witch, 2>– <watch, 3>– <which, 2>– <watch, 4>
• Reduce output:– <two, 2>– <witch, 2>– <watch, 4>– <which, 2>
Copyright AlphaCSP Israel 2007 – The JavaEdge Seminar 27
Map/Reduce ExampleMap/Reduce Example
• Both map() and reduce() can be easily distributed, since they are stateless
• Google uses their implementation for analyzing the Internet– labs.google.com/papers/mapreduce.html
Copyright AlphaCSP Israel 2007 – The JavaEdge Seminar 28
Java ComputeGrid VisionJava ComputeGrid Vision
• Sun spec for Service Oriented Architectures– www.jini.org
• Released in 1998(!) and was totally ahead its time– Didn’t make to J2EE spec and was pretty
abandoned• Basis for JavaSpaces• The concept is sending code over the wire
– Pure Java– Code executed locally
• No network exceptions during the execution
Copyright AlphaCSP Israel 2007 – The JavaEdge Seminar 29
Java ComputeGrid VisionJava ComputeGrid Vision
•JavaSpaces - “Space” based technology–javaspaces.org/
•“Space” definition:–A place on the network to share and store objects
•Both data and tasks–Associative shared memory for the network–Unifies storage and communications
Copyright AlphaCSP Israel 2007 – The JavaEdge Seminar 30
ImplementationsImplementations
Copyright AlphaCSP Israel 2007 – The JavaEdge Seminar 31
EHCacheEHCache
Copyright AlphaCSP Israel 2007 – The JavaEdge Seminar 32
EHCacheEHCache
•OpenSource–ehcache.sourceforge.net
•Fast–In-process caching–Asynchronous replication
•Small–110KB
•Simple•RMI communication•Map based API
–Inc. JCache (JSR 107) implementation
•Never released
Copyright AlphaCSP Israel 2007 – The JavaEdge Seminar 33
EHCache ExampleEHCache Example
1 <cacheManagerPeerProviderFactory 2 class="net.sf.ehcache.distribution.RMICacheManagerPeerProviderFactory" 3 properties="peerDiscovery=automatic, 4 multicastGroupAddress=230.0.0.1, 5 multicastGroupPort=4446, timeToLive=1" 6 propertySeparator="," 7 />
1 <cache name="sampleDistributedCache1" 2 maxElementsInMemory="10" 3 eternal="false" 4 timeToIdleSeconds="100" 5 timeToLiveSeconds="100" 6 overflowToDisk="false"> 7 <cacheEventListenerFactory 8 class="net.sf.ehcache.distribution.RMICacheReplicatorFactory"/> 9 <bootstrapCacheLoaderFactory 10 class="net.sf.ehcache.distribution.RMIBootstrapCacheLoaderFactory"/> 11 </cache>
1 CacheManager manager = new CacheManager(); 2 Cache cache1 = manager.getCache("sampleDistributedCache1"); 3 Element elementToPut = new Element("firstName", "Biff"); 4 cache1.put(elementToPut);
Copyright AlphaCSP Israel 2007 – The JavaEdge Seminar 34
GlassFish ShoalGlassFish Shoal
Copyright AlphaCSP Israel 2007 – The JavaEdge Seminar 35
GlassFish ShoalGlassFish Shoal
• Backbone for GlassFish AS clustering• Open Source at dev.java.net• Can be used standalone• Group Management Service (GMS) centric• GMS Themes
– Group Sensory-Action Theme• Lifecycle notifications
– Group Communication Theme• Group communications provider SPI
– JXTA - default– Can plugin JGroups insteadGroup communications API
• Send and receive plain messages– Shared or Distributed Storage Theme
• Map implementation• Concurrent
Copyright AlphaCSP Israel 2007 – The JavaEdge Seminar 36
JXTA Usage in ShoalJXTA Usage in Shoal
JXTA Implementation
GMS SPI
GMS Client API
Application
Startup and ShutdownProduce Action and Deliver Signal
Join and leaveNotify listeners
Copyright AlphaCSP Israel 2007 – The JavaEdge Seminar 37
Oracle Tangosol Oracle Tangosol CoherenceCoherence
Copyright AlphaCSP Israel 2007 – The JavaEdge Seminar 38
Oracle Tangosol CoherenceOracle Tangosol Coherence
• DataGrid• Fast!• Planned as clustering backbone for
Oracle AS• Can be used standalone• Commercial• Oracle product now• Single JAR
Copyright AlphaCSP Israel 2007 – The JavaEdge Seminar 39
Coherence Data GridCoherence Data Grid
Copyright AlphaCSP Israel 2007 – The JavaEdge Seminar 40
Oracle Tangosol CoherenceOracle Tangosol Coherence
• “Organic cluster” – all the nodes are equal• Partitioned Topology
– Every node holds subset of data• Replicated for fail-over
• Replicated Topology– Behaves like cache
• Every node holds all the data
• Fast elimination (no voting)
Copyright AlphaCSP Israel 2007 – The JavaEdge Seminar 41
Oracle Tangosol CoherenceOracle Tangosol Coherence
• Supports queries and indices• Map interface implementation• Lifecycle listeners• Drawbacks
– Usual cache drawbacks– Closed source– Costly
Copyright AlphaCSP Israel 2007 – The JavaEdge Seminar 42
JBoss POJO CacheJBoss POJO Cache
Copyright AlphaCSP Israel 2007 – The JavaEdge Seminar 43
JBoss POJO CacheJBoss POJO Cache
• Subproject of JBossCache– Clustering backbone of JBoss AS
• OpenSource at JBoss labs– http://labs.jboss.com/jbosscache
• Transactional• Bytecode instrumented POJOs• Don’t have to be serializable
Copyright AlphaCSP Israel 2007 – The JavaEdge Seminar 44
JBoss POJO CacheJBoss POJO Cache
• Fine-grained replication• Graphs are allowed• Changes detection• POJOs need to be annotated and
attached to the cache• Tree implementation• JGroups communication
Copyright AlphaCSP Israel 2007 – The JavaEdge Seminar 45
JBoss POJO Cache UsageJBoss POJO Cache Usage
1 cache1.attach("students/54321", mary); 2 cache1.attach("students/65432", joe); 3 Student mary2 = (Student) cache2.find("students/54321"); 4 Student joe2 = (Student) cache2.find("students/65432");
1 /** 2 * No need for annotation here since Person 3 * has been annotated already. 4 */ 5 public class Student extends Person 6 { 7 protected String school; 8 protected Set<Course> courses = new LinkedHashSet<Course>();
1 @org.jboss.cache.pojo.annotation.Replicable 2 public class Person { 3 protected String name; 4 protected Address address;
1 XmlConfigurationParser parser = new XmlConfigurationParser(); 2 Configuration conf = parser.parseFile("META-INF/replSync-service.xml"); 3 conf.setClusterName("TestCluster"); 4 PojoCache cache = PojoCacheFactory.createCache(conf, true);
Copyright AlphaCSP Israel 2007 – The JavaEdge Seminar 46
JGroups ConfigurationJGroups Configuration
1 <attribute name="ClusterConfig"> 2 <config> 3 <UDP mcast_addr="228.10.10.10" 4 mcast_port="45588" 5 tos="8" 6 ucast_recv_buf_size="20000000" 7 ucast_send_buf_size="640000" 8 mcast_recv_buf_size="25000000" 9 mcast_send_buf_size="640000" 10 loopback="false" 11 discard_incompatible_packets="true" 12 max_bundle_size="64000" 13 max_bundle_timeout="30" 14 use_incoming_packet_handler="true" 15 ip_ttl="2" 16 enable_bundling="false" 17 enable_diagnostics="true" 18 19 use_concurrent_stack="true" 20 21 thread_naming_pattern="pl" 22 23 thread_pool.enabled="true" 24 thread_pool.min_threads="1" 25 thread_pool.max_threads="25" 26 thread_pool.keep_alive_time="30000" 27 thread_pool.queue_enabled="true" 28 thread_pool.queue_max_size="10" 29 thread_pool.rejection_policy="Run" 30 31 oob_thread_pool.enabled="true" 32 oob_thread_pool.min_threads="1" 33 oob_thread_pool.max_threads="4" 34 oob_thread_pool.keep_alive_time="10000" 35 oob_thread_pool.queue_enabled="true" 36 oob_thread_pool.queue_max_size="10" 37 oob_thread_pool.rejection_policy="Run"/>
Copyright AlphaCSP Israel 2007 – The JavaEdge Seminar 47
GigaSpacesGigaSpaces
Copyright AlphaCSP Israel 2007 – The JavaEdge Seminar 48
GigaSpacesGigaSpaces
• JavaSpaces implementation• gigaspaces.com• OpenSpaces
– JavaSpaces implementation– Spring configuration– OpenSource
• Enterprise DataGrid– Map interface– Queries– Lifecycle listeners– Etc.– Commercial
Copyright AlphaCSP Israel 2007 – The JavaEdge Seminar 49
GigaSpaces XAPGigaSpaces XAP
• XAP – eXtreme Application Platform– Kind of application server– Processing Units have strongly defined
directory structure (like container)– Total solution
• Relies on “OpenSpaces”• Commercial
– Start-ups special free license
Copyright AlphaCSP Israel 2007 – The JavaEdge Seminar 50
GigaSpaces XAPGigaSpaces XAP
Copyright AlphaCSP Israel 2007 – The JavaEdge Seminar 51
OpenTerracottaOpenTerracotta
Copyright AlphaCSP Israel 2007 – The JavaEdge Seminar 52
OpenTerracottaOpenTerracotta
•JVM is taking care of cross-platform, garbage collection, threading, etc.•Terracotta takes clustering concern out to the JVM
Copyright AlphaCSP Israel 2007 – The JavaEdge Seminar 53
OpenTerracottaOpenTerracotta
• Clustered JVM semantics• OpenSource
– terracotta.org
• Network Attached Memory– Looks like RAM to the application– Runs both in JVM level (JVM plugin) and
as separate process• Two level cache
Copyright AlphaCSP Israel 2007 – The JavaEdge Seminar 54
JVM Level SimulationJVM Level Simulation
• JVM abstracts multi-platform concerns• It should also abstract multi-nodes
concerns– Terracotta adds it to the JVM
• Simulation of single JVM semantics:– Garbage collection– References– Threads synchronization– Object identity
Copyright AlphaCSP Israel 2007 – The JavaEdge Seminar 55
OpenTerracottaOpenTerracotta
• Bytecode instrumentation is used to mimic JVM behavior
• Currently supports only Sun’s JVM– Support for IBM and JRockIt planned soon
• Features– Low development impact - no in-advance
clustering planning needed– Linear scalability– No APIs– Declarative - marking what is clustered– No serialization
Copyright AlphaCSP Israel 2007 – The JavaEdge Seminar 56
OpenTerracotta ArchitectureOpenTerracotta Architecture
• The Client Nodes - run on a standard JVM – Terracotta is installed to the JVM
• The Terracotta Server Cluster - provides the clustering intelligence – Each server is a Java process– One Active Server – One or many Passive Servers
• Shared Storage - share the state for the passive server(s)
• Server/Client architecture considered by some as the drawback of Terracotta
Copyright AlphaCSP Israel 2007 – The JavaEdge Seminar 57
Terracotta Client/ServerTerracotta Client/Server
Copyright AlphaCSP Israel 2007 – The JavaEdge Seminar 58
Terracotta DemoTerracotta Demo
Copyright AlphaCSP Israel 2007 – The JavaEdge Seminar 59
Other SolutionsOther Solutions
• GridGain – map/reduce computation grid– gridgain.com
• Hadoop – map/reduce Java implementation– lucene.apache.org/hadoop
• Globus Toolkit - Open Grid Services Architecture RI– globus.org
Copyright AlphaCSP Israel 2007 – The JavaEdge Seminar 60
ConclusionConclusion
• Cache, Data grid, Compute grid or Clustered VM?
• Open source or commercial?• API driven or API less?• Container or JAR?
Copyright AlphaCSP Israel 2007 – The JavaEdge Seminar 61
Q&AQ&A