scaling hibernate

Scaling HibernateEmmanuel Bernard - Max Ross

Emmanuel Bernard

• Hibernate Search in Action

• blog.emmanuelbernard.com

• twitter.com/emmanuelbernard

Max Ross

• Google App Engine

• Hibernate Shards

What is scalability?

What is scalability?

• Users

• Resource

• Data

• Uptime

How does Hibernate stand?

• Limitations?

• SQL optimizations

• 2nd level cache

• Conversation

Node

2n

d level

cach

e

DB

Session

Session

Session

Node

2n

d level

cach

e

Session

Session

Session

Node

2n

d le

vel

cach

e

Session

Session

Session

Changes in mass

• Bulk insert / update / delete

• Stateless session

To Googolzillionsand beyond

Googolzillion things?Who are you?

• Social network

• SaaS

Problem

• Same data model

• Too much load

• Too much data

• Too many lawyers

Separating customer data

Logical separation

• All customers share tables

• Manual or Hibernate Filter

Application

SessionFactory

DB

Schema

One user per schema

• One SessionFactory per schema

• Rewrite SQL

Application

SessionFactory

DB

Schema

SessionFactory

Schema

Application

SessionFactory

DB

Schema

Schema

Use database security

• Map JAAS credentials to DB credentials

• One connection (pool) per user

Oracle security

• Oracle VPD

• Application defines active user

Storing in multiple databases

SessionFactory == DB

• Same schema across DBs

• Expensive in RAM

• Data isolated

Sharing state across SessionFactorys is probably doable

How many customer per DB?

• One

• One per schema

• Several per schema

• Dispatch customer to the right SessionFactory

Adjusting the application layer

Homogeneous nodes

• Memory

• Too many connections

• Slow to start

Application

SessionFactory

DB

Connpool

SessionFactory

DB

Connpool

Application

SessionFactory

Connpool

SessionFactory

Connpool

Application

SessionFactory

Connpool

SessionFactory

Connpool

Specialized nodes

• Load balancing rules

• Easy scalability

• Efficient resource-wise

Application

SessionFactory

Connpool

SessionFactory

DB

Connpool

Application

SessionFactory

DB

Connpool

SessionFactory

DB

Connpool

SessionFactory

DB

Connpool

Application

SessionFactory

DB

Connpool

Dispatch per user

Application

SessionFactory

Connpool

What if you need to query all your data?

Hibernate Shards

Simplified Horizontal Partitioning

• Separates app logic from federation logic

• Standard Hibernate API

• Unified view of your data

Shard Strategy

• Federation logic is application specific

• Selection

• Resolution

• Access

?

Model Object

Shard 1 Shard 2 Shard 3

??

Shard Selection

• On which shard do we create the record?

• Round robin

• Capacity based

• Attribute based

• Performance based

Shard Resolution

• On which shard do we find the record?

• Exhaustive search

• Map ID ranges to shards

• Distributed cache

Shard Access

• How do we apply operations across shards?

• Serially

• In parallel (bring your own thread pool)

• Hybrid

Writing the app is the easy part

• Operational challenges/risks are amplified

• Virtual shards can help

Virtual Shards

Application

Sharded Session Factory

Virtual

Shard 1

Virtual

Shard 2

Virtual

Shard 3

Physical

Shard 1

Virtual Shards

Application

Sharded Session Factory

Virtual

Shard 1

Virtual

Shard 2

Virtual

Shard 3

Physical

Shard 2

Physical

Shard 1

Coming Soon

• Static Data

• Full-fledged ShardedQuery

• JPA

Hibernate Search

Full-text search your domain objects

• Hibernate + Lucene

• Same programmatic model

• Index synchronized

Human queries

• Data set

• Word centric

• Typos / Synonyms

• Relevance

SQL underperforms

• Wildcard

• Table/Index full scan

• Multiple joins

• Relevance?

Customer

DBA

Full-text search

• Move load away from the DB

• Replace or complement searches

ScalabilitySymmetric cluster

• Distributed lock

• Immediate visibility

• Affects front end

Database

Lucene Directory(Index)

Hibernate +

Hibernate Search

Search requestIndex update

Hibernate +

Hibernate Search

Search requestIndex update

ScalabilityAsymmetric cluster

• Search local / change sent to master

• Asynchronous indexing (delay)

• No front end extra cost / good scalability

DatabaseHibernate +

Hibernate Search

JMS queue

Lucene Directory(Index)Master

Hibernate +

Hibernate SearchProcessIndex update

Index update order

Lucene Directory(Index)Copy

Search request

Copy

Slave

Master

Scalabilities (sic)

• Hibernate a good citizen

• Isolating customer data

• Deal with multiple databases

• Hibernate Shards

• Hibernate Search

Q&A

• For more infos

• Hibernate Search in Action

• Java Persistence with Hibernate

• Max’s podcasts• http://google-code-updates.blogspot.com/2007/08/google-developer-

podcast-episode-six.html

• http://www.javaworld.com/podcasts/jtech/2008/072408jtech.html

• hibernate.org

http://google-code-updates.blogspot.com/2007/08/google-developer-podcast-episode-six.html




http://www.javaworld.com/podcasts/jtech/2008/072408jtech.html

http://www.javaworld.com/podcasts/jtech/2008/072408jtech.html

scaling hibernate

Documents