Foundation APIs and Repository Internals
Alfresco Repository Internals
Derek Hulley, Senior Engineer, Alfresco
twitter: @derekhulley
Agenda (1)
1. Resource Contention
   1. Node Creation and Modification
   2. Actions
   3. Scheduled Jobs
2. Transactions
   1. Resources
   2. Implicit and Explicit
   3. Using Alfresco’s Transaction Support
Agenda (2)
3. Navigating Hierarchies
   1. Lucene-based APIs
   2. NodeService-based APIs
   3. Walking Child Associations
4. Policies and Behaviour
   1. Policy Behaviour Filters
   2. CopyService
Agenda (3)
5. Content Lifecycles
   1. ContentData Properties
   2. Binary Files and Transactions
   3. Orphaned Content
   4. System Properties
6. Application Bootstrap
   1. Spring init
   2. Lifecycle Classes
   3. Modules
7. Questions
Resource Contention: Node Creation
• Row inserts
• Read-committed isolation: Invisible until commit
• Database, Caches, Lucene Indexes: Low contention
Resource Contention: Node Modification
• Update type, aspects, properties, ACLs; Move; Delete; etc.
• Invisible until commit, but can hold resource locks
• Transactions rejected, e.g. ConcurrencyFailureException
Resource Contention: Actions and Scheduled Jobs
• Danger of background jobs moving ‘up’ a hierarchy
[Diagram: node hierarchy with levels L1N1; L2N1, L2N2; L3N1, L3N1. Only one winner (at a time)]
• Individual node modifications are serialized
• Pick up small chunks, commit and give way
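The "small chunks, commit and give way" advice can be sketched in plain Java. This is only an illustration under stated assumptions: the class `ChunkedJob`, its methods, and the batch size are hypothetical names, and the `Consumer` callback stands in for whatever short transaction a real background job would run per batch. No Alfresco API is used here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class ChunkedJob {

    /** Splits the items into consecutive batches of at most batchSize entries. */
    public static <T> List<List<T>> chunk(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < items.size(); i += batchSize) {
            int end = Math.min(i + batchSize, items.size());
            batches.add(new ArrayList<>(items.subList(i, end)));
        }
        return batches;
    }

    /**
     * Hands each batch to txnWork (assumed to wrap one short transaction)
     * and yields between batches so that other threads can make progress.
     * Returns the number of batches processed, i.e. the simulated commits.
     */
    public static <T> int processInChunks(List<T> items, int batchSize, Consumer<List<T>> txnWork) {
        int commits = 0;
        for (List<T> batch : chunk(items, batchSize)) {
            txnWork.accept(batch); // one small unit of work, then commit
            commits++;
            Thread.yield();        // give way before picking up the next chunk
        }
        return commits;
    }

    public static void main(String[] args) {
        List<Integer> nodeIds = new ArrayList<>();
        for (int i = 1; i <= 10; i++) {
            nodeIds.add(i);
        }
        int commits = processInChunks(nodeIds, 4, batch -> System.out.println("committing " + batch));
        System.out.println("commits: " + commits);
    }
}
```

With ten ids and a batch size of four, the job commits three times, so any lock it takes on a shared parent is held only for the duration of one small batch.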
Transactions: Resources
• Thread Pools
  • Each transaction requires a thread
  • Possibly multiple transactions on a thread
• Database Rows
  • Database row locking
  • ‘version’ column: optimistic locking
• Lucene Indexes
  • Index deltas
  • Heavy on IO
• Connection Pool
  • One transaction – one connection
  • Connection housekeeping has a cost
• Content Binaries
  • New content binaries only
  • Temporary files
• Caches
  • Caches are transaction-aware
  • Conflicts drop cache entries
It pays to think about resource contention.
Replaying transactions means:
• Reclaiming resources
• CPU cycles
• Slower response times
Transactions: Implicit
• Defined against public service (Foundation) APIs
• Bean naming convention: NodeService vs nodeService
• Cost is in starting a transaction, not continuing one
• public-services-context.xml and ServiceRegistry
• Spring customization and interceptors
Transactions: Explicit
• Wrap all atomic operations, including groups of reads
• Use RetryingTransactionHelper rather than UserTransaction
Transactions: Explicit: Demo: Read-only Batching
Operations: Get Stores, Get Children, Lucene Query
• Time lost to transaction initiation
• 3ms lost per low-level operation ... How much per user click?
Transactions: Explicit: Demo: Write Batching
• Time lost to initiate transactions. Well, yes, but ...
• Unnecessary additional indexing
[Diagram: CIFS and FTP: Create Node, Add Content, Get Writer; index, index]
Transactions: Direct Contention: Demo
log4j.logger.org.alfresco.repo.transaction.RetryingTransactionHelper=warn
E.g.: Retrying OptimisticLockingDemo-8: count 0; wait: 0.1s; msg: "Failed to update node 14575"; exception: ...ConcurrencyFailureException
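The retry shown in that log line can be illustrated with a minimal, self-contained sketch of optimistic locking on a version counter. Everything here is an assumption made for illustration: `NodeRow`, `VersionConflictException` and `withRetries` are hypothetical stand-ins, not the RetryingTransactionHelper implementation, and the competing writer is simulated inside the first attempt so that the outcome is deterministic.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

public class OptimisticRetryDemo {

    /** Hypothetical stand-in for a concurrency failure raised on a stale update. */
    public static class VersionConflictException extends RuntimeException {
        public VersionConflictException(String msg) { super(msg); }
    }

    /** Immutable view of a row as seen when it was read. */
    public static final class Snapshot {
        public final long version;
        public final int value;
        Snapshot(long version, int value) { this.version = version; this.value = value; }
    }

    /** Hypothetical row guarded by a 'version' column. */
    public static class NodeRow {
        private long version = 0;
        private int value = 0;

        public synchronized Snapshot read() { return new Snapshot(version, value); }

        /** Applies the update only if nobody changed the row since it was read. */
        public synchronized void update(long expectedVersion, int newValue) {
            if (version != expectedVersion) {
                throw new VersionConflictException("stale version " + expectedVersion);
            }
            value = newValue;
            version++;
        }
    }

    /** Re-runs the whole unit of work on conflict, with a short growing wait. */
    public static <T> T withRetries(int maxRetries, Supplier<T> work) {
        for (int attempt = 0; ; attempt++) {
            try {
                return work.get();
            } catch (VersionConflictException e) {
                if (attempt >= maxRetries) {
                    throw e;
                }
                try {
                    Thread.sleep(10L * (attempt + 1));
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    throw e;
                }
            }
        }
    }

    /** Increments the row; the first attempt is deliberately beaten by a competing write. */
    public static int incrementWithInterference(NodeRow row, AtomicInteger attempts) {
        return withRetries(5, () -> {
            int attempt = attempts.incrementAndGet();
            Snapshot seen = row.read();
            if (attempt == 1) {
                row.update(seen.version, seen.value + 100); // simulated competing commit
            }
            row.update(seen.version, seen.value + 1);       // stale on the first attempt
            return row.read().value;
        });
    }

    public static void main(String[] args) {
        AtomicInteger attempts = new AtomicInteger();
        int result = incrementWithInterference(new NodeRow(), attempts);
        System.out.println("value=" + result + " attempts=" + attempts.get());
    }
}
```

The point of the sketch is that the whole unit of work, including the read, is replayed after a conflict; retrying only the failed write would reuse the stale version and fail again.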
Transactions: Alfresco Transaction Support
• RetryingTransactionHelper
  • Reliable, resolves contention
  • “non propagating”: Transaction suspended and a new one started.
  • TransactionService.getRetryingTransactionHelper -> your instance
• TransactionListener and TransactionListenerAdapter
  • Bind to events associated with a transaction
• AlfrescoTransactionSupport
  • Helper around Spring’s TransactionSynchronizationManager
  • getTransactionReadState: Allows logic conditional on the state of the transaction.
  • bindResource and getResource: Bind objects to current transaction. This is like ThreadLocal but is safer, i.e. resources are bound to the transaction and go away when the transaction is terminated.
• TransactionalResourceHelper
  • <K,V> getMap(Object resourceKey), etc: Helper to get transactionally-bound collections.
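Why transaction-bound resources are safer than a bare ThreadLocal can be shown with a small stand-alone sketch. The class `TxnResources` and its `begin`/`end` methods are hypothetical; only the method names `bindResource` and `getResource` are borrowed from the slide, and the code below is not the AlfrescoTransactionSupport implementation. It assumes one resource map per transaction, kept on a per-thread stack.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

public class TxnResources {

    // One stack of resource maps per thread; the top map belongs to the current transaction.
    private static final ThreadLocal<Deque<Map<Object, Object>>> STACK =
            ThreadLocal.withInitial(ArrayDeque::new);

    /** Starts a (possibly nested) transaction with an empty resource map. */
    public static void begin() {
        STACK.get().push(new HashMap<>());
    }

    /** Ends the current transaction; everything bound to it is discarded. */
    public static void end() {
        current();          // fails fast if there is no transaction
        STACK.get().pop();
    }

    public static void bindResource(Object key, Object value) {
        current().put(key, value);
    }

    @SuppressWarnings("unchecked")
    public static <R> R getResource(Object key) {
        return (R) current().get(key);
    }

    private static Map<Object, Object> current() {
        Map<Object, Object> top = STACK.get().peek();
        if (top == null) {
            throw new IllegalStateException("no transaction in progress");
        }
        return top;
    }

    public static void main(String[] args) {
        begin();
        bindResource("touchedNodes", 3);

        begin(); // a new, non-propagating transaction on the same thread
        Object seenInside = getResource("touchedNodes");
        System.out.println("inner sees: " + seenInside);
        end();

        Integer seenOutside = getResource("touchedNodes");
        System.out.println("outer sees: " + seenOutside);
        end();
    }
}
```

A value stored in a plain ThreadLocal would leak into the inner transaction and survive after the outer one finished; here the inner transaction sees nothing and the value disappears when its owning transaction ends.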
Navigating Hierarchies
Lucene-Driven APIs
• SearchService.query()
  • Versatile
  • Not always transactionally consistent
  • Cluster: Transactions are replayed for indexes
SQL-Driven APIs
• SearchService.selectNodes()
  • Fast for simple path-based searches only, e.g. “/app:company_home/app:data_dictionary”
  • Always consistent!
• NodeService.*
  • Use for consistent views
• FileFolderService.*
  • Limited to cm:folder-based lookups
  • Fast lookups on cm:name (via NodeService)
Navigating Hierarchies: Walking Child Associations
• Hierarchy traversal is fast if the child associations can be isolated.
• Put the correct data into createNode
[Diagram: nodes P1 and N2]
• Child Association indexes:
  • Parent and child
  • typeQName
  • qname
  • Also unique cm:name
• No uniqueness on path QName, but can be used to trim results to a meaningful few.
• Use type QName for better search selectivity.
• Use child associations that have <duplicate>false</duplicate> to enforce uniqueness (cm:name) and use FileFolderService.
Policies and Behaviour: Policy Behaviour Filters
• Temporarily disable policies
• Bound to the current transaction
• Bean: “behaviourFilter”
cm:versionable
• Prevent a change from forcing a new version
• Force versioning on metadata change: cm:autoVersionOnUpdateProps
cm:auditable
• Manually set cm:modified date
• Prevent cm:modified from being recorded
Application Bootstrap
Server Startup
• DDL script execution: lock table
• Spring init(): no resources available
• Alfresco Bootstrap: order of AbstractLifecycleBean
• Module Bootstrap: module startup on AbstractModuleComponent
Q & A