© 2013 Adobe Systems Incorporated. All Rights Reserved. Adobe Confidential.
Efficient content structures and queries in CRX/CQ Marcel Reutegger | Senior Software Engineer
Agenda
Repository storage basics
Efficient content structures
Query analysis and optimization
Repository storage basics
Nodes & properties stored in one entity -> bundle
Every node/bundle has a UUID (random)
Child nodes are linked from the parent node
Binaries go into the DataStore
Repository storage basics
Bundle structure
Bundle
UUID
Parent UUID
Properties: Name / Value (one entry per property)
Child node references: Name / UUID (one entry per child node)
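The bundle layout above can be sketched as a small Java class. This is only an in-memory illustration: the class name `BundleSketch`, its fields, and `addChild` are hypothetical and are not the actual Jackrabbit/CRX types.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical in-memory model of a bundle; not the actual Jackrabbit/CRX class.
final class BundleSketch {
    final UUID id;        // every node/bundle has a random UUID
    final UUID parentId;  // null for the root node in this sketch
    final Map<String, String> properties = new LinkedHashMap<>(); // name -> value
    final Map<String, UUID> childNodes = new LinkedHashMap<>();   // name -> child UUID

    BundleSketch(UUID parentId) {
        this.id = UUID.randomUUID();
        this.parentId = parentId;
    }

    // Creates a child bundle and records a name/UUID reference in this (parent) bundle.
    BundleSketch addChild(String name) {
        BundleSketch child = new BundleSketch(id);
        childNodes.put(name, child.id);
        return child;
    }

    public static void main(String[] args) {
        BundleSketch root = new BundleSketch(null);
        root.properties.put("jcr:title", "Home");
        BundleSketch page = root.addChild("page");
        System.out.println(root.childNodes.get("page").equals(page.id)); // prints "true"
    }
}
```

Because the child references live in the parent's bundle, a parent with many children is a single large entity.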
Repository storage basics – TarPM
Nodes & Properties (bundles) stored in tar files
Tar files are append only
Data is never overwritten
Garbage is removed by TarPM optimization (scheduled, incremental)
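The append-only idea can be illustrated with a toy store. This is a sketch of the general technique only, not how TarPM is implemented; `AppendOnlyStoreSketch`, its list-based log, and its `optimize` method are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of an append-only store; an illustration, not the TarPM implementation.
final class AppendOnlyStoreSketch {
    private static final class Entry {
        final String id;
        final String data;
        Entry(String id, String data) { this.id = id; this.data = data; }
    }

    private List<Entry> log = new ArrayList<>();                // stands in for the tar files
    private final Map<String, Integer> index = new HashMap<>(); // id -> position of latest entry

    // Writes always append; an older entry for the same id is never overwritten, it becomes garbage.
    void write(String id, String data) {
        log.add(new Entry(id, data));
        index.put(id, log.size() - 1);
    }

    String read(String id) {
        Integer pos = index.get(id);
        return pos == null ? null : log.get(pos).data;
    }

    int size() { return log.size(); }

    // "Optimization": copy only the latest entry per id into a fresh log.
    void optimize() {
        List<Entry> compacted = new ArrayList<>();
        for (int i = 0; i < log.size(); i++) {
            Entry e = log.get(i);
            if (index.get(e.id) == i) {          // this is the latest entry for its id
                compacted.add(e);
                index.put(e.id, compacted.size() - 1);
            }
        }
        log = compacted;
    }

    public static void main(String[] args) {
        AppendOnlyStoreSketch store = new AppendOnlyStoreSketch();
        store.write("a", "v1");
        store.write("a", "v2"); // the "v1" entry is now garbage
        store.optimize();
        System.out.println(store.size() + " " + store.read("a")); // prints "1 v2"
    }
}
```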
Efficient content structures
Number of nodes
Number of nodes
Increasing number of nodes affects performance
Random UUIDs cause random I/O -> Jackrabbit design
15k rpm drive: 200-400 IOPS
Tar index file sizes (64 bytes per bundle)
1 million nodes: 70 MB
10 million nodes: 700 MB
100 million nodes: 7 GB
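A back-of-the-envelope sketch of what these numbers mean. It assumes the worst case where every bundle read misses all caches and costs one random I/O; the class and method names are hypothetical.

```java
// Rough estimates only; assumes one random I/O per bundle read (no cache hits).
final class NodeCountEstimates {
    // Seconds needed to read `bundles` bundles at `iops` random I/O operations per second.
    static double randomReadSeconds(long bundles, int iops) {
        return (double) bundles / iops;
    }

    // Raw index size in bytes, using the 64 bytes per bundle figure from the slide.
    static long indexBytes(long bundles) {
        return bundles * 64L;
    }

    public static void main(String[] args) {
        System.out.println(randomReadSeconds(1_000_000, 200)); // prints "5000.0"
        System.out.println(randomReadSeconds(1_000_000, 400)); // prints "2500.0"
        System.out.println(indexBytes(1_000_000));             // prints "64000000"
    }
}
```

Under that assumption, touching one million bundles on a 15k rpm drive takes roughly 40 to 85 minutes. The raw 64 MB of index entries per million nodes is in the same range as the 70 MB quoted on the slide.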
Number of nodes
How to reduce number of nodes
Use version purge tool
Remove archived workflow instances
Purge audit events
Application specific
Bad: document view ‘import’ of XML
Good: Pack properties on few nodes
Other benefits: DataStore GC will be faster
Efficient content structures
Number of child nodes
Number of child nodes
Frequently asked questions:
«What is the maximum supported number of child nodes?»
«I have X number of child nodes. Will performance be OK?»
It depends!
Number of child nodes
Maximum number of child nodes
Bundle: UUID, Parent UUID, Properties (Name / Value), Child node references (Name / UUID)
Heap is the limit
Number of child nodes
Adding a single child node
Number of child nodes
Large number of child nodes
OK for:
Static content
/libs/wcm/core/i18n/de has ~8k child nodes
Not OK for:
Dynamic content
E.g. user generated content
Number of child nodes - Recommendations
Structure content
E.g. date/time based: 2012/09/26
Use utilities like Jackrabbit BTreeManager
Keep number of child nodes within limits (e.g. 1000)
Save in batches when possible
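Two of these recommendations can be sketched in plain Java without any repository API. `datePath` derives a date-based parent path like the 2012/09/26 example, and `saveInBatches` calls a save callback once per batch. Both names, the `/content/usergenerated` root, and the callbacks are hypothetical; in a real application `save` would stand in for persisting the pending changes.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Collections;
import java.util.List;
import java.util.function.Consumer;

final class ContentStructureSketch {
    private static final DateTimeFormatter DATE_PATH = DateTimeFormatter.ofPattern("yyyy/MM/dd");

    // Builds a parent path such as <root>/2012/09/26 so that children are spread by date.
    static String datePath(String root, LocalDate date) {
        return root + "/" + date.format(DATE_PATH);
    }

    // Adds every item, calling `save` after each full batch and once for a trailing partial batch.
    // Returns the number of times `save` was called.
    static <T> int saveInBatches(List<T> items, int batchSize, Consumer<T> add, Runnable save) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int saves = 0;
        int pending = 0;
        for (T item : items) {
            add.accept(item);
            pending++;
            if (pending == batchSize) {
                save.run();
                saves++;
                pending = 0;
            }
        }
        if (pending > 0) {
            save.run();
            saves++;
        }
        return saves;
    }

    public static void main(String[] args) {
        // prints "/content/usergenerated/2012/09/26"
        System.out.println(datePath("/content/usergenerated", LocalDate.of(2012, 9, 26)));
        // 2500 items in batches of 1000: prints "3"
        System.out.println(saveInBatches(Collections.nCopies(2500, "x"), 1000, x -> { }, () -> { }));
    }
}
```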
Query analysis & optimization
Query analysis and optimization
Query debug log
http://dev.day.com/kb/home/Crx/Troubleshooting/HowToDebugJCRQueries.html
“executed in <time> ms. (<query>)”
JMX (CQ 5.5)
QueryStat: slow and most frequent queries
TimeSeries: count, duration, average
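To find slow queries in the debug log, the "executed in <time> ms. (<query>)" fragment can be pulled out with a regular expression. A minimal sketch, assuming a log line contains exactly that fragment; the class name and the sample line are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class QueryLogSketch {
    // Matches the fragment: executed in <time> ms. (<query>)
    private static final Pattern EXECUTED =
            Pattern.compile("executed in (\\d+) ms\\. \\((.*)\\)");

    // Returns the reported duration in milliseconds, or -1 if the line does not match.
    static long durationMillis(String line) {
        Matcher m = EXECUTED.matcher(line);
        return m.find() ? Long.parseLong(m.group(1)) : -1L;
    }

    // Returns the query statement, or null if the line does not match.
    static String query(String line) {
        Matcher m = EXECUTED.matcher(line);
        return m.find() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        String line = "executed in 1250 ms. (//element(*, nt:hierarchyNode))"; // made-up sample
        System.out.println(durationMillis(line)); // prints "1250"
        System.out.println(query(line));          // prints "//element(*, nt:hierarchyNode)"
    }
}
```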
Query analysis and optimization
Fast: simple comparison
sling:resourceType = 'my/type'
Fast: node type match
//element(*, nt:hierarchyNode)
Fast: simple fulltext search
jcr:contains(@jcr:title, 'crx')
Fast: like on few distinct values
jcr:like(@jcr:mimeType, '%/plain')
Query analysis and optimization
Slow: jcr:contains with initial wildcard
jcr:contains(., '*rabbit')
Alternative: don't do it, unless you know exactly what you are doing!
Slow: jcr:like on many distinct values
jcr:like(@email, '%@gmail.com')
Alternative: store the data you want to query in a separate property,
then you can write: @email-host = 'gmail.com'
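The separate property has to be filled when the content is written. A minimal sketch of that step, using a plain map in place of real node properties; `emailHost`, `userProperties`, and the lower-casing of the host are assumptions, only the `email` and `email-host` property names come from the slide.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

final class EmailHostSketch {
    // Extracts the part after the last '@', lower-cased so equality matches are predictable.
    static String emailHost(String email) {
        int at = email.lastIndexOf('@');
        if (at < 0 || at == email.length() - 1) {
            throw new IllegalArgumentException("not an email address: " + email);
        }
        return email.substring(at + 1).toLowerCase(Locale.ROOT);
    }

    // Properties to store at write time: the original value plus the denormalized host.
    static Map<String, String> userProperties(String email) {
        Map<String, String> props = new LinkedHashMap<>();
        props.put("email", email);
        props.put("email-host", emailHost(email));
        return props;
    }

    public static void main(String[] args) {
        System.out.println(userProperties("jane.doe@gmail.com").get("email-host")); // prints "gmail.com"
    }
}
```

The query then becomes a simple comparison on `email-host` instead of a `jcr:like` over every distinct address.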
Query analysis and optimization
Slow: ranges matching many distinct values
@jcr:lastModified > xs:dateTime('2001-09-17T18:17:13.000+02:00')
Alternative: reduce resolution (e.g. only store date and not time)
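Reducing the resolution can be done before the value is stored. A sketch using `java.time`, assuming the date is truncated in the timestamp's own offset; the class and method names are hypothetical.

```java
import java.time.OffsetDateTime;
import java.time.temporal.ChronoUnit;

final class DateResolutionSketch {
    // Drops the time of day, so all values from the same day collapse into one distinct value.
    static OffsetDateTime toDayResolution(OffsetDateTime timestamp) {
        return timestamp.truncatedTo(ChronoUnit.DAYS);
    }

    public static void main(String[] args) {
        OffsetDateTime modified = OffsetDateTime.parse("2001-09-17T18:17:13.000+02:00");
        System.out.println(toDayResolution(modified)); // prints "2001-09-17T00:00+02:00"
    }
}
```

With day resolution, a range query has at most one distinct value per day to match, instead of one per modification.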
Query analysis and optimization - Recommendations
Test with real content
Structure content to avoid queries
Denormalize
Avoid path constraints