Solr and Image Module Extensions of Magnolia
DESCRIPTION
Solr Search Engine Integration
We have made some changes to the Magnolia Solr module, which will be highlighted. These include full multi-site support, support for multiple Solr instances, control over which pages to index by using template configurations, and Solr document field configurations. The result is a fully configurable module that is easy to maintain. After finishing our leftover to-dos we hope to publish the module to the Magnolia Forge.
Parameter-Based Image Transformations
As we focus more and more on creating responsive web designs that scale well across various viewports, we are experiencing a proliferation of image variations and increasingly complex frontend code to switch between them. In our previous CMS we could create image transformations with request parameters, and we decided to introduce that feature to Magnolia. The implementation and design decisions will be discussed.
Filesystem Image Variation Caching
Magnolia's Imaging module uses the JCR imaging workspace to cache rendered image variations. This has two disadvantages, slower performance and a larger backup, and no advantages that we are aware of. So we have created a file-based image cache by writing a custom ImageStreamer implementation. The file system path mirrors the JCR path used for cached images: the path of the image node plus a reference to the site definition and the variation name. Because the Imaging servlet currently does not allow you to configure which ImageStreamer instance is used for serving cached images, we created our own version of the servlet that uses our own ImageStreamer version. We have been using this for some time now: image variations are served noticeably faster, and our backup is significantly smaller.
TRANSCRIPT
Magnolia Solr Module improvements
• Multi site support.
• Solr Cloud support.
• Asynchronous indexing.
• Improved way to configure which pages are indexed.
• Template based boosting modifier.
• Flexible page type resolving mechanism.
• Search result: page type to CSS mapping.
• Various Solr document field configuration enhancements:
  o Multi value flag to match Solr document schema.
  o Added pluggable system for converting field data to Solr document (Adders).
• Facets.
• Fake facet for period filtering.
Feature overview
Multi site support
• Any number of named configurations.
• Link a site to a specific configuration.
• AdminCentral Solr page updated so that it can trigger deletion of all documents of a specific site.
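The link between sites and named configurations could be modelled roughly as below. This is only a sketch: the class name `SolrSiteConfigs`, its methods, and the example URL are hypothetical and are not taken from the module.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of named Solr configurations linked to sites. */
class SolrSiteConfigs {
    private final Map<String, String> serverUrlByConfig = new HashMap<>(); // config name -> Solr URL
    private final Map<String, String> configBySite = new HashMap<>();      // site name -> config name

    void addConfiguration(String configName, String serverUrl) {
        serverUrlByConfig.put(configName, serverUrl);
    }

    /** Links a site to an existing named configuration. */
    void linkSite(String siteName, String configName) {
        if (!serverUrlByConfig.containsKey(configName)) {
            throw new IllegalArgumentException("Unknown configuration: " + configName);
        }
        configBySite.put(siteName, configName);
    }

    /** Returns the Solr URL for a site, or null when the site is not linked to any configuration. */
    String serverUrlFor(String siteName) {
        String configName = configBySite.get(siteName);
        return configName == null ? null : serverUrlByConfig.get(configName);
    }
}
```

Several sites can share one configuration, and a site without a link simply has no Solr server.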
Solr cloud support
Asynchronous indexing
• Indexing is not part of the workflow.
• Creation of the Solr document and its publication are done in a java.util.concurrent.ExecutorService.
• Faster activation.
• No error when indexing fails.
• Should be configurable.
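The asynchronous approach can be sketched as follows. `AsyncIndexer` and its methods are hypothetical names; the `Consumer` stands in for "build the Solr document and send it", which the slides do not show. Only the use of `java.util.concurrent.ExecutorService` comes from the slides.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

/** Hypothetical sketch: indexing work runs on a background thread so activation returns at once. */
class AsyncIndexer {
    private final ExecutorService executor = Executors.newSingleThreadExecutor();
    private final Consumer<String> indexAction; // stand-in for document creation and publication
    private final List<String> failures = new CopyOnWriteArrayList<>();

    AsyncIndexer(Consumer<String> indexAction) {
        this.indexAction = indexAction;
    }

    /** Queues a page for indexing and returns immediately. */
    void indexLater(String pagePath) {
        executor.submit(() -> {
            try {
                indexAction.accept(pagePath);
            } catch (RuntimeException e) {
                // The failure never reaches the activating user; it is only recorded here.
                failures.add(pagePath);
            }
        });
    }

    List<String> failures() {
        return failures;
    }

    /** Stops accepting work and waits for queued tasks; returns true if all finished in time. */
    boolean shutdownAndWait(long seconds) {
        executor.shutdown();
        try {
            return executor.awaitTermination(seconds, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

Because the task runs after activation has returned, an indexing failure cannot block or fail the activation. That matches the "no error when indexing fails" point above, and it means failures need their own reporting channel.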
Improved way to configure which pages are indexed
• Previously done with parameter on template definition.
• Two disadvantages:
  o No clear overview of which templates are selected for indexing.
  o Not possible to configure how pages with a given template are indexed.
• Added template configuration for templates to Website Document.
• Without this configuration pages are not indexed.
Template based boosting modifier
• Property on template configuration.
• Allows you to favour pages of a given type over other pages with an equal score.
• Defaults to 1.0 (neutral).
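One way such a modifier could be applied is sketched below. The multiplication rule is an assumption, as are the names `TemplateBoost` and `effectiveBoost`; the slides only state that the property exists and defaults to 1.0.

```java
/** Hypothetical sketch of combining a field boost with a template boosting modifier. */
class TemplateBoost {
    static final float NEUTRAL = 1.0f;

    /**
     * Assumed combination rule: the field's own boost multiplied by the template modifier.
     * A missing modifier falls back to the neutral value 1.0.
     */
    static float effectiveBoost(float fieldBoost, Float templateModifier) {
        float modifier = (templateModifier == null) ? NEUTRAL : templateModifier;
        return fieldBoost * modifier;
    }
}
```

Under this rule a modifier of 1.0 leaves every field boost unchanged, which is what "neutral" suggests.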
Flexible page type resolving mechanism
• We want all documents to have a page type field.
• Depending on circumstances, the page type must be resolved differently:
  o by path.
  o by template.
  o by some external consideration.
• Introduced PageTypeResolver interface. Can be set on Template Configuration.
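A possible shape for such a resolver is sketched below. The slides name the `PageTypeResolver` interface but do not show its signature, so the method, the two implementations, and the fallback type "page" are assumptions made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Assumed shape of the interface; the real signature is not shown in the slides. */
interface PageTypeResolver {
    String resolve(String pagePath, String templateId);
}

/** Resolves the page type from the template id. */
class TemplatePageTypeResolver implements PageTypeResolver {
    private final Map<String, String> typesByTemplate;

    TemplatePageTypeResolver(Map<String, String> typesByTemplate) {
        this.typesByTemplate = typesByTemplate;
    }

    @Override
    public String resolve(String pagePath, String templateId) {
        return typesByTemplate.getOrDefault(templateId, "page");
    }
}

/** Resolves the page type from the page path; the first matching prefix wins. */
class PathPageTypeResolver implements PageTypeResolver {
    private final Map<String, String> typesByPathPrefix;

    PathPageTypeResolver(LinkedHashMap<String, String> typesByPathPrefix) {
        this.typesByPathPrefix = typesByPathPrefix;
    }

    @Override
    public String resolve(String pagePath, String templateId) {
        for (Map.Entry<String, String> entry : typesByPathPrefix.entrySet()) {
            if (pagePath.startsWith(entry.getKey())) {
                return entry.getValue();
            }
        }
        return "page";
    }
}
```

A resolver driven by an external consideration would implement the same interface, so the template configuration only needs to hold a reference to one resolver.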
Search result: page type to CSS mapping
• Simple mapping of page types to CSS class names.
• The CSS class names are used when rendering the search result.
Field configuration: Multi value flag to match solr document schema.
• In the Solr schema, fields can be multi value or not.
• Inserting a document with multiple values for a single value field yields an error.
• The multi value property on the search field configuration guards against this: for a single value field, subsequent values are ignored.
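The effect of the flag can be sketched as follows. `FieldValues.forSchema` is a hypothetical helper, not the module's code; it only illustrates dropping all but the first value when the field is configured as single value.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that trims field values to what the Solr schema accepts. */
class FieldValues {
    /** Returns the values to send to Solr for one field. */
    static List<Object> forSchema(List<?> values, boolean multiValue) {
        if (multiValue || values.size() <= 1) {
            return new ArrayList<>(values);
        }
        // Single value field: keep the first value and ignore the subsequent ones.
        return new ArrayList<>(values.subList(0, 1));
    }
}
```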
Field configuration: Pluggable system for converting field data to solr document.
• Standard values not a problem (String, Number, Date, Boolean).
• More control is needed for special cases: images, HTML, categories, ...
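A pluggable conversion system along these lines might look like the sketch below. The slides call the converters "Adders" but show no code, so `FieldAdder`, `DefaultAdder`, `HtmlAdder`, and `AdderRegistry` are hypothetical, and a plain `Map` stands in for the Solr input document. The tag stripping is deliberately naive.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical adder: converts one field value and puts it into the document. */
interface FieldAdder {
    void add(Map<String, Object> doc, String fieldName, Object value);
}

/** Standard values (String, Number, Date, Boolean) are stored unchanged. */
class DefaultAdder implements FieldAdder {
    @Override
    public void add(Map<String, Object> doc, String fieldName, Object value) {
        doc.put(fieldName, value);
    }
}

/** Special case: strips markup from HTML before indexing (naive regex, for illustration only). */
class HtmlAdder implements FieldAdder {
    @Override
    public void add(Map<String, Object> doc, String fieldName, Object value) {
        String text = value.toString()
                .replaceAll("<[^>]*>", " ")  // replace each tag with a space
                .replaceAll("\\s+", " ")     // collapse whitespace runs
                .trim();
        doc.put(fieldName, text);
    }
}

/** Picks the adder configured for a field, falling back to the default adder. */
class AdderRegistry {
    private final Map<String, FieldAdder> addersByField = new HashMap<>();
    private final FieldAdder fallback = new DefaultAdder();

    void register(String fieldName, FieldAdder adder) {
        addersByField.put(fieldName, adder);
    }

    void addField(Map<String, Object> doc, String fieldName, Object value) {
        addersByField.getOrDefault(fieldName, fallback).add(doc, fieldName, value);
    }
}
```

New special cases, such as images or categories, would then only require another `FieldAdder` implementation and a registration.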
Facets
• Facets: one of the coolest features in Solr.
• Added new configuration for facet fields.
• Maps Solr field names to display field names.
• New paragraph that shows the facets and re-submits the query, narrowing the search.
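The two facet pieces described above, the name mapping and the narrowing re-submit, can be sketched like this. `FacetConfig` and its methods are hypothetical. The filter string follows Solr's standard `field:"value"` query syntax, as typically passed in an `fq` parameter.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical facet configuration: maps Solr field names to display names. */
class FacetConfig {
    private final Map<String, String> displayNames = new LinkedHashMap<>();

    FacetConfig map(String solrField, String displayName) {
        displayNames.put(solrField, displayName);
        return this;
    }

    /** Falls back to the Solr field name when no display name is configured. */
    String displayName(String solrField) {
        return displayNames.getOrDefault(solrField, solrField);
    }

    /** Builds a filter query for a chosen facet value; the value is quoted as a phrase. */
    static String filterQuery(String solrField, String value) {
        String escaped = value.replace("\\", "\\\\").replace("\"", "\\\"");
        return solrField + ":\"" + escaped + "\"";
    }
}
```

When a visitor clicks a facet value, the original query is re-submitted with the extra filter, which narrows the result set without changing the scoring.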
Fake facet for period filtering
• Date facets have fixed time intervals.
• Code added for configuring a set of date ranges.
• Configuration option still missing.
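A configurable set of date ranges could be expressed as below. The `Period` class, the labels, and the field name `date` are hypothetical; the range expressions use Solr's date math syntax (for example `NOW-7DAYS`), and each period becomes a range filter on the date field.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical period definition for the fake date facet. */
class Period {
    final String label;          // shown to the visitor
    final String fromExpression; // Solr date math, e.g. NOW-7DAYS

    Period(String label, String fromExpression) {
        this.label = label;
        this.fromExpression = fromExpression;
    }

    /** Builds a range filter from the period start up to now. */
    String filterQuery(String dateField) {
        return dateField + ":[" + fromExpression + " TO NOW]";
    }

    /** Example set of periods with varying lengths, which fixed-interval date facets cannot express. */
    static List<Period> examples() {
        List<Period> periods = new ArrayList<>();
        periods.add(new Period("Last week", "NOW-7DAYS"));
        periods.add(new Period("Last month", "NOW-1MONTH"));
        periods.add(new Period("Last year", "NOW-1YEAR"));
        return periods;
    }
}
```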
Caveats: Index time boosting
• Index time boosting is not supported by fields that omit norms.
• The template boosting modifier creates non-default boost values even for fields with no boosting configuration.
• You now have to set 'omitNorms' to 'true' when configuring those fields, so that any boosting is disabled for them.
Todo: Solr Server Configuration
• The Solr server instances are configured in the repository.
• This is not nice when you have different servers for test, acceptance, production.
• Somehow externalize at least part of the configuration.
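One possible way to externalize the server address is sketched below: a JVM system property overrides the value stored in the repository. The property prefix `solr.server.` and the class `SolrServerUrl` are hypothetical, and this is only one of several options (environment variables or a properties file would work similarly).

```java
/** Hypothetical sketch: a system property overrides the repository-configured Solr URL. */
class SolrServerUrl {
    static final String PROPERTY_PREFIX = "solr.server."; // assumed naming convention

    /** Returns the override for this configuration name if set, otherwise the repository value. */
    static String resolve(String configName, String repositoryValue) {
        String override = System.getProperty(PROPERTY_PREFIX + configName);
        return (override == null || override.isEmpty()) ? repositoryValue : override;
    }
}
```

With this in place the same repository content can be used on test, acceptance, and production, with each environment passing its own `-Dsolr.server.<name>=...` value.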
Todo: Query time boosting
• Currently all boosting is index time.
• It is hard to tweak the boosting (reindexing required).
• Query time boosting should become an option.
• Performance?
Todo: Facets and period filter
• Make it part of the facet configuration.
• Probably move the facet configuration out of the field configuration.
Todo: Indexing on activation
• Postponed activation and deactivation are not supported.
• Indexing should be part of the workflow.
• That precludes asynchronous indexing.
Ready to share?
• Create separate module that depends on Magnolia Solr module.
• Remove or generalize some VPRO specific stuff:
  o Class and package names.
  o Custom document fields hard coded.
  o Remove obsolete code/features.
• Documentation