Solr and Image Module Extensions of Magnolia

Magnolia Solr Module improvements

Upload: boris-kraft

Post on 30-Nov-2014



DESCRIPTION

Solr Search Engine Integration

We have made a number of changes to the Magnolia Solr module, which this talk highlights: full multi-site support, support for multiple Solr instances, control over which pages are indexed through template configurations, and Solr document field configurations. The result is a fully configurable module that is easy to maintain. Once we have finished our remaining to-dos, we hope to publish the module to the Magnolia Forge.

Parameter-Based Image Transformations

As we focus more and more on responsive web designs that scale well across various viewports, we are seeing a proliferation of image variations and increasingly complex frontend code to switch between them. In our previous CMS we could create image transformations with request parameters, and we decided to introduce that feature to Magnolia. The implementation and design decisions will be discussed.

Filesystem Image Variation Caching

Magnolia's Imaging module uses the JCR imaging workspace to cache rendered image variations. This has two disadvantages, slower performance and a larger backup, and no advantages that we are aware of. So we created a file-based image cache by writing a custom ImageStreamer implementation. The file system path mirrors the JCR path used for caching images: the path of the image node plus a reference to the site definition and the variation name. Because the Imaging servlet currently does not let you configure which ImageStreamer instance serves cached images, we created our own version of the servlet that uses our ImageStreamer. We have been using this for some time now: image variations are served noticeably faster, and our backup is significantly smaller.
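The cache path layout described above can be sketched as follows. This is a minimal illustration, not the actual ImageStreamer code: the class name, the method name and the parameters cacheRoot, siteName and variationName are all assumptions made for the example.

```java
import java.nio.file.Path;

// Hypothetical helper: maps an image node to its file in the file-based cache.
final class ImageCachePaths {
    private ImageCachePaths() {}

    // Layout assumed from the description: <cacheRoot>/<image node path>/<site>/<variation>
    static Path cacheFile(Path cacheRoot, String imageNodePath, String siteName, String variationName) {
        // JCR paths are absolute ("/dam/photos/cat.jpg"); drop the leading slash
        // so that resolve() keeps the result below cacheRoot.
        String relative = imageNodePath.startsWith("/") ? imageNodePath.substring(1) : imageNodePath;
        Path root = cacheRoot.normalize();
        Path target = root.resolve(relative).resolve(siteName).resolve(variationName).normalize();
        // Reject paths such as "/../etc/passwd" that would leave the cache directory.
        if (!target.startsWith(root)) {
            throw new IllegalArgumentException("Path escapes the cache root: " + imageNodePath);
        }
        return target;
    }
}
```

Keeping the file layout identical to the JCR layout means a cached variation can be located, or invalidated, from the node path alone.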

TRANSCRIPT

Page 1: Solr and Image Module Extensions of Magnolia

Magnolia Solr Module improvements

Page 2: Solr and Image Module Extensions of Magnolia

Feature overview

• Multi site support.
• Solr Cloud support.
• Asynchronous indexing.
• Improved way to configure which pages are indexed.
• Template based boosting modifier.
• Flexible page type resolving mechanism.
• Search result: page type to css mapping.
• Various Solr document field configuration enhancements:
  o Multi value flag to match Solr document schema.
  o Added pluggable system for converting field data to Solr document (Adders).
• Facets.
• Fake facet for period filtering.

Page 3: Solr and Image Module Extensions of Magnolia

Multi site support

• Any number of named configurations.

• Link a site to a specific configuration.

• Admin central solr page updated to trigger deleting all documents of a specific site.

Page 4: Solr and Image Module Extensions of Magnolia

Solr cloud support

Page 5: Solr and Image Module Extensions of Magnolia

Asynchronous indexing

• Indexing is not part of the workflow.

• Creation of solr document and publication done in java.util.concurrent.ExecutorService.

• Faster activation.

• No error when indexing fails.

• Should be configurable.
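A minimal sketch of this approach, assuming a single-threaded executor and a hypothetical indexAction callback that builds and sends the Solr document for a page path. None of these names come from the module itself. It shows why activation gets faster and why an indexing failure no longer surfaces as an error.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

// Hypothetical wrapper around the indexing step.
final class AsyncIndexer {
    private final ExecutorService executor = Executors.newSingleThreadExecutor();
    private final Consumer<String> indexAction; // assumed: creates and publishes the Solr document
    final AtomicInteger failures = new AtomicInteger();

    AsyncIndexer(Consumer<String> indexAction) {
        this.indexAction = indexAction;
    }

    // Called during activation; returns immediately, the work runs on the executor thread.
    void indexLater(String pagePath) {
        executor.execute(() -> {
            try {
                indexAction.accept(pagePath);
            } catch (RuntimeException e) {
                // The failure is only counted here (a real module would log it);
                // activation has already completed, so the editor sees no error.
                failures.incrementAndGet();
            }
        });
    }

    // Stops accepting work and waits for queued tasks; returns false on timeout or interrupt.
    boolean shutdownAndWait(long seconds) {
        executor.shutdown();
        try {
            return executor.awaitTermination(seconds, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

The swallowed exception is the trade-off named on the slide: faster activation at the cost of silent indexing failures, which is why a configuration switch between synchronous and asynchronous indexing would be useful.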

Page 6: Solr and Image Module Extensions of Magnolia

Improved way to configure which pages are indexed

• Previously done with parameter on template definition.

• Two disadvantages:
  o No clear overview of which templates are selected for indexing.
  o Not possible to configure how pages with a given template are indexed.

• Added template configuration for templates to Website Document.

• Without this configuration pages are not indexed.

Page 7: Solr and Image Module Extensions of Magnolia

Improved way to configure which pages are indexed

Page 8: Solr and Image Module Extensions of Magnolia

Template based boosting modifier

• Property on template configuration.

• Lets you favour pages of a given type over other pages that would otherwise score equally.

• Defaults to 1.0 (neutral).
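One plausible way such a modifier could be applied is to multiply it into each field boost. The sketch below assumes that multiplicative combination; the slides do not state the actual formula, and the class and method names are hypothetical.

```java
// Hypothetical helper; the multiplicative combination is an assumption.
final class TemplateBoost {
    static final float NEUTRAL = 1.0f;

    private TemplateBoost() {}

    // A missing field boost or template modifier is treated as neutral (1.0).
    static float effectiveBoost(Float fieldBoost, Float templateModifier) {
        float field = fieldBoost == null ? NEUTRAL : fieldBoost;
        float modifier = templateModifier == null ? NEUTRAL : templateModifier;
        return field * modifier;
    }
}
```

Under this assumption a field without its own boost configuration still ends up with a non-neutral boost whenever the template modifier differs from 1.0.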

Page 9: Solr and Image Module Extensions of Magnolia

Flexible page type resolving mechanism

• We want all documents to have a page type field.

• Depending on circumstances, the page type must be resolved differently:
  o by path
  o by template
  o by some external consideration

• Introduced PageTypeResolver interface. Can be set on Template Configuration.
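The slides name the PageTypeResolver interface but not its signature. The sketch below assumes a single resolve method taking the page path and template id, with two illustrative implementations for the by-path and by-template cases. Everything except the interface name is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Interface name from the slides; the method signature is an assumption.
interface PageTypeResolver {
    String resolve(String pagePath, String templateId);
}

// Resolves the page type from the first matching path prefix.
final class PathPageTypeResolver implements PageTypeResolver {
    private final Map<String, String> typeByPathPrefix = new LinkedHashMap<>();
    private final String fallbackType;

    PathPageTypeResolver(String fallbackType) {
        this.fallbackType = fallbackType;
    }

    PathPageTypeResolver map(String pathPrefix, String pageType) {
        typeByPathPrefix.put(pathPrefix, pageType);
        return this;
    }

    @Override
    public String resolve(String pagePath, String templateId) {
        for (Map.Entry<String, String> entry : typeByPathPrefix.entrySet()) {
            if (pagePath.startsWith(entry.getKey())) {
                return entry.getValue();
            }
        }
        return fallbackType;
    }
}

// Resolves the page type from the template id alone.
final class TemplatePageTypeResolver implements PageTypeResolver {
    private final Map<String, String> typeByTemplate;
    private final String fallbackType;

    TemplatePageTypeResolver(Map<String, String> typeByTemplate, String fallbackType) {
        this.typeByTemplate = typeByTemplate;
        this.fallbackType = fallbackType;
    }

    @Override
    public String resolve(String pagePath, String templateId) {
        return typeByTemplate.getOrDefault(templateId, fallbackType);
    }
}
```

Because the resolver is set per template configuration, each template can pick whichever strategy fits, and a resolver backed by an external source only needs to implement the same interface.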

Page 10: Solr and Image Module Extensions of Magnolia

Flexible page type resolving mechanism

Page 11: Solr and Image Module Extensions of Magnolia

Search result: page type to css mapping

• Simple mapping of page types to css names.

• Css class names used when rendering the search result.

Page 12: Solr and Image Module Extensions of Magnolia

Field configuration: Multi value flag to match solr document schema.

• In Solr schema fields can be multi value or not.

• Inserting a document with multiple values for a single value field yields an error.

• The multi value property on the search field configuration mirrors the schema: when it is not set, subsequent values for that field are ignored.
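The effect of the flag can be sketched with a plain map standing in for the Solr document. The map-based model and all names are assumptions for illustration; the real module would populate a SolrJ document.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper; a Map of field name to values stands in for the Solr document.
final class FieldValueWriter {
    private FieldValueWriter() {}

    // For a single value field only the first value is kept; later values are
    // dropped instead of producing a document that Solr would reject.
    static void addValue(Map<String, List<Object>> document, String field, Object value, boolean multiValue) {
        List<Object> values = document.computeIfAbsent(field, key -> new ArrayList<>());
        if (multiValue || values.isEmpty()) {
            values.add(value);
        }
    }
}
```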

Page 13: Solr and Image Module Extensions of Magnolia

Field configuration: Pluggable system for converting field data to solr document.

• Standard values not a problem (String, Number, Date, Boolean).

• Need more control for special cases: images, HTML, categories, ...
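A rough sketch of what a pluggable Adder could look like. The feature overview names the concept "Adders", but the interface signature, the map-based document model and both implementations are assumptions. The HTML handling is deliberately naive (a regular expression, not a real HTML parser).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Concept name from the slides; the signature is an assumption.
interface Adder {
    void add(Map<String, List<Object>> document, String fieldName, Object rawValue);
}

// Standard values (String, Number, Date, Boolean) are passed through unchanged.
final class DefaultAdder implements Adder {
    @Override
    public void add(Map<String, List<Object>> document, String fieldName, Object rawValue) {
        document.computeIfAbsent(fieldName, key -> new ArrayList<>()).add(rawValue);
    }
}

// Special case: index only the text of an HTML fragment.
final class HtmlTextAdder implements Adder {
    @Override
    public void add(Map<String, List<Object>> document, String fieldName, Object rawValue) {
        String text = String.valueOf(rawValue)
                .replaceAll("<[^>]*>", " ")  // naive tag removal, for illustration only
                .replaceAll("\\s+", " ")     // collapse the whitespace left behind
                .trim();
        document.computeIfAbsent(fieldName, key -> new ArrayList<>()).add(text);
    }
}
```

With one Adder configured per field, special cases such as images or categories become a matter of plugging in another implementation rather than changing the indexing code.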

Page 14: Solr and Image Module Extensions of Magnolia

Field configuration: Pluggable system for converting field data to solr document.

Page 15: Solr and Image Module Extensions of Magnolia

Facets

• Facets: one of the coolest features in Solr.

• Added new configuration for facet fields.

• Maps Solr field names to display field names.

• New paragraph that shows the facets and re-submits the query, narrowing the search.
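The re-submitted query can be sketched as a set of Solr request parameters. `facet`, `facet.field` and `fq` are standard Solr parameters; the class, the method and the map-based configuration (Solr field name to display name) are assumptions for the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper that builds the request parameters for a faceted search.
final class FacetParams {
    private FacetParams() {}

    // facetFields: Solr field name -> display name (the display name is only used when rendering).
    // selected:    Solr field name -> facet value the visitor clicked.
    static Map<String, List<String>> build(String query, Map<String, String> facetFields, Map<String, String> selected) {
        Map<String, List<String>> params = new LinkedHashMap<>();
        params.put("q", Collections.singletonList(query));
        params.put("facet", Collections.singletonList("true"));
        params.put("facet.field", new ArrayList<>(facetFields.keySet()));
        List<String> filters = new ArrayList<>();
        for (Map.Entry<String, String> choice : selected.entrySet()) {
            // Quote the value and escape backslashes and quotes inside it.
            String escaped = choice.getValue().replace("\\", "\\\\").replace("\"", "\\\"");
            filters.add(choice.getKey() + ":\"" + escaped + "\"");
        }
        if (!filters.isEmpty()) {
            params.put("fq", filters); // each filter query narrows the result set
        }
        return params;
    }
}
```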

Page 16: Solr and Image Module Extensions of Magnolia

Facets

Page 17: Solr and Image Module Extensions of Magnolia

Fake facet for period filtering

• Date facets have fixed time intervals.

• Code added for configuring a set of date ranges.

• Configuration option still missing.
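One way to express a configurable set of date ranges is as individual `facet.query` values that use Solr date math (for example `NOW-7DAYS`). The sketch below assumes that approach; the slides do not say how the module implements it, and the class, method and field names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: turns configured periods into Solr range queries.
final class PeriodFilter {
    private PeriodFilter() {}

    // labelToStart: display label -> Solr date math for the start of the period.
    // Returns label -> range query, usable as a facet.query (for counts) or fq (for filtering).
    static Map<String, String> facetQueries(String dateField, Map<String, String> labelToStart) {
        Map<String, String> queries = new LinkedHashMap<>();
        for (Map.Entry<String, String> period : labelToStart.entrySet()) {
            queries.put(period.getKey(), dateField + ":[" + period.getValue() + " TO NOW]");
        }
        return queries;
    }
}
```

Unlike a regular date facet with a fixed gap, the periods here can overlap and have unequal lengths, such as "last week", "last month" and "last year".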

Page 18: Solr and Image Module Extensions of Magnolia

Fake facet for period filtering

Page 19: Solr and Image Module Extensions of Magnolia

Caveats: Index time boosting

• Index time boosting is not supported by fields that omit norms.

• The template boosting modifier creates non-standard values even for fields with no boosting configuration.

• For now you have to set 'omitNorms' to 'true' when configuring those fields, which disables all boosting for them.

Page 20: Solr and Image Module Extensions of Magnolia

Todo: Solr server configuration

• The Solr server instances are configured in the repository.

• This is not nice when you have different servers for test, acceptance, production.

• Somehow externalize at least part of the configuration.
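One possible direction, sketched under assumptions: keep the repository value as the default and let an environment-specific property override it. The property name `solr.server.url` and the class are hypothetical; the slides leave the mechanism open.

```java
import java.util.function.Function;

// Hypothetical helper: picks the Solr server URL for the current environment.
final class SolrServerUrl {
    static final String PROPERTY = "solr.server.url"; // assumed property name

    private SolrServerUrl() {}

    // The lookup function is injectable so that the logic can be tested without touching JVM state.
    static String resolve(String repositoryValue, Function<String, String> properties) {
        String override = properties.apply(PROPERTY);
        if (override != null && !override.trim().isEmpty()) {
            return override.trim();
        }
        return repositoryValue;
    }

    // Default lookup: JVM system properties, e.g. -Dsolr.server.url=... per environment.
    static String resolve(String repositoryValue) {
        return resolve(repositoryValue, System::getProperty);
    }
}
```

With this shape the same repository content can move between test, acceptance and production while each server supplies its own URL at startup.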

Page 21: Solr and Image Module Extensions of Magnolia

Todo: Query time boosting

• Currently all boosting is index time.

• It is hard to tweak the boosting (reindexing required).

• Query time boosting should become an option.

• Performance?

Page 22: Solr and Image Module Extensions of Magnolia

Todo: Facets and period filter

• Make it part of the facet configuration.

• Probably move the facet configuration out of the field configuration.

Page 23: Solr and Image Module Extensions of Magnolia

Todo: Indexing on activation

• Postponed activation and deactivation not supported.

• Indexing should be part of the workflow.

• That precludes asynchronous indexing.

Page 24: Solr and Image Module Extensions of Magnolia

Ready to share?

• Create separate module that depends on Magnolia Solr module.

• Remove or generalize some VPRO specific stuff:
  o Class and package names.
  o Custom document fields hard coded.
  o Remove obsolete code/features.

• Documentation
