What's New in Apache Solr 4.10


Upload: anshum-gupta

Post on 13-Dec-2014

DESCRIPTION

This is from my opening talk at the Downtown SF Apache Lucene/Solr meetup on Sep 17, 2014 at Trulia.

TRANSCRIPT

• Anshum Gupta, Apache Lucene/Solr committer, Lucidworks Employee.

• Search and related stuff for 9+ years.

• Apache Lucene since 2006 and Solr since 2010 but consistent community involvement since 2012

• Organizations I am or have been a part of:

Who am I?

Solr - Commits and Contributors

via https://www.openhub.net/p/solr

Solr - Releases

Solr 4.10: What’s really new?

• Start scripts in Solr

• bin/solr start -e cloud

• Schemaless - REST APIs to manage schema

• Auto-generate a unique key in schema-less example

• JSON documents no longer have to be wrapped in an array when posted to the new path ‘/update/json/docs’

Ease of Use
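To make the ‘/update/json/docs’ bullet concrete, here is a minimal sketch that builds (but does not send) a POST carrying a single JSON document that is not wrapped in an array. The host and port, the collection name `collection1`, the document fields, and the helper name `build_json_docs_request` are all assumptions for illustration.

```python
import json
from urllib.request import Request


def build_json_docs_request(base_url, collection, doc):
    """Build a POST request for a single, unwrapped JSON document.

    base_url and collection are placeholders; adjust them to your install.
    """
    url = f"{base_url}/{collection}/update/json/docs"
    body = json.dumps(doc).encode("utf-8")
    return Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical document: note that it is a plain object, not a one-element array.
req = build_json_docs_request(
    "http://localhost:8983/solr",
    "collection1",
    {"id": "1", "title": "hello"},
)
print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen` would actually send it; that step is left out so the sketch runs without a Solr instance.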

• ~3 Years

• 20+ contributors

• 18 Retweets! :)

• http://searchhub.org/2014/09/02/pivot-facets/

Distributed Pivot Faceting
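For reference, a pivot facet request is an ordinary query with `facet=true` and a `facet.pivot` parameter listing the fields to nest. The sketch below only assembles such a URL; the host, the collection name `collection1`, the field names `cat` and `inStock`, and the helper name `pivot_facet_url` are assumptions, not something taken from the slides.

```python
from urllib.parse import urlencode


def pivot_facet_url(base_url, collection, pivot_fields, query="*:*"):
    """Build a select URL that asks for one pivot over the given fields."""
    params = {
        "q": query,
        "rows": 0,  # only the facet counts are of interest here
        "facet": "true",
        "facet.pivot": ",".join(pivot_fields),
        "wt": "json",
    }
    return f"{base_url}/{collection}/select?{urlencode(params)}"


# Hypothetical fields: pivot on category first, then on stock status.
url = pivot_facet_url("http://localhost:8983/solr", "collection1", ["cat", "inStock"])
print(url)
```

With 4.10 the same request can be sent to a multi-shard collection, which is the point of this slide.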

Distributed facets put to use…

Lucidworks Fusion

• Auto addition of Replicas when using shared file system (HDFS).

• New spatial BBoxField

• Exporting full sorted result sets!

And more…
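A rough sketch of what an export request looks like, assuming the `/export` request handler that ships with this feature and assuming it needs both a `sort` and an `fl` parameter. The host, collection name, field names, and the helper name `export_url` are placeholders.

```python
from urllib.parse import urlencode


def export_url(base_url, collection, query, sort, fields):
    """Build a URL for streaming a full sorted result set.

    Assumption: the handler rejects requests without sort or fl,
    so this helper refuses to build one.
    """
    if not sort or not fields:
        raise ValueError("sort and fl are both needed for an export request")
    params = {"q": query, "sort": sort, "fl": ",".join(fields)}
    return f"{base_url}/{collection}/export?{urlencode(params)}"


# Hypothetical fields; the exported and sort fields are expected to have docValues.
full_export = export_url(
    "http://localhost:8983/solr", "collection1", "*:*", "id asc", ["id", "price"]
)
print(full_export)
```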

Diving a little deeper…

• Unloading/Deletion of cores that failed to initialize.

• Update request handlers are registered implicitly, no need to define them.

• Terms Query parser for efficiently filtering documents by a list of values.

• JSON loader now flattens nested JSON into multiple documents.

• Correctly decode special characters in managed stopwords and synonym endpoints.

• Facet counts are no longer duplicated in response if the request duplicates them.

Solr Core
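As an illustration of the Terms Query parser bullet, a filter over a list of values can be written with local params as `{!terms f=<field>}v1,v2,...`. The sketch below builds such a filter string; the field name `id`, the values, and the helper name `terms_filter` are made up for the example.

```python
from urllib.parse import urlencode


def terms_filter(field, values):
    """Return an fq value using the terms query parser.

    Values are joined with commas, so this simple helper is only
    suitable for values that do not contain commas themselves.
    """
    return "{!terms f=%s}%s" % (field, ",".join(str(v) for v in values))


fq = terms_filter("id", [10, 11, 12])
query_string = urlencode({"q": "*:*", "fq": fq})
print(fq)
print(query_string)
```

This is meant for long value lists, where a big boolean OR query would be slower to parse and score.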

• The CLUSTERSTATUS API now tracks and returns much more than before, e.g. roles and live nodes.

• MIGRATE Collections API

• Now works with legacyCloud=false mode

• Retries now handle a pre-existing temp collection better.

• DELETEREPLICA now removes instance and data directory by default.

• distrib.singlePass parameter to make EXECUTE_QUERY phase fetch all fields and skip GET_FIELDS.

• Also, other bug fixes and slightly better logging!

SolrCloud - APIs
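The Collections API calls above are plain HTTP requests against `/admin/collections` with an `action` parameter. A small sketch of building two of them follows; the host, the collection, shard, and replica names, and the helper name `collections_api_url` are assumptions for illustration.

```python
from urllib.parse import urlencode

BASE = "http://localhost:8983/solr"  # placeholder address


def collections_api_url(base_url, action, **params):
    """Build a Collections API URL for the given action and parameters."""
    query = {"action": action, "wt": "json"}
    query.update(params)
    return f"{base_url}/admin/collections?{urlencode(query)}"


# Cluster state, including roles and live nodes.
status_url = collections_api_url(BASE, "CLUSTERSTATUS")

# Remove one replica; hypothetical collection, shard, and replica names.
delete_url = collections_api_url(
    BASE,
    "DELETEREPLICA",
    collection="collection1",
    shard="shard1",
    replica="core_node2",
)
print(status_url)
print(delete_url)
```

Keep in mind that, per the bullet above, DELETEREPLICA now also removes the instance and data directories by default, so treat the second URL with care.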

• No more losing the Overseer when OverseerRoles are enabled.

• Distributed commit and optimize are no longer serially executed across all replicas.

• Improvements in leader initiated recovery.

• Fixed: a ZooKeeper session expiry during setup could keep LeaderElector from joining elections.

• Schemaless concurrency improvements

SolrCloud - Internals

• DistributedQueue is more efficient at creating zk watches.

• Correctly decode special characters in managed stopwords and synonym endpoints.

• OCP doesn’t exit on ZK connection loss and other ZK communication retries.

• Bug Fixes in composite id router.

SolrCloud - Internals

• Improvement in transaction log replay performance on HDFS

• HdfsDirectoryFactory uses the supplied Configuration for communicating with secure Kerberos.

• Fixed: HdfsUpdateLog had a race condition that could expose a closed HDFS FileSystem instance.

SolrCloud - HDFS

• SolrJ is better. Support for interval faceting.

• Performance improvement in C*SS - No more spin lock.

• DIH now has an onError event handler hook.

• Data Import cancel button in Admin UI

• Improvements to MailEntityProcessor

SolrJ, DIH and more…

• Solr's schema now uses DelegatingAnalyzerWrapper that uses less heap for cached TokenStreamComponents because it caches per FieldType not per Field.

• Reduce CPU usage by avoiding repeated costly calls to Document.getField inside DocumentBuilder.toDocument for use-cases with large number of fields and copyFields.

• BinaryResponseWriter no longer fetches unnecessary stored fields when only pseudo-fields are requested.

Optimizations

• CoreContainer.preRegisterInZk() and CoreContainer.register() commands are merged into CoreContainer.create().

• CoreContainer.remove() has now been replaced with CoreContainer.unload().

• Opened up "public" access to DataSource, DocBuilder, and EntityProcessorWrapper in DIH.

• Added support for multiple spellcheck collations, multi-valued field highlighting to /browse UI.

• Improved SolrCloud cloud-dev scripts.

• Hardened tests so you can rely on this stuff even more!

Solr Developer? Other changes

• Solr 4.10.1

• Should be out anytime!

• LUCENE-5934: 4.10 broke backwards compatibility for 4.0 beta & 4.0-release indexes

• Trunk moves to Java 8 after a recent vote

• Ease of use

• Performance + benchmarking

• Stability

• Analytics

What’s next?

http://www.twitter.com/anshumgupta

http://www.linkedin.com/in/anshumgupta/

[email protected]

Connect @