Sunday, September 08, 2013

Playing with distributed percolator from elasticsearch

I recently attended an elasticsearch meetup in Amsterdam, where I heard for the first time about a very cool elasticsearch feature called the percolator.
With the percolator, you register queries against one or more indices and then send percolate requests containing a document; the response tells you which of the registered queries the document matched.
The percolator has been available since version 0.15. Starting with version 1.0, however, a fully distributed percolator will be available. The redesigned percolator uses a _percolator type mapping instead of a _percolator index, so queries and data coexist in the same index. The redesign also fixes a confusing part of the query registration API, where the index name was represented as the type in the _percolator index. In the redesigned percolator this confusion is gone, because _percolator is now a type.
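The core idea is a "reversed search": queries are stored first, and documents are matched against them later. The toy sketch below only illustrates that idea in memory. It is not the elasticsearch API; the class `ToyPercolator` and its methods are hypothetical, and documents are plain maps.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical in-memory illustration of percolation, NOT the elasticsearch API.
class ToyPercolator {
    // registered query id -> query, modelled as a predicate over a document
    private final Map<String, Predicate<Map<String, Object>>> queries = new LinkedHashMap<>();

    void register(String queryId, Predicate<Map<String, Object>> query) {
        queries.put(queryId, query);
    }

    // Returns the ids of all registered queries that match the document.
    // The document itself is never stored.
    List<String> percolate(Map<String, Object> doc) {
        List<String> matches = new ArrayList<>();
        for (Map.Entry<String, Predicate<Map<String, Object>>> e : queries.entrySet()) {
            if (e.getValue().test(doc)) {
                matches.add(e.getKey());
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        ToyPercolator p = new ToyPercolator();
        p.register("cheap", doc -> ((Number) doc.get("price")).doubleValue() < 200000);
        p.register("any", doc -> true);
        System.out.println(p.percolate(Map.of("price", 275000))); // prints [any]
    }
}
```

In elasticsearch the "predicates" are real queries stored in the index, and the matching runs on the nodes that hold them.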

There are many use cases for the percolator. For example, on a real estate website, registered users save their preferences (a query) describing the dream house they want to buy. When a new house is added to the website, a notification is sent to every registered user whose preferences it matches (using percolate requests).
Let's consider the following model
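The original model is not reproduced here. As a stand-in, here is a minimal hypothetical `House` class; the field names (`id`, `price`, `lat`, `lon`) are assumptions, and the coordinate checks are an addition so that invalid geo points fail early.

```java
// Hypothetical model for the real-estate example; field names are assumptions.
class House {
    final String id;
    final long price;   // asking price, e.g. in euros
    final double lat;   // latitude in degrees, must be within -90..90
    final double lon;   // longitude in degrees, must be within -180..180

    House(String id, long price, double lat, double lon) {
        if (lat < -90 || lat > 90) throw new IllegalArgumentException("latitude out of range: " + lat);
        if (lon < -180 || lon > 180) throw new IllegalArgumentException("longitude out of range: " + lon);
        this.id = id;
        this.price = price;
        this.lat = lat;
        this.lon = lon;
    }

    @Override
    public String toString() {
        return "House{id=" + id + ", price=" + price + ", lat=" + lat + ", lon=" + lon + "}";
    }

    public static void main(String[] args) {
        System.out.println(new House("h1", 275000, 52.0907, 5.1214));
    }
}
```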

Since automatic mapping of geo properties is disabled, you have to provide the correct mapping for them yourself. For the sake of simplicity, XContentBuilder, a built-in elasticsearch utility, is used to construct the JSON representation of the model. An alternative would be to use Jackson.
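To keep the sketch below free of dependencies, it writes by hand the kind of JSON that XContentBuilder would produce; it is not the post's code. The type name `house` and the field names `price` and `location` are assumptions. The point is that `location` is declared explicitly as `geo_point`, and each document supplies it as a `{"lat": .., "lon": ..}` object.

```java
import java.util.Locale;

// Hand-built JSON standing in for XContentBuilder output.
// Type and field names ("house", "price", "location") are assumptions.
class HouseJson {
    // Explicit mapping: "location" must be declared as geo_point,
    // because geo properties are not mapped automatically.
    static String mapping() {
        return "{\"house\":{\"properties\":{"
             + "\"price\":{\"type\":\"long\"},"
             + "\"location\":{\"type\":\"geo_point\"}"
             + "}}}";
    }

    // Source of one house document; Locale.ROOT keeps '.' as the decimal separator.
    static String document(long price, double lat, double lon) {
        return String.format(Locale.ROOT,
            "{\"price\":%d,\"location\":{\"lat\":%.6f,\"lon\":%.6f}}", price, lat, lon);
    }

    public static void main(String[] args) {
        System.out.println(mapping());
        System.out.println(document(275000, 52.0894, 5.1101));
    }
}
```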

Suppose a user is interested in houses within 5 km of Utrecht Centraal station, priced between 250000 and 300000. We could use the following filter.
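Elasticsearch evaluates the real filter itself. The plain-Java restatement below only spells out what the filter means: a great-circle (haversine) distance of at most 5 km from the station and a price in the requested range. Inclusive price bounds and the station coordinates (approximately 52.0894 N, 5.1101 E) are assumptions.

```java
// Illustrates the filter's semantics only; elasticsearch does the real matching.
class HouseFilter {
    static final double EARTH_RADIUS_KM = 6371.0; // mean Earth radius

    // Approximate coordinates of Utrecht Centraal station (assumption).
    static final double CENTER_LAT = 52.0894, CENTER_LON = 5.1101;

    // Great-circle distance in kilometres, using the haversine formula.
    static double distanceKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                 + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                 * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
    }

    // True if the house is within 5 km of the station and priced 250000..300000 (inclusive).
    static boolean matches(long price, double lat, double lon) {
        boolean inPriceRange = price >= 250000 && price <= 300000;
        boolean nearStation = distanceKm(CENTER_LAT, CENTER_LON, lat, lon) <= 5.0;
        return inPriceRange && nearStation;
    }

    public static void main(String[] args) {
        System.out.println(matches(275000, 52.0907, 5.1214)); // true: under 1 km away
        System.out.println(matches(275000, 52.3676, 4.9041)); // false: Amsterdam is too far
    }
}
```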

Using this filter, we can register the following query. The user id is stored together with the query, so that we know which user it belongs to.
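A minimal sketch of the document that could be registered, assuming the 1.0-era query DSL: a `filtered` query whose filter is a `bool` of a `geo_distance` and a `range` filter. The field names `location`, `price` and `userId` are assumptions, and the JSON is assembled by hand instead of with XContentBuilder.

```java
import java.util.Locale;

// Builds the body of a registered percolator query; field names are assumptions.
class PercolatorRegistration {
    // "query" holds the search; "userId" is an extra field stored next to it.
    // userId is assumed to contain no characters that need JSON escaping.
    static String body(String userId, double lat, double lon,
                       String distance, long minPrice, long maxPrice) {
        String filter = String.format(Locale.ROOT,
            "{\"bool\":{\"must\":["
          + "{\"geo_distance\":{\"distance\":\"%s\",\"location\":{\"lat\":%.4f,\"lon\":%.4f}}},"
          + "{\"range\":{\"price\":{\"gte\":%d,\"lte\":%d}}}"
          + "]}}", distance, lat, lon, minPrice, maxPrice);
        return "{\"query\":{\"filtered\":{\"query\":{\"match_all\":{}},\"filter\":"
             + filter + "}},\"userId\":\"" + userId + "\"}";
    }

    public static void main(String[] args) {
        System.out.println(body("user-1", 52.0894, 5.1101, "5km", 250000, 300000));
    }
}
```

Keeping `userId` outside the `query` object means it is stored with the registered query but plays no part in matching.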

When a new house is added, we can send a percolate request containing the new document. Note that the percolate request does not add the document to the index.
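For the 1.0 release, the percolate API is documented as taking the document under a `doc` key in the request body, sent to `/{index}/{type}/_percolate`. The development build used here may differ. The sketch below only assembles that path and body; the index name `houses` and type name `house` are hypothetical.

```java
// Assembles a percolate request; "houses" and "house" are hypothetical names.
class PercolateRequest {
    // e.g. /houses/house/_percolate
    static String path(String index, String type) {
        return "/" + index + "/" + type + "/_percolate";
    }

    // Wraps already-serialized document JSON; the document is not indexed.
    static String body(String documentJson) {
        return "{\"doc\":" + documentJson + "}";
    }

    public static void main(String[] args) {
        String doc = "{\"price\":275000,\"location\":{\"lat\":52.0907,\"lon\":5.1214}}";
        System.out.println(path("houses", "house"));
        System.out.println(body(doc));
    }
}
```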

If there is a match, we need to look up which user the matching query belongs to in order to send the notification.
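A match identifies a registered query, not a user. The sketch below resolves matched query ids to users. It uses a plain map, which stands in for fetching each registered query and reading its stored user id. The names `MatchNotifier` and `usersToNotify` are hypothetical, and each user is notified at most once.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper: maps matched query ids to the users to notify.
class MatchNotifier {
    static Set<String> usersToNotify(List<String> matchedQueryIds,
                                     Map<String, String> userIdByQueryId) {
        Set<String> users = new LinkedHashSet<>(); // de-duplicated, in match order
        for (String queryId : matchedQueryIds) {
            String userId = userIdByQueryId.get(queryId);
            if (userId != null) { // skip queries with no known owner
                users.add(userId);
            }
        }
        return users;
    }

    public static void main(String[] args) {
        Map<String, String> owners = Map.of("q1", "alice", "q2", "bob", "q3", "alice");
        System.out.println(usersToNotify(List.of("q1", "q2", "q3"), owners)); // prints [alice, bob]
    }
}
```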

You can find a working example on my GitHub account. To try it, you need to clone elasticsearch and build it locally, since the redesigned percolator will only be available from the 1.0 release.

2 comments:

Maarten Roosendaal said...

Hi,

Great example, but I'm still a bit confused about what you have to build.

I thought that if you register a query and a new document is indexed, ES itself would percolate it and send a message to the user (publish-subscribe).

What I read here is that you need to build a tool that continually sends percolate requests (maybe as part of the tool that indexes new documents).

What am I missing?

Thanks,
Maarten

Suhaas Valanjoo said...

Maarten, that has changed from Elasticsearch 1.0 onwards.

See Luca Cavanna's comment just below the question:

If you mean percolate while indexing, it's possible with 0.90 but removed with the new percolator in master (1.0). The reason for the removal is that it is the bigger piece that prevented the percolator from being distributed in 0.90, as you need both the queries and the documents in the same node in order for it to be performant. Makes sense?

Source - http://stackoverflow.com/questions/20960135/is-it-possible-to-get-back-a-response-from-the-percolator-when-inserting-a-docum