Writing Your Own Ingest Processor for Elasticsearch


With Elasticsearch 5.0, a new feature called the ingest node was introduced. To quote the official documentation:

You can use ingest node to pre-process documents before the actual indexing takes place. This pre-processing happens by an ingest node that intercepts bulk and index requests, applies the transformations, and then passes the documents back to the index or bulk APIs.

So, by defining a pipeline you can configure the way a document should be transformed before it is being indexed. There are a fair share of processors already shipped with Elasticsearch, but it is also very easy to roll your own.

This blog post will show how to write an ingest processor that extracts URLs from a field and stores them in an array. This array could be used to pre-fetch this data, spider it, or to simply display the URLs connected with a text field in your application.

Read more at DZone