Re: ElasticSearchSink - A couple of feature requests
Hi Edward,

Thank you for reaching out. Of the two features I requested for the
ElasticSearch sink in Flume, I have implemented the smaller one (
https://issues.apache.org/jira/browse/FLUME-2206), which lets users provide
TTL values with a unit specifier (day / hour / week etc.), as in the current
version of ElasticSearch. I have posted the patch for review here (
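
A minimal sketch of how a TTL value with a unit specifier could be turned
into milliseconds. This is not the FLUME-2206 patch: the class and method
names are hypothetical, the unit letters (ms, s, m, h, d, w) follow
ElasticSearch's time-unit convention, and treating a bare integer as days is
an assumption.

```java
import java.util.concurrent.TimeUnit;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Flume: parses TTL strings such as "5d" or "2w".
final class TtlSketch {
    // Digits followed by an optional unit suffix.
    private static final Pattern TTL = Pattern.compile("^(\\d+)(ms|s|m|h|d|w)?$");

    static long parseTtlMillis(String ttl) {
        Matcher m = TTL.matcher(ttl.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("Invalid TTL: " + ttl);
        }
        long value = Long.parseLong(m.group(1));
        // Assumption: a bare integer means days.
        String unit = (m.group(2) == null) ? "d" : m.group(2);
        switch (unit) {
            case "ms": return value;
            case "s":  return TimeUnit.SECONDS.toMillis(value);
            case "m":  return TimeUnit.MINUTES.toMillis(value);
            case "h":  return TimeUnit.HOURS.toMillis(value);
            case "d":  return TimeUnit.DAYS.toMillis(value);
            case "w":  return TimeUnit.DAYS.toMillis(7 * value);
            default:   throw new IllegalArgumentException("Unknown unit: " + unit);
        }
    }
}
```

With this sketch, TtlSketch.parseTtlMillis("1h") gives 3600000, and "5" and
"5d" give the same result.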

The second / bigger one will let users specify the ElasticSearch index
name with or without a rolling specifier. Currently the user provides an
indexName, say "flume", and ElasticSearchSink appends %daytimestamp, e.g.
2013-10-28, to create the index "flume-2013-10-28", and keeps creating a
new index every day. This works great when the user wants to roll indices
daily, but it prevents creating indices on a monthly basis, i.e.
"flume-2013-10" or "flume-2013-11", or on a yearly basis, and so on.
Essentially I was looking for HDFS filePrefix style ElasticSearch index
naming. I haven't yet started working on this patch. Please go ahead if
you want to work on this feature request. I have already created a JIRA
ticket (https://issues.apache.org/jira/browse/FLUME-2207) for this one.
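
The naming scheme described above could be sketched roughly as follows.
This only illustrates the idea and is not the FLUME-2207 implementation:
the class, the method and the suffixPattern parameter are hypothetical, and
using UTC is an assumption. Note that Java date patterns spell the year and
day as "yyyy" and "dd"; "YYYY" and "DD" mean week-based year and day of year
in java.time.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical helper, not part of Flume: builds an index name from a prefix
// and an optional date pattern.
final class IndexNameSketch {
    static String indexName(String prefix, String suffixPattern, Instant eventTime) {
        // No pattern configured: use the bare index name, with no rolling.
        if (suffixPattern == null || suffixPattern.isEmpty()) {
            return prefix;
        }
        // Assumption: timestamps are rendered in UTC.
        DateTimeFormatter formatter =
            DateTimeFormatter.ofPattern(suffixPattern).withZone(ZoneOffset.UTC);
        return prefix + "-" + formatter.format(eventTime);
    }
}
```

For an event at 2013-10-28T12:00:00Z and the prefix "flume", the pattern
"yyyy-MM-dd" gives "flume-2013-10-28", "yyyy-MM" gives "flume-2013-10", and
"yyyy" gives "flume-2013".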

- Dib
On Sun, Oct 27, 2013 at 8:03 AM, Edward Sargisson <[EMAIL PROTECTED]> wrote:

> Hi Dib,
> I seem to spend the most time maintaining the Elasticsearch Sink and,
> sadly, am *way* behind on email.
> If you raise Jira issues for your proposed changes and set up the Review
> board for them then either I or a colleague should be able to take a look.
> Normally, once we're happy, a committer will commit them to the repository.
> I will note that I'm on parental leave until Dec 9 and won't have a chance
> to have a look until then. However, when everything's ready drop me an
> email and I'll see if a colleague has time.
> Cheers,
> Edward
> On Friday, October 4, 2013 at 12:14 PM, Dibyajyoti Ghosh wrote:
> > Hi all,
> >
> > This is a repost from [EMAIL PROTECTED] (mailto:[EMAIL PROTECTED]).
> > I was not sure whether the Flume developers got the email, so pardon my
> > repost if it feels like I am spamming the mailing list.
> >
> > I have a couple of feature requests for ElasticSearchSink and didn't find
> > open JIRA tickets for these requirements.
> >
> > I have already modified ElasticSearchSink locally for the smaller of the
> > feature requests, and the larger one is in progress. I wanted to discuss
> > the features with you before creating the JIRA tickets, so here is a
> > brief summary of the improvements I have in mind.
> >
> >
> > DETAILS>>>
> >
> > Flume version:
> >
> > Flume 1.4.0-cdh4.4.0
> > Source code repository:
> > https://git-wip-us.apache.org/repos/asf/flume.git
> > Revision: 154d35659212f07edc896b414a43996fb8121773
> > Compiled by jenkins on Tue Sep  3 20:53:28 PDT 2013
> > From source with checksum f95b4a7f48080f876d6482bb88bcc342
> >
> >
> > And ElasticSearch v0.90.1.
> >
> > Improvement request #1 - HDFS file suffix style index suffix in
> > ElasticSearchSink:
> >
> > agent.sinks.myESsink.indexName = myIndex
> >
> > ElasticSearchSink uses the provided index name as the index prefix and
> > appends "YYYY-MM-DD" to generate the actual index in ES. While that is
> > convenient for my testing purposes, it doesn't allow creating indices
> > monthly / yearly or, more generally, based on some regex provided in the
> > flume config, similar to HDFS fileSuffix, e.g.
> >
> > agent.sinks.myESsink.indexSuffix = "YYYY" will create indices such as
> > myIndex-2013 / myIndex-2014 etc. When not provided, the sink will create
> > the index with just the index name, or can default back to 'YYYY-MM-DD'.
> >
> > Improvement request #2 - ElasticSearchSink ttl field modification to
> > mimic actual ES:
> >
> > agent.sinks.myESsink.ttl = <some integer value> (current specification)
> >
> > The second one is comparatively trivial but good to have. Current
> ElasticSearch TTL defaults to 5 days and works with integers only again