Flume >> mail # dev >> Re: ElasticSearchSink - A couple of feature requests


Re: ElasticSearchSink - A couple of feature requests
Hi Dib,
I seem to spend the most time maintaining the Elasticsearch Sink and,
sadly, am *way* behind on email.

If you raise Jira issues for your proposed changes and set up Review Board
requests for them, then either I or a colleague should be able to take a look.
Normally, once we're happy, a committer will commit them to the repository.

I will note that I'm on parental leave until Dec 9 and won't have a chance
to have a look until then. However, when everything's ready, drop me an
email and I'll see if a colleague has time.

Cheers,
Edward

"On Friday, October 4, 2013 at 12:14 PM, Dibyajyoti Ghosh wrote:

> Hi all,
>
> This is a repost from [EMAIL PROTECTED] (mailto:[EMAIL PROTECTED]). I was
> not sure whether the Flume developers got the email, so pardon the repost
> if it feels like I am spamming the mailing list.
>
> I have a couple of feature requests for ElasticSearchSink and didn't find
> open JIRA tickets for these requirements.
>
> I have already modified ElasticSearchSink locally for the smaller of the
> two feature requests, and the larger one is in progress. I wanted to
> discuss the features with you before creating the JIRA tickets, so here
> is a brief summary of the improvements I have in mind.
>
>
> DETAILS>>>
>
> Flume version:
>
> Flume 1.4.0-cdh4.4.0
> Source code repository: https://git-wip-us.apache.org/repos/asf/flume.git
> Revision: 154d35659212f07edc896b414a43996fb8121773
> Compiled by jenkins on Tue Sep  3 20:53:28 PDT 2013
> From source with checksum f95b4a7f48080f876d6482bb88bcc342
>
>
> And ElasticSearch v0.90.1.
>
> Improvement request #1 - HDFS file suffix style index suffix in
> ElasticSearchSink:
>
> agent.sinks.myESsink.indexName = myIndex
>
> ElasticSearchSink uses the provided index name as an index prefix and
> appends "YYYY-MM-DD" to generate the actual index in ES. While this is
> convenient for my testing purposes, it doesn't allow creating indices
> monthly / yearly or, more generally, based on some regex provided in the
> Flume config, similar to HDFS fileSuffix, e.g.
>
> agent.sinks.myESsink.indexSuffix = "YYYY" will create indices named
> myIndex-2013 / myIndex-2014 etc. When not provided, the sink will either
> create the index with just the index name or default back to 'YYYY-MM-DD'.
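
The suffix idea above could be sketched roughly as follows. This is only an illustration, not the ElasticSearchSink code: the class IndexNames, the method build, and the choice to fall back to a daily suffix when indexSuffix is missing are all assumptions. Note that java.time patterns are case-sensitive, so the "YYYY-MM-DD" notation in the email would have to be written as "yyyy-MM-dd" here ("YYYY" means week-based year and "DD" means day of year in Java).

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical helper, not part of Flume: builds "<indexName>-<formatted date>".
final class IndexNames {
    // Assumed default, corresponding to the daily suffix described in the email.
    static final String DEFAULT_SUFFIX = "yyyy-MM-dd";

    // indexSuffix is a java.time date pattern; null or empty falls back to the default.
    static String build(String indexName, String indexSuffix, Instant eventTime) {
        String pattern = (indexSuffix == null || indexSuffix.isEmpty())
                ? DEFAULT_SUFFIX : indexSuffix;
        // UTC is an assumption; a real sink might make the time zone configurable.
        DateTimeFormatter formatter =
                DateTimeFormatter.ofPattern(pattern).withZone(ZoneOffset.UTC);
        return indexName + "-" + formatter.format(eventTime);
    }

    public static void main(String[] args) {
        Instant t = Instant.parse("2013-10-04T12:14:00Z");
        System.out.println(build("myIndex", "yyyy", t));    // myIndex-2013
        System.out.println(build("myIndex", "yyyy-MM", t)); // myIndex-2013-10
        System.out.println(build("myIndex", null, t));      // myIndex-2013-10-04
    }
}
```

Falling back to the daily pattern keeps existing configurations working unchanged; the other option mentioned in the email (using just the index name when no suffix is given) would change behaviour for current users.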
>
> Improvement request #2 - ElasticSearchSink ttl field modification to
> mimic actual ES:
>
> agent.sinks.myESsink.ttl = <some integer value> (current specification)
>
> The second one is comparatively trivial but good to have. The current
> ElasticSearch TTL defaults to 5 days and works with integers only, which
> are treated as days.
>
> It would be good to have a qualifier like "d" / "s" / "m" / "w" / "h" to
> mimic the TTL configuration in the ElasticSearch mapping.
>
> agent.sinks.myESsink.ttl = "3w" / 3 (requested specification)
>
> For the ttl I have already made changes in my local flume git repo and am
> currently testing them. The change doesn't break the existing way of
> specifying the TTL field; it only extends it to allow "1d" / "2w" style
> TTL specifications.
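
A backward-compatible TTL parser along these lines might look like the sketch below. It is not the patch described in the email: the class TtlParser, the method toMillis, and the reading of "m" as minutes are assumptions made for illustration. A bare integer is still treated as days, as the email describes for the current behaviour.

```java
import java.util.Locale;
import java.util.concurrent.TimeUnit;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not the actual Flume code: parses "3w", "1d", or a bare "3".
final class TtlParser {
    // Digits followed by an optional unit: s, m (assumed minutes), h, d, or w.
    private static final Pattern TTL = Pattern.compile("^(\\d+)\\s*([smhdw])?$");

    // Returns the TTL in milliseconds; a bare integer is treated as days.
    static long toMillis(String ttl) {
        Matcher m = TTL.matcher(ttl.trim().toLowerCase(Locale.ROOT));
        if (!m.matches()) {
            throw new IllegalArgumentException("Unrecognized TTL: " + ttl);
        }
        long amount = Long.parseLong(m.group(1));
        String unit = (m.group(2) == null) ? "d" : m.group(2);
        switch (unit) {
            case "s": return TimeUnit.SECONDS.toMillis(amount);
            case "m": return TimeUnit.MINUTES.toMillis(amount);
            case "h": return TimeUnit.HOURS.toMillis(amount);
            case "w": return TimeUnit.DAYS.toMillis(7 * amount);
            default:  return TimeUnit.DAYS.toMillis(amount); // "d" or no unit
        }
    }

    public static void main(String[] args) {
        System.out.println(toMillis("3w")); // 1814400000
        System.out.println(toMillis("3"));  // 259200000
    }
}
```

Rejecting unknown input with an exception, rather than silently falling back to the default, makes a mistyped ttl value visible at configuration time.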
>
> <<<DETAILS
>
> Kindly suggest what I should do to get these changes incorporated into
> future release(s) of Flume.
>
> Best and thanks,
> - Dib
Thanks Hari.

I am creating JIRA tickets for the improvements.

Best,
- Dib"
Dibyajyoti Ghosh 2013-10-28, 17:32