Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Plain View
Flume, mail # dev - ElasticSearchSink: should we combined the proposed interfaces for event serialization and id assignment


+
Edward Sargisson 2013-04-18, 18:57
+
Israel Ekpo 2013-04-18, 20:29
+
Mike Percy 2013-04-18, 23:04
+
Edward Sargisson 2013-04-18, 23:18
+
Mike Percy 2013-04-19, 00:19
Copy link to this message
-
Re: ElasticSearchSink: should we combined the proposed interfaces for event serialization and id assignment
Edward Sargisson 2013-04-19, 16:48
Hi Mike,
Sure - then I think I'll combine the event serialization and id assignment
interfaces for the new interface.
I will probably also fix the elasticsearch optional dependency issue
(FLUME-1992) while I'm there.

Cheers,
Edward
On Thu, Apr 18, 2013 at 5:19 PM, Mike Percy <[EMAIL PROTECTED]> wrote:

> Thanks for the additional info, Edward. Let's avoid breaking API backwards
> compatibility. I've added a comment on the review board, let's see if we
> can just use instanceof to support an additional serializer API.
>
> Mike
>
>
>
> On Thu, Apr 18, 2013 at 4:18 PM, Edward Sargisson <[EMAIL PROTECTED]>wrote:
>
>> Hi Mike,
>> To get more detailed, the problem is that writing the event to
>> elasticsearch so that Kibana can read it requires that it be written to
>> the
>> correct index and have the correct @timestamp header. If the timestamps
>> used for both of those things is not the same then there is a risk that
>> Kibana won't be able to find the event. This is because Kibana searches
>> for
>> events by taking the time interval you specify in the interface,
>> determining which indexes include that time interval and then querying
>> those indexes by the @timestamp field.
>>
>> So, if the sink decides that it needs to provide the timestamp, and it
>> can't put the timestamp into the event itself then the
>> ElasticSearchEventSerializer implementation has no way to get that
>> timestamp. If we relied on the serializer doing System.currentTimeMillis()
>> itself then we run the risk of the event being written at midnight. It
>> might decide to put the event in yesterday's index with today's date.
>>
>> Could we keep the old interface and only use the new one if it's there?
>> The problem with that is that the sink may decide to provide a new
>> timestamp - but the interface implementation has no way of being told what
>> it is.
>>
>> I've just written the code to do it this way and will be adding it to the
>> ticket sometime soon.
>>
>> Cheers,
>> Edward
>>
>>
>> On Thu, Apr 18, 2013 at 4:04 PM, Mike Percy <[EMAIL PROTECTED]> wrote:
>>
>> > Edward,
>> > I added you to CC but I would also recommend subscribing to the dev list
>> > due to how the list headers are configured.
>> >
>> > This is a rough situation. I am loathe to break API compatibility but at
>> > the same time I don't know much about ES and feel I need to find some
>> time
>> > to invest in understanding the ES concepts and how Flume is interacting
>> > with them in this sink.
>> >
>> > Without thinking about it very hard yet, I'd ask if we can just add a
>> new
>> > interface that doesn't suck and maybe is more extensible without
>> breaking
>> > the old one.
>> >
>> > Also wondering if other devs have thoughts on this.
>> >
>> > Mike
>> >
>> >
>> > On Thu, Apr 18, 2013 at 11:57 AM, Edward Sargisson <[EMAIL PROTECTED]
>> >wrote:
>> >
>> >> Hi all,
>> >> (I read this list in digest mode; would you mind ccing me on any
>> replies?)
>> >>
>> >> I've got two patches progressing through Jira (FLUME-1782, FLUME-1972).
>> >> -1782 fixes a defect where the wrong timestamp field and elasticsearch
>> >> index name are used. -1972 adds an interface which users can implement
>> to
>> >> assign an id instead of letting elasticsearch randomly assign on.
>> >>
>> >> The question to discuss: should we(I) combine those interfaces and just
>> >> have a single interface.
>> >>
>> >> Mike Percy has kindly reviewed FLUME-1782 and the knock-on effect of
>> his
>> >> comments require that the ElasticSearchEventSerializer interface be
>> >> changed
>> >> - and thus this becomes a breaking change. I had been attempting to
>> avoid
>> >> that and this is why -1972 has a new interface.
>> >>
>> >> If we're going to break the interface then maybe we should go all the
>> way
>> >> and put the new id provider functionality on to it as well? We could
>> also
>> >> rename it to ElasticsearchEventSerializer (lower case s on search) to
>> >> match
>> >> the way the maintainer of elasticsearch spells it.