Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Plain View
Flume, mail # dev - ElasticSearchSink: should we combined the proposed interfaces for event serialization and id assignment


+
Edward Sargisson 2013-04-18, 18:57
+
Israel Ekpo 2013-04-18, 20:29
+
Mike Percy 2013-04-18, 23:04
+
Edward Sargisson 2013-04-18, 23:18
Copy link to this message
-
Re: ElasticSearchSink: should we combined the proposed interfaces for event serialization and id assignment
Mike Percy 2013-04-19, 00:19
Thanks for the additional info, Edward. Let's avoid breaking API backwards
compatibility. I've added a comment on the review board, let's see if we
can just use instanceof to support an additional serializer API.

Mike

On Thu, Apr 18, 2013 at 4:18 PM, Edward Sargisson <[EMAIL PROTECTED]> wrote:

> Hi Mike,
> To get more detailed, the problem is that writing the event to
> elasticsearch so that Kibana can read it requires that it be written to the
> correct index and have the correct @timestamp header. If the timestamps
> used for both of those things is not the same then there is a risk that
> Kibana won't be able to find the event. This is because Kibana searches for
> events by taking the time interval you specify in the interface,
> determining which indexes include that time interval and then querying
> those indexes by the @timestamp field.
>
> So, if the sink decides that it needs to provide the timestamp, and it
> can't put the timestamp into the event itself then the
> ElasticSearchEventSerializer implementation has no way to get that
> timestamp. If we relied on the serializer doing System.currentTimeMillis()
> itself then we run the risk of the event being written at midnight. It
> might decide to put the event in yesterday's index with today's date.
>
> Could we keep the old interface and only use the new one if it's there?
> The problem with that is that the sink may decide to provide a new
> timestamp - but the interface implementation has no way of being told what
> it is.
>
> I've just written the code to do it this way and will be adding it to the
> ticket sometime soon.
>
> Cheers,
> Edward
>
>
> On Thu, Apr 18, 2013 at 4:04 PM, Mike Percy <[EMAIL PROTECTED]> wrote:
>
> > Edward,
> > I added you to CC but I would also recommend subscribing to the dev list
> > due to how the list headers are configured.
> >
> > This is a rough situation. I am loathe to break API compatibility but at
> > the same time I don't know much about ES and feel I need to find some
> time
> > to invest in understanding the ES concepts and how Flume is interacting
> > with them in this sink.
> >
> > Without thinking about it very hard yet, I'd ask if we can just add a new
> > interface that doesn't suck and maybe is more extensible without breaking
> > the old one.
> >
> > Also wondering if other devs have thoughts on this.
> >
> > Mike
> >
> >
> > On Thu, Apr 18, 2013 at 11:57 AM, Edward Sargisson <[EMAIL PROTECTED]
> >wrote:
> >
> >> Hi all,
> >> (I read this list in digest mode; would you mind ccing me on any
> replies?)
> >>
> >> I've got two patches progressing through Jira (FLUME-1782, FLUME-1972).
> >> -1782 fixes a defect where the wrong timestamp field and elasticsearch
> >> index name are used. -1972 adds an interface which users can implement
> to
> >> assign an id instead of letting elasticsearch randomly assign on.
> >>
> >> The question to discuss: should we(I) combine those interfaces and just
> >> have a single interface.
> >>
> >> Mike Percy has kindly reviewed FLUME-1782 and the knock-on effect of his
> >> comments require that the ElasticSearchEventSerializer interface be
> >> changed
> >> - and thus this becomes a breaking change. I had been attempting to
> avoid
> >> that and this is why -1972 has a new interface.
> >>
> >> If we're going to break the interface then maybe we should go all the
> way
> >> and put the new id provider functionality on to it as well? We could
> also
> >> rename it to ElasticsearchEventSerializer (lower case s on search) to
> >> match
> >> the way the maintainer of elasticsearch spells it.
> >>
> >> The new interface would be:
> >> public interface ElasticSearchEventSerializer extends Configurable,
> >>     ConfigurableComponent {
> >>
> >>   static final Charset charset = Charset.defaultCharset();
> >>   XContentBuilder getContentBuilder(Event event, Long timestampOverride)
> >> throws IOException;
> >>
> >>   String getId(Event event)
> >> }
> >>
> >> The timestampOverride would only be non-null if there was no timestamp
+
Edward Sargisson 2013-04-19, 16:48