Flume >> mail # dev >> Transforming 1 event to n events


Re: Transforming 1 event to n events
Hi Jeremy,

On Mon, Aug 13, 2012 at 9:55 AM, Jeremy Custenborder <[EMAIL PROTECTED]> wrote:
>
>
> > I believe you are just
> > trying to work around a limitation of the exec source, since it appears
> > you're describing a serialization issue.
>
> > Alternatively, one could use an HBase serializer to generate multiple
> > increment / decrement operations, and just log the original line in HDFS
> > (or use an EventSerializer).
>
> This is what I'm working towards. I want a 1-for-1 entry in HDFS but
> increment counters in HBase.
>

The HBase serializer can generate multiple operations per Event, and the HDFS
serializer could generate whatever output Hive expects as well.
> Given this, I was just planning on emitting an event early in the pipeline
> with the body I was going to use in Hive, sending the same data to both
> HDFS and HBase, and then using a serializer on the HBase side to increment
> the counters. This would let me add data to HDFS in the format I plan to
> consume it in, without managing two serializers. My plan for the HBase
> serializer was literally to generate a key and an increment per record
> based on the input. So only a couple lines of code.
>

Yeah, if you are doing much parsing in your serializers it's going to be a
bit more complex.
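The "one event, several increments" idea above can be sketched in plain Java. In Flume 1.x this logic would live in an HBase serializer's increment-generation method; the sketch below deliberately avoids the Flume and HBase dependencies and models an increment as a simple record, and the `user|page|count` input format is a made-up example, not anything from this thread.

```java
import java.util.ArrayList;
import java.util.List;

// Self-contained sketch of turning one event body into n counter
// increments. In a real Flume HBase serializer this parsing would
// feed actual HBase Increment operations.
public class IncrementSketch {
    // One counter bump: row key, column, and delta.
    static final class CounterIncrement {
        final String rowKey;
        final String column;
        final long delta;
        CounterIncrement(String rowKey, String column, long delta) {
            this.rowKey = rowKey;
            this.column = column;
            this.delta = delta;
        }
    }

    // Parse one event body (hypothetical "user|page|count" format)
    // into several increments: one per dimension we want to count.
    static List<CounterIncrement> toIncrements(String body) {
        String[] parts = body.split("\\|");
        String user = parts[0];
        String page = parts[1];
        long count = Long.parseLong(parts[2]);
        List<CounterIncrement> out = new ArrayList<>();
        out.add(new CounterIncrement("totals", "hits", count));
        out.add(new CounterIncrement("user:" + user, "hits", count));
        out.add(new CounterIncrement("page:" + page, "hits", count));
        return out;
    }

    public static void main(String[] args) {
        for (CounterIncrement i : toIncrements("alice|/index|3")) {
            System.out.println(i.rowKey + " " + i.column + " +" + i.delta);
        }
    }
}
```

As Jeremy says, the serializer itself stays small; the only real work is the parse step, which is why heavier parsing makes the serializer approach less attractive.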

> > I pondered this a bit over the last day or so and I'm kind of lukewarm on
> > adding preconditions checks at this time. The reason I didn't do it
> > initially is that while I wanted a particular contract for that component,
> > in order to make Interceptors viable to maintain and understand with the
> > current design of the Flume core, I wasn't sure if it would be sufficient
> > for all future use cases. So if someone wants to do something that breaks
> > that contract, then they are "on their own", doing stuff that may break in
> > future implementations. If they're willing to accept that risk then they
> > have the freedom to maybe do something novel and awesome, which might
> > prompt us to add a different kind of extension mechanism in the future to
> > support whatever that use case is.
>
> I think there should be an approved method for this case. A different
> extension point that could perform processing like this would be helpful.
> When I looked at an Interceptor, I thought of using it as a replacement
> for a decorator in the old version of Flume. We have a lot of code that
> takes a log entry and replaces the body with a protocol buffer
> representation. I prefer to run this code on a tier upstream from the web
> server. Interceptors would work fine for the one-in, one-out case.
>

Have you considered using an Interceptor or a custom source to generate a
single event that has a series of timestamps within it? You could use
protobufs for serialization of that data structure.
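One way to picture Mike's suggestion is a single event whose body carries all the timings together. A real implementation might use protobuf for the body, as he suggests; to keep this sketch self-contained, a trivial `key=value;...` encoding stands in for it, and the field names (`recv`, `parse`, `respond`) are invented for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of a single event carrying multiple named timestamps.
// A simple delimited encoding stands in for protobuf here so the
// example has no external dependencies.
public class MultiTimestampEvent {
    // Pack several named timings into one event body.
    static String encode(Map<String, Long> timings) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Long> e : timings.entrySet()) {
            if (sb.length() > 0) sb.append(';');
            sb.append(e.getKey()).append('=').append(e.getValue());
        }
        return sb.toString();
    }

    // Recover the timings downstream, e.g. inside an HBase
    // serializer that emits one increment per timing.
    static Map<String, Long> decode(String body) {
        Map<String, Long> out = new LinkedHashMap<>();
        for (String pair : body.split(";")) {
            String[] kv = pair.split("=");
            out.put(kv[0], Long.parseLong(kv[1]));
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Long> t = new LinkedHashMap<>();
        t.put("recv", 1344873300L);
        t.put("parse", 1344873301L);
        t.put("respond", 1344873305L);
        String body = encode(t);
        System.out.println(body);
        System.out.println(decode(body).get("respond"));
    }
}
```

With this shape, a source or Interceptor builds the body once, and each sink-side serializer pulls out whichever timings it needs, so no event ever has to fan out into n events inside the channel.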

Since you have multiple timestamps / timings on the same log line, I wonder
whether it isn't really a single "event" with multiple facets, and whether
this is just a semantics question.

Regards,
Mike