

Re: Equivalent of Decorators in Flume NG
Harish,
It sounds like a deserialization problem in a custom Source. I would
recommend doing that deserialization in the Source.
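As a rough sketch of the splitting step such a custom Source could perform, the class below cuts the text of a top-level JSON array into the raw text of each object. The class and method names are hypothetical, and it assumes every array element is a JSON object; a real Source would more likely use a proper JSON parser, wrap each piece with `EventBuilder.withBody(...)`, and hand the batch to its `ChannelProcessor`.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: splits a JSON array of objects into one string per object. */
public class JsonArraySplitter {

    public static List<String> split(String json) {
        List<String> elements = new ArrayList<>();
        int depth = 0;            // current nesting depth of '{' ... '}'
        boolean inString = false; // true while inside a JSON string literal
        boolean escaped = false;  // true right after a backslash inside a string
        int start = -1;           // index where the current top-level object began

        for (int i = 0; i < json.length(); i++) {
            char c = json.charAt(i);
            if (inString) {
                // Braces inside string literals must not affect the depth count.
                if (escaped) {
                    escaped = false;
                } else if (c == '\\') {
                    escaped = true;
                } else if (c == '"') {
                    inString = false;
                }
                continue;
            }
            if (c == '"') {
                inString = true;
            } else if (c == '{') {
                if (depth == 0) {
                    start = i;
                }
                depth++;
            } else if (c == '}') {
                depth--;
                if (depth == 0) {
                    // A complete top-level object: this would become one event body.
                    elements.add(json.substring(start, i + 1));
                }
            }
        }
        return elements;
    }

    public static void main(String[] args) {
        String log = "[{\"id\":1,\"tags\":[\"a\",\"b\"]},{\"id\":2}]";
        for (String element : split(log)) {
            System.out.println(element);
        }
    }
}
```

Each returned string would be the body of one smaller event, so the one-in, one-out constraint on interceptors never comes into play.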

If you need to do inspection and tagging for routing purposes, that sounds
like a good fit for either an Interceptor and/or the multiplexing channel
selector.
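To illustrate the tag-then-route idea, here is a minimal sketch in plain Java with no Flume dependency. The header name `objectType`, the `"kind":"click"` check, and the channel names are all made up for illustration; `tag` stands in for what an Interceptor would do to an event's headers, and `route` only imitates the header-to-channel lookup a multiplexing selector performs from its configured mappings.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical illustration of header tagging followed by header-based routing. */
public class HeaderRouting {

    // Assumed header name; in Flume this would be whatever the selector is configured to read.
    static final String HEADER = "objectType";

    /** Tagging step: inspect the body and record a routing hint in the headers. */
    public static Map<String, String> tag(String body) {
        Map<String, String> headers = new HashMap<>();
        String type = body.contains("\"kind\":\"click\"") ? "click" : "other";
        headers.put(HEADER, type);
        return headers;
    }

    /** Routing step: look up the header value in a mapping, falling back to a default. */
    public static String route(Map<String, String> headers,
                               Map<String, String> mapping,
                               String defaultChannel) {
        String value = headers.get(HEADER);
        if (value == null) {
            return defaultChannel;
        }
        String channel = mapping.get(value);
        return channel != null ? channel : defaultChannel;
    }

    public static void main(String[] args) {
        Map<String, String> mapping = new HashMap<>();
        mapping.put("click", "clickChannel");

        Map<String, String> headers = tag("{\"kind\":\"click\",\"id\":7}");
        System.out.println(route(headers, mapping, "defaultChannel"));
    }
}
```

Each channel would then feed its own HDFS sink, which is how one tagged event ends up in a particular HDFS location.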

Does that sound like something that would work for your case?

Regards,
Mike

On Wed, Oct 3, 2012 at 12:53 PM, Harish Mandala <[EMAIL PROTECTED]> wrote:

> Hi Mike,
>
> Sure. Here's my use case:
>
> I receive over an HTTP port large log files containing an array of a
> certain object, serialized as JSON. I need to deserialize each log file
> into its constituent array objects. Each object may be routed to a
> different location in HDFS. Also, I need to place various parts of each of
> those objects in different locations in HDFS. The solution I thought of
> was to break each event (whose data would be a large JSON log file) into
> many smaller events (which would contain an object or object component),
> put certain headers on them, and route them to the right destination in
> HDFS using a channel selector.
>
> Thanks,
> Harish
>
> On Wed, Oct 3, 2012 at 2:10 PM, Mike Percy <[EMAIL PROTECTED]> wrote:
>
> > Hi Harish,
> > Why do you want to do that? Can you describe your use case?
> >
> > Regards,
> > Mike
> >
> > On Tue, Oct 2, 2012 at 1:28 PM, Harish Mandala <[EMAIL PROTECTED]> wrote:
> >
> > > Hello,
> > >
> > > Alright, so maybe interceptors were not exactly what I wanted.
> > >
> > > It seems the number of events going into an interceptor must equal the
> > > number coming out. However, what if I need to take out the data from a
> > > certain event, and create multiple events from subsets of the data, which
> > > would then be multiplexed using the selector to different locations?
> > > Would the job of splitting one event into many best be done in a Source
> > > or Sink?
> > >
> > > I was contemplating modifying the AvroSource or AvroSink for my purposes.
> > > However, it seems the sink also tallies output event counts and input
> > > event counts, and makes sure they're the same. That leaves me the option
> > > of writing a custom source based off the AvroSource. Is my thinking
> > > correct?
> > >
> > > Thanks,
> > > Harish
> > >
> > > On Mon, Oct 1, 2012 at 6:45 PM, Harish Mandala <[EMAIL PROTECTED]> wrote:
> > >
> > > > Hi Percy,
> > > >
> > > > Thanks! Interceptors seem good enough.
> > > >
> > > > Regards,
> > > > Harish
> > > >
> > > >
> > > > On Mon, Oct 1, 2012 at 6:32 PM, Mike Percy <[EMAIL PROTECTED]> wrote:
> > > >
> > > >> Hi Harish,
> > > >> At this time Flume NG doesn't support unbatching or sink-side plugins.
> > > >> Interceptors provide source-side tagging, filtering, and transformation
> > > >> capability, however.
> > > >>
> > > >> Regards,
> > > >> Mike
> > > >>
> > > >>
> > > >> On Mon, Oct 1, 2012 at 3:23 PM, Harish Mandala <[EMAIL PROTECTED]> wrote:
> > > >>
> > > >> > Hello,
> > > >> >
> > > >> > Am I right in thinking Flume NG no longer has the concept of Sink
> > > >> > Decorators? I wanted to do some custom deserialization on incoming
> > > >> > event data, and split one event into several (de-batching and
> > > >> > re-routing). What's the best way to implement this in Flume NG?
> > > >> >
> > > >> > Thanks,
> > > >> > Harish
> > > >> >
> > > >>
> > > >
> > > >
> > >
> >
>