Flume >> mail # dev >> Equivalent of Decorators in Flume NG

Re: Equivalent of Decorators in Flume NG
Harish,
It sounds like a deserialization problem in a custom Source. I would
recommend doing that deserialization in the Source.
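
As a rough sketch of that splitting step, assuming the payload is a well-formed JSON array of objects: the class below is a hypothetical helper, not part of Flume, and `SimpleEvent` is only a stand-in for Flume's `Event`. In a real Source, the events would presumably be built with `EventBuilder.withBody(body, headers)` and handed to the channel processor as a batch; that wiring is left out here so the example runs with the JDK alone.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical helper (not a Flume class): one JSON array in, one event per object out. */
final class JsonArraySplitter {

    /** Stand-in for org.apache.flume.Event: a header map plus a byte[] body. */
    static final class SimpleEvent {
        final Map<String, String> headers;
        final byte[] body;

        SimpleEvent(Map<String, String> headers, byte[] body) {
            this.headers = headers;
            this.body = body;
        }
    }

    /**
     * Returns the raw text of each top-level {...} object in a JSON array.
     * Assumes well-formed input; non-object array elements are skipped.
     */
    static List<String> splitTopLevelObjects(String json) {
        List<String> objects = new ArrayList<>();
        int depth = 0;            // current nesting depth of { }
        int start = -1;           // index where the current top-level object began
        boolean inString = false; // braces inside string literals must be ignored
        boolean escaped = false;  // previous char was a backslash inside a string
        for (int i = 0; i < json.length(); i++) {
            char c = json.charAt(i);
            if (inString) {
                if (escaped) {
                    escaped = false;
                } else if (c == '\\') {
                    escaped = true;
                } else if (c == '"') {
                    inString = false;
                }
                continue;
            }
            if (c == '"') {
                inString = true;
            } else if (c == '{') {
                if (depth == 0) {
                    start = i;
                }
                depth++;
            } else if (c == '}') {
                depth--;
                if (depth == 0 && start >= 0) {
                    objects.add(json.substring(start, i + 1));
                    start = -1;
                }
            }
        }
        return objects;
    }

    /** Wraps each object in an event and tags it with a routing header chosen by the caller. */
    static List<SimpleEvent> toEvents(String json, String headerName,
                                      Function<String, String> routeOf) {
        List<SimpleEvent> events = new ArrayList<>();
        for (String object : splitTopLevelObjects(json)) {
            Map<String, String> headers = new HashMap<>();
            headers.put(headerName, routeOf.apply(object));
            events.add(new SimpleEvent(headers, object.getBytes(StandardCharsets.UTF_8)));
        }
        return events;
    }

    public static void main(String[] args) {
        String payload = "[{\"kind\":\"click\",\"tags\":[\"a}\"]},{\"kind\":\"view\",\"n\":{\"x\":1}}]";
        // The routing rule here is a placeholder; a real Source would inspect parsed fields.
        List<SimpleEvent> events = toEvents(payload, "kind",
                o -> o.contains("\"kind\":\"click\"") ? "click" : "other");
        for (SimpleEvent e : events) {
            System.out.println(e.headers + " " + new String(e.body, StandardCharsets.UTF_8));
        }
    }
}
```

The scanner only tracks brace depth and string literals, which keeps the sketch dependency-free; a production Source would more likely use a streaming JSON parser.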

If you need to do inspection and tagging for routing purposes, that sounds
like a good fit for an Interceptor, the multiplexing channel selector, or
both.
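
A selector setup along these lines might look roughly like the fragment below. The agent, source, and channel names and the `objectType` header are all hypothetical; the property keys are the ones the Flume User Guide lists for the multiplexing channel selector.

```properties
# Hypothetical names: agent1, src1, the three channels, and the objectType header.
agent1.sources.src1.channels = chUsers chOrders chDefault
agent1.sources.src1.selector.type = multiplexing
agent1.sources.src1.selector.header = objectType
# Events whose objectType header is "user" or "order" go to the matching channel.
agent1.sources.src1.selector.mapping.user = chUsers
agent1.sources.src1.selector.mapping.order = chOrders
# Anything else falls through to the default channel.
agent1.sources.src1.selector.default = chDefault
```

Each channel would then feed its own HDFS sink with a distinct `hdfs.path`.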

Does that sound like something that would work for your case?

Regards,
Mike

On Wed, Oct 3, 2012 at 12:53 PM, Harish Mandala <[EMAIL PROTECTED]> wrote:

> Hi Mike,
>
> Sure. Here's my use case:
>
> I receive over an HTTP port large log files containing an array of a
> certain object, serialized as JSON. I need to deserialize each log file
> into its constituent array objects. Each object may be routed to a
> different location in HDFS. Also, I need to place various parts of each of
> those objects in different locations in HDFS. The solution I thought of
> was to break each event (whose data would be a large JSON log file) into
> many smaller events (which would contain an object or object component),
> put certain headers on them, and route them to the right destination in
> HDFS using a channel selector.
>
> Thanks,
> Harish
>
> On Wed, Oct 3, 2012 at 2:10 PM, Mike Percy <[EMAIL PROTECTED]> wrote:
>
> > Hi Harish,
> > Why do you want to do that? Can you describe your use case?
> >
> > Regards,
> > Mike
> >
> > On Tue, Oct 2, 2012 at 1:28 PM, Harish Mandala <[EMAIL PROTECTED]> wrote:
> >
> > > Hello,
> > >
> > > Alright, so maybe interceptors were not exactly what I wanted.
> > >
> > > It seems the number of events going into an interceptor must equal the
> > > number coming out. However, what if I need to take the data out of a
> > > certain event and create multiple events from subsets of that data,
> > > which would then be multiplexed to different locations using the
> > > selector? Would the job of splitting one event into many best be done
> > > in a Source or a Sink?
> > >
> > > I was contemplating modifying the AvroSource or AvroSink for my purposes.
> > > However, it seems the sink also tallies output event counts and input
> > > event counts, and makes sure they're the same. That leaves me the option
> > > of writing a custom source based off the AvroSource. Is my thinking
> > > correct?
> > >
> > > Thanks,
> > > Harish
> > >
> > > On Mon, Oct 1, 2012 at 6:45 PM, Harish Mandala <[EMAIL PROTECTED]> wrote:
> > >
> > > > Hi Percy,
> > > >
> > > > Thanks! Interceptors seem good enough.
> > > >
> > > > Regards,
> > > > Harish
> > > >
> > > >
> > > > On Mon, Oct 1, 2012 at 6:32 PM, Mike Percy <[EMAIL PROTECTED]> wrote:
> > > >
> > > >> Hi Harish,
> > > >> At this time Flume NG doesn't support unbatching or sink-side plugins.
> > > >> Interceptors provide source-side tagging, filtering, and transformation
> > > >> capability, however.
> > > >>
> > > >> Regards,
> > > >> Mike
> > > >>
> > > >>
> > > >> On Mon, Oct 1, 2012 at 3:23 PM, Harish Mandala <[EMAIL PROTECTED]> wrote:
> > > >>
> > > >> > Hello,
> > > >> >
> > > >> > Am I right in thinking Flume NG no longer has the concept of Sink
> > > >> > Decorators? I wanted to do some custom deserialization on incoming
> > > >> > event data, and split one event into several (de-batching and
> > > >> > re-routing). What's the best way to implement this in Flume NG?
> > > >> >
> > > >> > Thanks,
> > > >> > Harish