Flume >> mail # dev >> Transforming 1 event to n events

Re: Transforming 1 event to n events
To clarify: I mean I think it's within the scope of the design
intentions. I agree that it is currently disallowed (at least in

On Fri, Aug 10, 2012 at 5:14 PM, Patrick Wendell <[EMAIL PROTECTED]> wrote:
> Hey Jeremy,
> That comment has been in the code now for some time, but I don't think
> it is actually enforced anywhere programmatically. I think the idea was
> just that if you are writing something which is capable of generating
> new event data it should be in a source - though I'm also curious to
> hear why this was put in there.
> IMHO, doing some type of event splitting seems within the scope of how
> interceptors are used.
> - Patrick
> On Fri, Aug 10, 2012 at 11:07 AM, Jeremy Custenborder
> <[EMAIL PROTECTED]> wrote:
>> Hello All,
>> I'm wondering if you could provide some guidance for me. One of the
>> inputs I'm working with batches several entries to a single event.
>> This is a lot simpler than my data but it provides an easy example.
>> For example:
>> timestamp - 5,4,3,2,1
>> timestamp - 9,7,5,5,6
>> If I tail the file this results in 2 events being generated. This
>> example has the data for 10 events.
>> Here is, at a high level, what I want to accomplish:
>> (web server - agent 1)
>>   exec source tail -f /<some file path>
>>   collector-client to (agent 2)
>> (collector - agent 2)
>>   collector-server
>>   Custom Interceptor (input 1 event, output n events)
>>   Multiplex to
>>     hdfs
>>     hbase
>> An interceptor looked like the most logical spot for me to add this.
>> Is there a better place to add this functionality? Has anyone run into
>> a similar case?
>> Looking at the docs for Interceptor.intercept(List<Event> events), it
>> says "Output list of events. The size of output list MUST NOT BE
>> GREATER than the size of the input list (i.e. transformation and
>> removal ONLY)." which tells me not to emit more events than given.
>> intercept(Event event) only returns a single event so I can't use it
>> there either. Why is there a requirement to only return 1 for 1?
>> For now I'm implementing a custom source that will handle generating
>> multiple events from the events coming in on the web server. My
>> preference was to do this transformation on the collector agent before
>> I hand off to hdfs and hbase. I know another alternative would be to
>> implement custom RPC but I would prefer not to do that. I would prefer
>> to rely on what is currently available.
>> Thanks!
>> j
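
For reference, the 1-to-n split discussed in this thread could look roughly like the sketch below. It is only an illustration of the splitting step: it works on plain strings rather than Flume Event objects, the class and method names (EventSplitter, split) are hypothetical, and the " - " separator and comma-delimited values are assumed from the two sample lines in the original message. It does not settle whether the logic belongs in an interceptor or a source.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Flume: splits one batched line such as
// "timestamp - 5,4,3,2,1" into one line per value.
final class EventSplitter {
    private EventSplitter() {}

    static List<String> split(String body) {
        List<String> out = new ArrayList<>();
        // Assumed format: "<timestamp> - <comma-separated values>".
        int sep = body.indexOf(" - ");
        if (sep < 0) {
            // Not a batched line; pass it through unchanged.
            out.add(body);
            return out;
        }
        String timestamp = body.substring(0, sep);
        for (String value : body.substring(sep + 3).split(",")) {
            String v = value.trim();
            if (!v.isEmpty()) {
                // Each value becomes its own entry carrying the shared timestamp.
                out.add(timestamp + " - " + v);
            }
        }
        return out;
    }
}
```

In a custom source, each returned string would become the body of a separate event; with the two sample lines from the message, that yields the 10 events mentioned there.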