Re: Transforming 1 event to n events
Hey Jeremy,

That comment has been in the code now for some time, but I don't think
it is actually enforced anywhere programmatically. I think the idea was
just that if you are writing something which is capable of generating
new event data it should be in a source - though I'm also curious to
hear why this was put in there.

IMHO, doing some type of event splitting seems within the scope of how
interceptors are used.

- Patrick

On Fri, Aug 10, 2012 at 11:07 AM, Jeremy Custenborder wrote:
> Hello All,
> I'm wondering if you could provide some guidance for me. One of the
> inputs I'm working with batches several entries to a single event.
> This is a lot simpler than my data but it provides an easy example.
> For example:
> timestamp - 5,4,3,2,1
> timestamp - 9,7,5,5,6
> If I tail the file this results in 2 events being generated. This
> example has the data for 10 events.
> Here is, at a high level, what I want to accomplish.
> (web server - agent 1)
>   exec source tail -f /<some file path>
>   collector-client to (agent 2)
> (collector - agent 2)
>   collector-server
>   Custom Interceptor (input 1 event, output n events)
>   Multiplex to
>     hdfs
>     hbase
> An interceptor looked like the most logical spot for me to add this.
> Is there a better place to add this functionality? Has anyone run into
> a similar case?
> Looking at the docs for Interceptor.intercept(List<Event> events), it
> says "Output list of events. The size of output list MUST NOT BE
> GREATER than the size of the input list (i.e. transformation and
> removal ONLY)." which tells me not to emit more events than given.
> intercept(Event event) only returns a single event so I can't use it
> there either. Why is there a requirement to only return 1 for 1?
> For now I'm implementing a custom source that will handle generating
> multiple events from the events coming in on the web server. My
> preference was to do this transformation on the collector agent before
> I hand off to hdfs and hbase. I know another alternative would be to
> implement custom RPC but I would prefer not to do that. I would prefer
> to rely on what is currently available.
> Thanks!
> j
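
For reference, the 1-to-n split described in the quoted message can be sketched in plain Java, independent of where it ends up running (interceptor or custom source). This is only an illustration under assumptions: the class and method names are hypothetical and not part of Flume, and the record layout "<timestamp> - v1,v2,...,vn" is taken from the simplified example above, not from the real data.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Flume: splits one batched record into n records.
final class BatchedLineSplitter {

    // Assumed record layout, taken from the example: "<timestamp> - v1,v2,...,vn".
    private static final String SEPARATOR = " - ";

    // Turns "timestamp - 5,4,3" into ["timestamp - 5", "timestamp - 4", "timestamp - 3"].
    static List<String> split(String line) {
        List<String> out = new ArrayList<>();
        int idx = line.indexOf(SEPARATOR);
        if (idx < 0) {
            // Not in the assumed layout: pass the record through unchanged.
            out.add(line);
            return out;
        }
        String timestamp = line.substring(0, idx);
        String values = line.substring(idx + SEPARATOR.length());
        for (String value : values.split(",")) {
            String trimmed = value.trim();
            if (!trimmed.isEmpty()) {
                // One output record per entry, each carrying the shared timestamp.
                out.add(timestamp + SEPARATOR + trimmed);
            }
        }
        return out;
    }

    // Batch variant: the output list may be longer than the input list.
    static List<String> splitAll(List<String> lines) {
        List<String> out = new ArrayList<>();
        for (String line : lines) {
            out.addAll(split(line));
        }
        return out;
    }
}
```

Applied to the two sample lines, splitAll would yield ten records. Inside an interceptor, the same loop would presumably sit in intercept(List<Event> events) and build one event per entry, which is exactly the case the quoted Javadoc comment warns against, so whether that is safe depends on the restriction really being unenforced.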