Flume >> mail # dev >> Transforming 1 event to n events


Re: Transforming 1 event to n events
Hey Jeremy,

That comment has been in the code for some time now, but I don't think
it is actually enforced anywhere programmatically. I think the idea was
just that anything capable of generating new event data should be a
source, though I'm also curious to hear why this was put in there.

IMHO, doing some type of event splitting seems within the scope of how
interceptors are used.

- Patrick

On Fri, Aug 10, 2012 at 11:07 AM, Jeremy Custenborder
<[EMAIL PROTECTED]> wrote:
> Hello All,
>
> I'm wondering if you could provide some guidance for me. One of the
> inputs I'm working with batches several entries to a single event.
> This is a lot simpler than my data but it provides an easy example.
> For example:
>
> timestamp - 5,4,3,2,1
> timestamp - 9,7,5,5,6
>
> If I tail the file this results in 2 events being generated. This
> example has the data for 10 events.
>
> Here is, at a high level, what I want to accomplish:
> (web server - agent 1)
> exec source tail -f /<some file path>
> collector-client to (agent 2)
>
> (collector - agent 2)
> collector-server
> Custom Interceptor (input 1 event, output n events)
> Multiplex to
> hdfs
> hbase
>
> An interceptor looked like the most logical spot for me to add this.
> Is there a better place to add this functionality? Has anyone run into
> a similar case?
>
> Looking at the docs for Interceptor.intercept(List<Event> events), it
> says "Output list of events. The size of output list MUST NOT BE
> GREATER than the size of the input list (i.e. transformation and
> removal ONLY)." which tells me not to emit more events than given.
> intercept(Event event) only returns a single event so I can't use it
> there either. Why is there a requirement to only return 1 for 1?
>
> For now I'm implementing a custom source that will handle generating
> multiple events from the events coming in on the web server. My
> preference was to do this transformation on the collector agent before
> I hand off to hdfs and hbase. I know another alternative would be to
> implement custom RPC but I would prefer not to do that. I would prefer
> to rely on what is currently available.
>
> Thanks!
> j
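
For reference, here is a minimal sketch of the 1-to-n split discussed in this thread, using the "timestamp - 5,4,3,2,1" example from Jeremy's message. It is plain Java with no Flume dependency: the class name EventSplitter, the method split, and the " - " separator handling are assumptions made for illustration, not part of the Flume API. The same logic could sit inside a custom source or, given that the size limit does not appear to be enforced, inside an interceptor's list-based intercept method.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not a Flume class: splits one batched line
// into one string per value, each carrying the original timestamp.
class EventSplitter {

    // Assumed input format, taken from the example in the thread:
    //   "<timestamp> - <v1>,<v2>,...,<vn>"
    static List<String> split(String line) {
        List<String> out = new ArrayList<>();
        int sep = line.indexOf(" - ");
        if (sep < 0) {
            // No separator found: pass the line through unchanged.
            out.add(line);
            return out;
        }
        String timestamp = line.substring(0, sep);
        String[] values = line.substring(sep + 3).split(",");
        for (String value : values) {
            // One output entry per batched value.
            out.add(timestamp + " - " + value.trim());
        }
        return out;
    }

    public static void main(String[] args) {
        for (String event : split("timestamp - 5,4,3,2,1")) {
            System.out.println(event);
        }
    }
}
```

In a real agent, each returned string would become the body of a new event, with the headers copied from the incoming event.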