NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
Flume >> mail # dev >> Transforming 1 event to n events


Jeremy Custenborder 2012-08-10, 18:07
Patrick Wendell 2012-08-11, 00:14
Re: Transforming 1 event to n events
To clarify: I mean I think it's within the scope of the design
intentions. I agree that it is currently disallowed (at least in the
documentation).

On Fri, Aug 10, 2012 at 5:14 PM, Patrick Wendell <[EMAIL PROTECTED]> wrote:
> Hey Jeremy,
>
> That comment has been in the code now for some time, but I don't think
> it is actually enforced anywhere programmatically. I think the idea was
> just that if you are writing something which is capable of generating
> new event data it should be in a source - though I'm also curious to
> hear why this was put in there.
>
> IMHO, doing some type of event splitting seems within the scope of how
> interceptors are used.
>
> - Patrick
>
> On Fri, Aug 10, 2012 at 11:07 AM, Jeremy Custenborder
> <[EMAIL PROTECTED]> wrote:
>> Hello All,
>>
>> I'm wondering if you could provide some guidance for me. One of the
>> inputs I'm working with batches several entries into a single event.
>> The example below is much simpler than my real data, but it
>> illustrates the problem:
>>
>> timestamp - 5,4,3,2,1
>> timestamp - 9,7,5,5,6
>>
>> If I tail the file, this results in 2 events being generated, even
>> though the example holds the data for 10 events.
>>
>> Here is, at a high level, what I want to accomplish:
>> (web server - agent 1)
>> exec source tail -f /<some file path>
>> collector-client to (agent 2)
>>
>> (collector - agent 2)
>> collector-server
>> Custom Interceptor (input 1 event, output n events)
>> Multiplex to
>> hdfs
>> hbase
>>
>> An interceptor looked like the most logical spot for me to add this.
>> Is there a better place to add this functionality? Has anyone run into
>> a similar case?
>>
>> Looking at the docs for Interceptor.intercept(List<Event> events), it
>> says "Output list of events. The size of output list MUST NOT BE
>> GREATER than the size of the input list (i.e. transformation and
>> removal ONLY)." which tells me not to emit more events than given.
>> intercept(Event event) only returns a single event, so I can't use it
>> there either. Why is there a requirement to only return 1 for 1?
>>
>> For now I'm implementing a custom source that will handle generating
>> multiple events from the events coming in on the web server. My
>> preference was to do this transformation on the collector agent before
>> I hand off to HDFS and HBase. Another alternative would be to
>> implement a custom RPC, but I would rather rely on what is currently
>> available.
>>
>> Thanks!
>> j
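To make the 1-to-n question concrete, here is a minimal Java sketch of the splitting step Jeremy describes, using his toy lines `timestamp - 5,4,3,2,1` and `timestamp - 9,7,5,5,6`. It is an illustration under stated assumptions, not Flume code: `SimpleEvent` is a hypothetical stand-in for `org.apache.flume.Event` (a header map plus a byte body) so the example runs without Flume on the classpath, and the `"<prefix> - v1,v2,..."` body format is assumed from the example. The batch method has the same shape as `Interceptor.intercept(List<Event>)` but returns a longer list than it receives, which is exactly what the quoted javadoc says MUST NOT happen.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Sketch of a 1-to-n split for batched lines (hypothetical, not a Flume API). */
class EventSplitter {

    /** Hypothetical minimal event standing in for org.apache.flume.Event. */
    public static final class SimpleEvent {
        public final Map<String, String> headers;
        public final byte[] body;

        public SimpleEvent(Map<String, String> headers, byte[] body) {
            this.headers = headers;
            this.body = body;
        }

        public String bodyAsString() {
            return new String(body, StandardCharsets.UTF_8);
        }
    }

    /** Splits one batched event into one event per comma-separated value. */
    public static List<SimpleEvent> split(SimpleEvent batched) {
        String line = batched.bodyAsString();
        // Assumed format: "<prefix> - v1,v2,...,vn"; the prefix is repeated on each output.
        int sep = line.indexOf(" - ");
        List<SimpleEvent> out = new ArrayList<>();
        if (sep < 0) {
            // Unexpected format: pass the event through unchanged.
            out.add(batched);
            return out;
        }
        String prefix = line.substring(0, sep);
        for (String value : line.substring(sep + 3).split(",")) {
            String v = value.trim();
            if (v.isEmpty()) {
                continue; // skip empty entries such as a trailing comma
            }
            // Each output event gets its own copy of the original headers.
            Map<String, String> headers = new HashMap<>(batched.headers);
            String body = prefix + " - " + v;
            out.add(new SimpleEvent(headers, body.getBytes(StandardCharsets.UTF_8)));
        }
        return out;
    }

    /** Batch form: the output may be LONGER than the input (violates the documented contract). */
    public static List<SimpleEvent> intercept(List<SimpleEvent> events) {
        List<SimpleEvent> out = new ArrayList<>();
        for (SimpleEvent e : events) {
            out.addAll(split(e));
        }
        return out;
    }

    public static void main(String[] args) {
        List<SimpleEvent> in = new ArrayList<>();
        in.add(new SimpleEvent(new HashMap<>(), "timestamp - 5,4,3,2,1".getBytes(StandardCharsets.UTF_8)));
        in.add(new SimpleEvent(new HashMap<>(), "timestamp - 9,7,5,5,6".getBytes(StandardCharsets.UTF_8)));
        List<SimpleEvent> out = intercept(in);
        System.out.println(in.size() + " in, " + out.size() + " out");
    }
}
```

Running `main` prints `2 in, 10 out` for the two sample lines. Keeping the parsing in a standalone `split` helper means the same logic could, in principle, be called from a custom source (the workaround Jeremy mentions) or from an interceptor if the 1-to-1 restriction were relaxed.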
Mike Percy 2012-08-11, 01:51
Patrick Wendell 2012-08-11, 05:22
Mike Percy 2012-08-12, 22:58
Jeremy Custenborder 2012-08-13, 16:55
Mike Percy 2012-08-13, 18:55
Jeremy Custenborder 2012-08-13, 22:34
Mike Percy 2012-08-14, 01:59
Jeremy Custenborder 2012-08-14, 20:51
Mike Percy 2012-08-15, 18:14