

Re: "single source - multi channel" scenario and applying interceptor while writing to only one channel and not on others...possible approaches
Heh, I forgot to link the multiplexing channel selector documentation.

Here it is: <http://flume.apache.org/FlumeUserGuide.html#multiplexing-channel-selector>

- Connor
On Mon, Apr 22, 2013 at 11:52 PM, Connor Woodson <[EMAIL PROTECTED]> wrote:

> Some more thoughts on this:
>
> The way Interceptors currently work is that they are applied to an event
> as it is received by the source. There are good uses for this: for
> instance, it lets you configure a single Timestamp interceptor that stamps
> every event the source receives, so even if you have multiple
> sinks/channels responding to an event, you only need that one interceptor.
> Interceptors in this sense serve to add data to event headers, and as such
> it makes sense to have them applied only once by the source instead of
> letting the channels change header data.
>
> If you wish to use an interceptor in the above way, to modify header data,
> and still want that interceptor to apply for a single channel, then if you
> don't mind could you elaborate on what you are trying to do? I haven't been
> able to come up with a situation like that. The solution here would be to
> do as Jeff suggested and use a serializer; if you want more in-depth
> instructions on how to build it, please ask; I have a set of directions
> lying around somewhere that I'll find for you.
>
>
> However, given the way Interceptors work, I have myself faced a situation
> where I would like an interceptor to apply to one channel only. This use
> case arises when I want to use an Interceptor to filter events: I want to
> send an event to some subset of channels based on the contents of its
> data. Here is how you can do this in the current setup (where Interceptors
> are applied at the source instead of per-channel):
>
> Using the Multiplexing Channel Selector you are able to choose which
> channels an event is written to based off of the value of a specified
> header (documentation in that link). There are some more features to the
> selector that aren't documented, called Optional Channels or something, but
> I don't know very much about them - just figured I would point out that
> they exist; digging through the source should provide some more insight.
>
> So here is how you want to set your system up. Create an Interceptor that
> will define a certain header value based off of the event's contents. For
> instance, if you want all events containing exactly 1 character to be sent
> to a channel, you could create an Interceptor that counts the characters in
> the event. Then that Interceptor will set a certain header value to
> "SINGLE" if there is just one character, or "MULTIPLE" if there are more.
>
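The classification step described above can be sketched in plain Java, so it runs without Flume on the classpath. The class name `LengthTagger`, the method `tag`, and the header key `header` are hypothetical names chosen for illustration. In an actual interceptor this logic would presumably live in `intercept(Event)` of a class implementing `org.apache.flume.interceptor.Interceptor`, reading `event.getBody()` and writing to `event.getHeaders()`.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: mirrors what a custom interceptor would do with an
// event's body (byte[]) and its header map. Not part of Flume itself.
class LengthTagger {
    // Assumed header key; it must match selector.header in the agent config.
    static final String HEADER_KEY = "header";

    // Sets HEADER_KEY to "SINGLE" for a one-character body and to "MULTIPLE"
    // for a longer one. An empty body is left untagged, so the selector's
    // default mapping would apply to it.
    static Map<String, String> tag(byte[] body, Map<String, String> headers) {
        String text = new String(body, StandardCharsets.UTF_8);
        int chars = text.codePointCount(0, text.length());
        if (chars == 1) {
            headers.put(HEADER_KEY, "SINGLE");
        } else if (chars > 1) {
            headers.put(HEADER_KEY, "MULTIPLE");
        }
        return headers;
    }

    public static void main(String[] args) {
        System.out.println(tag("a".getBytes(StandardCharsets.UTF_8), new HashMap<>()));
        System.out.println(tag("abc".getBytes(StandardCharsets.UTF_8), new HashMap<>()));
    }
}
```

Decoding the body as UTF-8 is an assumption; if events carry another encoding, the character count would need to follow it.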
> Then you can create your channel selector like this (modified from the
> documentation example):
>
> a1.sources = r1
> a1.channels = all_events single_events multiple_events
> a1.sources.r1.interceptors = your_interceptor
> a1.sources.r1.interceptors.your_interceptor.header = header
> a1.sources.r1.selector.type = multiplexing
> a1.sources.r1.selector.header = header
> a1.sources.r1.selector.mapping.SINGLE = all_events single_events
> a1.sources.r1.selector.mapping.MULTIPLE = all_events multiple_events
> a1.sources.r1.selector.default = all_events
>
>
> The result is that you now have a way to filter which channels a given
> event is sent to. Note that a channel can appear in more than one mapping;
> for instance, all_events will get all events. So the trick is just to
> define the right interceptor, which is much simpler to code than a
> serializer (and a serializer is itself fairly easy).
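To make the routing concrete, the selector configuration above can be modelled as a map lookup with a fallback. This is a toy model for reasoning about the config, not Flume's own selector implementation; the class name `SelectorModel` and the method `channelsFor` are hypothetical, and the channel names are copied from the config.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of the multiplexing configuration: header value -> channel names.
class SelectorModel {
    static final String HEADER_KEY = "header";
    static final List<String> DEFAULT = Arrays.asList("all_events");
    static final Map<String, List<String>> MAPPING = new HashMap<>();
    static {
        MAPPING.put("SINGLE", Arrays.asList("all_events", "single_events"));
        MAPPING.put("MULTIPLE", Arrays.asList("all_events", "multiple_events"));
    }

    // Returns the mapped channels for the event's header value, or the
    // default channels when the header is missing or has no mapping.
    static List<String> channelsFor(Map<String, String> headers) {
        return MAPPING.getOrDefault(headers.get(HEADER_KEY), DEFAULT);
    }

    public static void main(String[] args) {
        Map<String, String> headers = new HashMap<>();
        headers.put(HEADER_KEY, "SINGLE");
        System.out.println(channelsFor(headers));
    }
}
```

In this model every event reaches all_events, because that channel is listed in both mappings and in the default.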
>
> Hopefully that was clear. Feel free to ask more questions,
>
> - Connor
>
>
>
> On Fri, Apr 19, 2013 at 11:14 AM, Jeff Lord <[EMAIL PROTECTED]> wrote:
>
>> Jagadish,
>>
>> Here is an example of how to write a custom serializer.
>>
>>
>> https://github.com/apache/flume/blob/trunk/flume-ng-sinks/flume-hdfs-sink/src/test/java/org/apache/flume/sink/hdfs/MyCustomSerializer.java
>>
>> -Jeff
>>
>>
>> On Fri, Apr 19, 2013 at 9:34 AM, Jeff Lord <[EMAIL PROTECTED]> wrote: