Flume >> mail # user >> "single source - multi channel" scenario  and applying interceptor while writing to only one channel and not on others...possible approaches

Jagadish Bihani 2013-04-16, 06:36
Jagadish Bihani 2013-04-17, 06:12
Jeff Lord 2013-04-19, 16:34
Jeff Lord 2013-04-19, 18:14
Connor Woodson 2013-04-23, 06:52
Re: "single source - multi channel" scenario and applying interceptor while writing to only one channel and not on others...possible approaches
Heh, I forgot to link the multiplexing channel selector documentation.

Here it is: <http://flume.apache.org/FlumeUserGuide.html#multiplexing-channel-selector>

- Connor
On Mon, Apr 22, 2013 at 11:52 PM, Connor Woodson <[EMAIL PROTECTED]> wrote:

> Some more thoughts on this:
> Interceptors currently apply to an event as the source receives it. There
> are good uses for this - for instance, it lets you configure a single
> Timestamp interceptor that stamps every event a source receives, so even
> if you have multiple sinks/channels responding to an event, you only need
> that one interceptor.
> Interceptors in this sense serve to add data to event headers, and as such
> it makes sense to have them applied only once by the source instead of
> letting the channels change header data.
> If you wish to use an interceptor in the above way, to modify header data,
> but want it to apply to only a single channel, could you elaborate on what
> you are trying to do? I haven't been
> able to come up with a situation like that. The solution here would be to
> do as Jeff suggested and use a serializer; if you want more in-depth
> instructions on how to build it, please ask; I have a set of directions
> lying around somewhere that I'll find for you.
> However, given the way Interceptors work, I have myself faced a situation
> where I would have liked an interceptor to apply to one channel only. The
> use case is filtering: I want to send an event to some subset of channels
> based on the contents of its data. Here is how you can do this in the
> current setup (where Interceptors are applied at the source instead of
> per-channel):
> Using the Multiplexing Channel Selector you are able to choose which
> channels an event is written to based on the value of a specified
> header (see the documentation link above). There are some more features to the
> selector that aren't documented, called Optional Channels or something, but
> I don't know very much about them - just figured I would point out that
> they exist; digging through the source should provide some more insight.
> So here is how you want to set your system up. Create an Interceptor that
> will define a certain header value based off of the event's contents. For
> instance, if you want all events containing exactly 1 character to be sent
> to a channel, you could create an Interceptor that counts the characters in
> the event. Then that Interceptor will set a certain header value to
> "SINGLE" if there is just one character, or "MULTIPLE" if there are more.
> Then you can create your channel selector like this (modified from the
> documentation example):
> a1.sources = r1
> a1.channels = all_events single_events multiple_events
> a1.sources.r1.interceptors = your_interceptor
> a1.sources.r1.interceptors.your_interceptor.header = header
> a1.sources.r1.selector.type = multiplexing
> a1.sources.r1.selector.header = header
> a1.sources.r1.selector.mapping.SINGLE = all_events single_events
> a1.sources.r1.selector.mapping.MULTIPLE = all_events multiple_events
> a1.sources.r1.selector.default = all_events
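To make the routing in the config above concrete, here is a minimal plain-Java sketch of the lookup it describes: read one header, map its value to a list of channels, and fall back to the default when there is no match. This is not Flume's selector code; the class and method names (SelectorSketch, channelsFor) are hypothetical, and only the header name, values, and channel names come from the config.

```java
import java.util.List;
import java.util.Map;

// Illustration only: mimics what the multiplexing selector config expresses.
// SelectorSketch and channelsFor are hypothetical names, not Flume APIs.
class SelectorSketch {
    // Mirrors the selector.mapping.* lines from the config.
    static final Map<String, List<String>> MAPPING = Map.of(
        "SINGLE", List.of("all_events", "single_events"),
        "MULTIPLE", List.of("all_events", "multiple_events"));

    // Mirrors selector.default.
    static final List<String> DEFAULT = List.of("all_events");

    // Returns the channels an event with these headers would be written to.
    public static List<String> channelsFor(Map<String, String> headers, String headerName) {
        String value = headers.get(headerName);
        if (value == null) {
            return DEFAULT; // header missing: use the default channels
        }
        return MAPPING.getOrDefault(value, DEFAULT); // unmapped value: default
    }

    public static void main(String[] args) {
        System.out.println(channelsFor(Map.of("header", "SINGLE"), "header"));
        System.out.println(channelsFor(Map.of("header", "MULTIPLE"), "header"));
        System.out.println(channelsFor(Map.of(), "header"));
    }
}
```

An event whose header is missing or holds any other value ends up only in all_events, which is what the default line is for.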
> The result is that you now have a way to filter which channels a certain
> event is sent to. Note that a channel can appear in more than one mapping -
> for instance, all_events is listed in every mapping and as the default, so
> it gets all events. The trick, then, is just to define the right
> interceptor, which is much simpler to code than a serializer (itself
> fairly easy to write).
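As a rough illustration of the character-counting interceptor described above, the sketch below shows only the tagging logic in plain Java, without the Flume dependency. In a real interceptor this would presumably sit inside intercept(Event) of a class implementing org.apache.flume.interceptor.Interceptor, reading event.getBody() and writing to event.getHeaders(). The class name LengthTagger, the UTF-8 decoding, and the choice to leave empty bodies untagged are assumptions, not something the thread specifies.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Illustration only: the header-tagging step a custom interceptor would perform.
// LengthTagger is a hypothetical name; this class does not use the Flume API.
class LengthTagger {
    // Must match selector.header in the agent config.
    static final String HEADER_NAME = "header";

    // Sets the header to "SINGLE" or "MULTIPLE" based on the body's length.
    public static void tag(Map<String, String> headers, byte[] body) {
        // Assumption: bodies are UTF-8 text; count code points, not bytes.
        String text = new String(body, StandardCharsets.UTF_8);
        int chars = text.codePointCount(0, text.length());
        if (chars == 1) {
            headers.put(HEADER_NAME, "SINGLE");
        } else if (chars > 1) {
            headers.put(HEADER_NAME, "MULTIPLE");
        }
        // Assumption: an empty body leaves the header unset, so the
        // selector's default mapping decides where the event goes.
    }

    public static void main(String[] args) {
        Map<String, String> headers = new HashMap<>();
        tag(headers, "x".getBytes(StandardCharsets.UTF_8));
        System.out.println(headers.get(HEADER_NAME));
        tag(headers, "hello".getBytes(StandardCharsets.UTF_8));
        System.out.println(headers.get(HEADER_NAME));
    }
}
```

Leaving the header unset for cases the interceptor does not recognize keeps the routing decision in the selector config rather than in code.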
> Hopefully that was clear. Feel free to ask more questions,
> - Connor
> On Fri, Apr 19, 2013 at 11:14 AM, Jeff Lord <[EMAIL PROTECTED]> wrote:
>> Jagadish,
>> Here is an example of how to write a custom serializer.
>> https://github.com/apache/flume/blob/trunk/flume-ng-sinks/flume-hdfs-sink/src/test/java/org/apache/flume/sink/hdfs/MyCustomSerializer.java
>> -Jeff
>> On Fri, Apr 19, 2013 at 9:34 AM, Jeff Lord <[EMAIL PROTECTED]> wrote:
Jagadish Bihani 2013-04-23, 09:02
Israel Ekpo 2013-04-23, 14:15
Connor Woodson 2013-04-26, 05:15