Flume, mail # user - "single source - multi channel" scenario and applying interceptor while writing to only one channel and not on others...possible approaches


Re: "single source - multi channel" scenario and applying interceptor while writing to only one channel and not on others...possible approaches
Connor Woodson 2013-04-23, 06:52
Some more thoughts on this:

Interceptors currently apply to an event as the source receives it. There
are good uses for this. For instance, it lets you configure a single
Timestamp interceptor that stamps every event a source receives, so even
if multiple sinks/channels respond to an event, you only need that one
interceptor. Interceptors in this sense serve to add data to event headers,
and as such it makes sense to have them applied only once by the source
instead of letting the channels change header data.

If you wish to use an interceptor in the above way, to modify header data,
and still want it to apply to only a single channel, could you elaborate on
what you are trying to do? I haven't been able to come up with a situation
like that. The solution here would be to do as Jeff suggested and use a
serializer; if you want more in-depth instructions on how to build one,
please ask; I have a set of directions lying around somewhere that I'll
find for you.

However, I have myself faced a situation where I would like interceptors to
be per-channel. The use case is filtering events with an Interceptor: I
want to send an event to some subset of channels based on the contents of
its data. Here is how you can do this in the current setup (where
Interceptors are applied at the source instead of per-channel):

Using the Multiplexing Channel Selector, you can choose which channels an
event is written to based on the value of a specified header (it is
documented in the Flume User Guide). The selector has some more features
that aren't documented, called Optional Channels or something, but I don't
know very much about them; I just figured I would point out that they
exist. Digging through the source should provide some more insight.

So here is how you want to set your system up. Create an Interceptor that
will define a certain header value based off of the event's contents. For
instance, if you want all events containing exactly 1 character to be sent
to a channel, you could create an Interceptor that counts the characters in
the event. Then that Interceptor will set a certain header value to
"SINGLE" if there is just one character, or "MULTIPLE" if there are more.

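As a rough sketch of the tagging logic just described, here is a plain-Java
model that does not depend on the Flume API. The class name HeaderTagger,
the header name "header", and the choice to leave the header unset for an
empty body are all assumptions made for illustration; a real interceptor
would put the same logic inside a class implementing Flume's Interceptor
interface and operate on the event's headers and body.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

/** Plain-Java model of the tagging step; hypothetical, not a Flume class. */
class HeaderTagger {
    // Assumed header name; it must match selector.header in the agent config.
    static final String HEADER = "header";

    /** Returns a copy of the headers with HEADER set to SINGLE or MULTIPLE. */
    static Map<String, String> tag(Map<String, String> headers, byte[] body) {
        String text = new String(body, StandardCharsets.UTF_8);
        // Count code points so a character outside the BMP counts once.
        int chars = text.codePointCount(0, text.length());
        Map<String, String> tagged = new HashMap<>(headers);
        if (chars == 1) {
            tagged.put(HEADER, "SINGLE");
        } else if (chars > 1) {
            tagged.put(HEADER, "MULTIPLE");
        }
        // Assumption: an empty body leaves the header unset.
        return tagged;
    }

    public static void main(String[] args) {
        byte[] body = "a".getBytes(StandardCharsets.UTF_8);
        System.out.println(tag(new HashMap<>(), body).get(HEADER));
    }
}
```
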
Then you can create your channel selector like this (modified from the
documentation example):

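# Agent a1: one source fanning out to three channels.
a1.sources = r1
a1.channels = all_events single_events multiple_events
# Custom interceptor that sets the routing header. A custom interceptor also
# needs a .type line naming its Builder class, omitted here.
a1.sources.r1.interceptors = your_interceptor
a1.sources.r1.interceptors.your_interceptor.header = header
# Route on the value of that header.
a1.sources.r1.selector.type = multiplexing
a1.sources.r1.selector.header = header
a1.sources.r1.selector.mapping.SINGLE = all_events single_events
a1.sources.r1.selector.mapping.MULTIPLE = all_events multiple_events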
a1.sources.r1.selector.default = all_events

The result is that you now have a way to filter which channels a certain
event is sent to. Note that a channel can appear in more than one mapping;
for instance, all_events will get all events. So the trick is just to
define the right interceptor (which is much simpler to code than a
serializer, which itself is fairly easy).
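As a rough model of how the mapping in the configuration above resolves, the
following plain-Java sketch looks up the header value and falls back to the
default channel list when there is no match. The class name ChannelRouting
and the method channelsFor are hypothetical and only mimic the selector's
lookup; this is not Flume's own implementation.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Plain-Java model of the multiplexing lookup; hypothetical, not Flume code. */
class ChannelRouting {
    // Mirrors selector.mapping.* and selector.default from the config.
    static final Map<String, List<String>> MAPPING = new HashMap<>();
    static final List<String> DEFAULT = Arrays.asList("all_events");

    static {
        MAPPING.put("SINGLE", Arrays.asList("all_events", "single_events"));
        MAPPING.put("MULTIPLE", Arrays.asList("all_events", "multiple_events"));
    }

    /** Returns the channels an event with these headers would be written to. */
    static List<String> channelsFor(Map<String, String> headers) {
        // A missing or unmapped header value falls through to the default.
        return MAPPING.getOrDefault(headers.get("header"), DEFAULT);
    }

    public static void main(String[] args) {
        Map<String, String> headers = new HashMap<>();
        headers.put("header", "SINGLE");
        System.out.println(channelsFor(headers));
    }
}
```

Listing all_events in every mapping and as the default is what makes it
receive every event, while the other two channels each see only their own
subset.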

Hopefully that was clear. Feel free to ask more questions.

- Connor

On Fri, Apr 19, 2013 at 11:14 AM, Jeff Lord <[EMAIL PROTECTED]> wrote:

> Jagadish,
>
> Here is an example of how to write a custom serializer.
>
>
> https://github.com/apache/flume/blob/trunk/flume-ng-sinks/flume-hdfs-sink/src/test/java/org/apache/flume/sink/hdfs/MyCustomSerializer.java
>
> -Jeff
>
>
> On Fri, Apr 19, 2013 at 9:34 AM, Jeff Lord <[EMAIL PROTECTED]> wrote:
>
>> Hi Jagadish,
>>
>> Have you considered using a custom event serializer to modify your event?
>> It's possible to replicate your flow using two channels and then have one
>> sink that implements a custom serializer to modify the event.
>>
>> -Jeff
>>
>>
>> On Tue, Apr 16, 2013 at 11:12 PM, Jagadish Bihani <
>> [EMAIL PROTECTED]> wrote: