Flume >> mail # user >> "single source - multi channel" scenario  and applying interceptor while writing to only one channel and not on others...possible approaches


Jagadish Bihani 2013-04-16, 06:36
Jagadish Bihani 2013-04-17, 06:12
Jeff Lord 2013-04-19, 16:34
Jeff Lord 2013-04-19, 18:14
Connor Woodson 2013-04-23, 06:52
Connor Woodson 2013-04-23, 06:57
Jagadish Bihani 2013-04-23, 09:02
Israel Ekpo 2013-04-23, 14:15
Re: "single source - multi channel" scenario and applying interceptor while writing to only one channel and not on others...possible approaches
Jagadish,

You are right. Your problem here seems to be more about treating your
events differently depending on the sink, and that is what I believe
serializers are best at. Here are some directions/advice for creating a
serializer (if you look in the lists for the 'custom serializer' thread
you will find another set of directions that may or may not be additionally
useful):

1. I find the place to start is generally with pre-existing code.
BodyTextEventSerializer (the default serializer for the HDFS sink and the
file sink if none is defined) and HeaderAndBodyTextEventSerializer are the
two basic serializers and the best places to start; both are at
<https://github.com/apache/flume/tree/trunk/flume-ng-core/src/main/java/org/apache/flume/serialization>.
They do almost exactly the same thing, so you really only need to look at
one of them. Of all the files at that link, the only other serializer is
FlumeEventAvroEventSerializer (these names are all mouthfuls...). One thing
of note is that none of these example serializers implements the configure
method; look at AbstractAvroEventSerializer to see that method
implemented.

2. Things you need to be sure to change when you copy one of those files
are the class name and probably the package name, the constructor, and the
builder class at the bottom. This builder class is what is used to create
and configure the serializer. Generally you create the serializer with
"EventSerializer s = new MyEventSerializer(out);" and then configure it
with "s.configure(context);", or at least that's how I do it. It appears
that BodyTextEventSerializer configures itself in its constructor; either
way is valid I suppose. The thing of note, however, is that
Builder.build(...) is what is called to create an instance of your
serializer.

3. The main method is write(Event e): this method is given an event, and
you are expected to write the contents of that event in some way to the
output stream that the serializer was created with. After write(...) is
called, or maybe after a few writes in a row, flush() will be called. I've
never needed to do anything in flush() myself.

4. Some details on the other functions: supportsReopen() should return
'true' unless there is a reason for it to return 'false'. I believe this
function is only used in the HDFS sink writers, where it is checked to make
sure a serializer is able to append to an existing stream. The relevant
code is at
<https://github.com/apache/flume/blob/trunk/flume-ng-sinks/flume-hdfs-sink/src/main/java/org/apache/flume/sink/hdfs/HDFSCompressedDataStream.java#L98>
and
<https://github.com/apache/flume/blob/trunk/flume-ng-sinks/flume-hdfs-sink/src/main/java/org/apache/flume/sink/hdfs/HDFSDataStream.java#L85>.
afterCreate() is called when a serializer is created; afterReopen() is
called instead if the serializer is appending (the code linked above uses
this, and I don't think afterReopen() is called anywhere else).
beforeClose() is called just before the stream is closed, and after the
stream is closed the serializer should be discarded (set the variable
holding it to null).

5. For my serializers, as I mentioned, I implement the configure method
instead of using the constructor the way the Body one does. I don't do
much of anything in the other functions; 'write' is where the meat of the
code goes. For your case, the way to go would be to use a RegEx (or
whatever else you like) in your write method to pull your event apart into
its various fields, and then write a subset of those fields to the output
stream in one way or another.
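
To make steps 2 through 5 concrete, here is a rough sketch rather than a
drop-in plugin. So that it compiles without Flume on the classpath, it
declares small stand-in Event and EventSerializer interfaces and uses a
Map<String, String> in place of Flume's Context; in a real serializer you
would delete those stand-ins, import the Flume types, and have the Builder
implement EventSerializer.Builder. The class name MySerializer, the "regex"
property, the tab-separated output, and the choice to skip events that do
not match are all assumptions made for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FieldSubsetSerializerSketch {

  // Stand-ins so the sketch compiles without Flume on the classpath. In a real
  // plugin, delete these and use org.apache.flume.Event and
  // org.apache.flume.serialization.EventSerializer; Map<String, String> below
  // stands in for org.apache.flume.Context.
  interface Event {
    byte[] getBody();
  }

  interface EventSerializer {
    void afterCreate() throws IOException;
    void afterReopen() throws IOException;
    void write(Event event) throws IOException;
    void flush() throws IOException;
    void beforeClose() throws IOException;
    boolean supportsReopen();
  }

  // Hypothetical serializer that writes only the captured fields of each body.
  static class MySerializer implements EventSerializer {
    private final OutputStream out;
    private Pattern pattern; // set in configure(), not in the constructor

    MySerializer(OutputStream out) {
      this.out = out;
    }

    // "regex" is a made-up property name for this sketch.
    void configure(Map<String, String> context) {
      pattern = Pattern.compile(context.getOrDefault("regex", "(.*)"));
    }

    @Override
    public void write(Event event) throws IOException {
      String body = new String(event.getBody(), StandardCharsets.UTF_8);
      Matcher m = pattern.matcher(body);
      if (!m.matches()) {
        return; // assumption: events that do not parse are skipped
      }
      StringBuilder line = new StringBuilder();
      for (int g = 1; g <= m.groupCount(); g++) {
        if (g > 1) {
          line.append('\t');
        }
        line.append(m.group(g));
      }
      line.append('\n');
      out.write(line.toString().getBytes(StandardCharsets.UTF_8));
    }

    // Nothing to do in the lifecycle hooks for this simple case.
    @Override public void afterCreate() {}
    @Override public void afterReopen() {}
    @Override public void flush() {}
    @Override public void beforeClose() {}
    @Override public boolean supportsReopen() { return true; }

    // The serializer is created through this builder: construct, then configure.
    static class Builder {
      EventSerializer build(Map<String, String> context, OutputStream out) {
        MySerializer s = new MySerializer(out);
        s.configure(context);
        return s;
      }
    }
  }

  // Drives the serializer roughly the way a sink would, and returns the output.
  static String serialize(String regex, String... bodies) throws IOException {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    Map<String, String> context = Collections.singletonMap("regex", regex);
    EventSerializer s = new MySerializer.Builder().build(context, out);
    s.afterCreate();
    for (String body : bodies) {
      s.write(() -> body.getBytes(StandardCharsets.UTF_8));
    }
    s.flush();
    s.beforeClose();
    return new String(out.toByteArray(), StandardCharsets.UTF_8);
  }

  public static void main(String[] args) throws IOException {
    // Keeps the first and third whitespace-separated fields, drops the second.
    System.out.print(serialize("(\\S+) \\S+ (\\S+)", "alice secret 42"));
  }
}
```

The regex decides which fields survive: only what sits inside a capture
group is written, so dropping a field for one sink is a matter of leaving
it outside the parentheses.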

I believe that's about it; if anything's unclear, I'll be more than happy
to fix it up.

An important note about the builder, which I tried to cover above: when
you supply your custom serializer for the
"agent.sinks.<sink>.serializer" property, you supply the FQCN of the
Builder inner class, so it looks like this: com.connor.MySerializer$Builder
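
As a rough illustration, the sink configuration might look like the
fragment below. The agent name "agent", the sink name "hdfsSink", the HDFS
path, and the "regex" sub-property are made up for the example. The Flume
User Guide documents the HDFS sink key as "serializer", with
"serializer.*" sub-properties handed to the serializer, so that spelling is
used here; check it against the guide for your Flume version.

```properties
# Hypothetical agent "agent" with an HDFS sink named "hdfsSink".
agent.sinks.hdfsSink.type = hdfs
agent.sinks.hdfsSink.hdfs.path = /flume/events
# Serializers apply to the stream file types, not to sequence files.
agent.sinks.hdfsSink.hdfs.fileType = DataStream
# FQCN of the Builder inner class, not of the serializer itself.
agent.sinks.hdfsSink.serializer = com.connor.MySerializer$Builder
# Made-up sub-property that a custom configure(context) could read.
agent.sinks.hdfsSink.serializer.regex = (\\S+) \\S+ (\\S+)
```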

And here
<https://github.com/apache/flume/blob/trunk/flume-ng-doc/sphinx/FlumeUserGuide.rst#installing-third-party-plugins>
is some documentation on the best way to include your custom serializer in
Flume.

- Connor
On Tue, Apr 23, 2013 at 7:15 AM, Israel Ekpo <[EMAIL PROTECTED]> wrote: