Josh 2013-06-05, 10:52
-RE: Get Flume 'bad' event out of channel.
Paul Chavez 2013-06-05, 16:34
I'm assuming by 'bad' event you mean one that does not have the required headers for tokenized paths. If that's the case there are two potential ways to solve this.
One way is to use multiplexing channel selectors; then you can set up a default path that handles any events missing the header(s). This gets unwieldy fast, though, if you are routing with multiple headers. I used this method for a while but eventually abandoned it since I use 3 headers to route events.
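As a rough illustration of the first approach, a multiplexing selector with a default channel might look like the fragment below. The agent, source, and channel names (a1, r1, cWeb, cApp, cDefault), the header name logType, and the mapped values web and app are placeholders chosen for the example, not taken from any real setup.

```properties
# Hypothetical agent "a1" with one source "r1" and three channels.
a1.sources = r1
a1.channels = cWeb cApp cDefault
a1.sources.r1.channels = cWeb cApp cDefault

# Route on the value of a single header (assumed here to be "logType").
a1.sources.r1.selector.type = multiplexing
a1.sources.r1.selector.header = logType
a1.sources.r1.selector.mapping.web = cWeb
a1.sources.r1.selector.mapping.app = cApp

# Events whose header is missing or has an unmapped value go here.
a1.sources.r1.selector.default = cDefault
```

A selector routes on one header only, which is why this gets awkward when several headers are involved: each additional header needs another routing hop.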
The second way is to have a static interceptor on your first source that has 'preserveExisting' set to true (which is the default behavior). In my case we use two 'type' fields, and I just have an interceptor set the value 'MissingLogType', etc. for each possible header. Since I bucket by these header values, I can quickly find corrupt events this way. I use a timestamp interceptor in much the same way, except in that case it stamps the event with the time the source first saw it. This can result in an event being bucketed in the wrong date/time partition, but that's better than it gumming up the whole data flow.
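A minimal sketch of the interceptor approach, assuming a single routing header, could look like this. The names a1, r1, i1, i2, and the header key logType are hypothetical; only the value 'MissingLogType' comes from the description above. Note that the timestamp interceptor's 'preserveExisting' is false by default, so it is set explicitly here.

```properties
# Hypothetical agent "a1", source "r1"; interceptors run in the order listed.
a1.sources.r1.interceptors = i1 i2

# Static interceptor: adds the header only if the event does not already
# carry it (preserveExisting = true is the default, shown for clarity).
a1.sources.r1.interceptors.i1.type = static
a1.sources.r1.interceptors.i1.key = logType
a1.sources.r1.interceptors.i1.value = MissingLogType
a1.sources.r1.interceptors.i1.preserveExisting = true

# Timestamp interceptor: stamps events lacking a "timestamp" header with
# the time the source processed them; existing timestamps are kept.
a1.sources.r1.interceptors.i2.type = timestamp
a1.sources.r1.interceptors.i2.preserveExisting = true
```

With this in place, a sink path that uses tokens such as %{logType} or %Y-%m-%d always has a value to substitute, and events missing the header end up together under the 'MissingLogType' bucket.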
Hope that helps,
From: Josh [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, June 05, 2013 3:53 AM
To: [EMAIL PROTECTED]
Subject: Get Flume 'bad' event out of channel.
I know this was covered back in May (not so long ago) but was wondering if there has been any movement on this?
We have written a custom serializer to take data from an http data source using the JSON handler. The data source gets sent JSON from our pipeline, which checks that all needed headers are present for serialization and raises exceptions if not, but we have seen a few events come in that cannot be serialized due to missing parts of JSON or any number of other reasons. Currently I can't see a way to get these out of the channel without:
a) chucking out the whole channel and everything in it.
b) attaching a custom sink/serializer to the channel that is less fussy and lets the event through.
Neither of these really seems like a great option. We are using file channels, and all data that is written to disk looks to be in binary format. If needed, as a last resort, could we write a tool to pull Java objects out of a channel and write the rest back into the channel? Are there any plans to implement anything of this kind already?
As previously suggested, it would be nice to be able to:
a) Dump the event to a data file and log a warning
b) Throw the event away
c) Move the event to an alternate channel where it can be handled differently