Just curious about the performance improvement: can you provide the number
of the JIRA that improved performance in 1.3.1?
On Wed, Aug 14, 2013 at 2:23 PM, Hari Shreedharan <[EMAIL PROTECTED]> wrote:
> Flume v1.3.0 had a major performance issue which is why 1.3.1 was
> released immediately after. The current stable release is 1.4.0 - so you
> should use that.
> 1. Can you detail this point? Channel to Sink should really not have any
> exceptions - if the sink or a plugin the sink is using is causing
> rollbacks, then that should handle the failure cases/drop events etc. The
> channel is pretty much a passive component just like a queue - "bad events"
> are events sinks cannot handle due to some reason. The logic of handling
> this should be in the sink itself.
> 2. Currently that is not an option, but if you need it, chances are there
> are others who do too. Explain your use case in a JIRA. Remember, Flume is
> not a file streaming system, it is an event streaming one, so each file is
> still converted into events by Flume.
> 3. If you think the current deserializers don't fit your use-case, you can
> easily write your own and drop it in.
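To make point 1 above concrete, here is a minimal sketch of sink-side handling that sets a failing event aside after a bounded number of attempts instead of rolling it back indefinitely. It is plain Java and deliberately does not use the Flume API: DeadLetterSketch, Result, drain, and maxAttempts are hypothetical names, the Queue merely stands in for a channel, and the Consumer stands in for whatever the sink writes to.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Queue;
import java.util.function.Consumer;

/** Hypothetical sketch; none of these types are Flume classes. */
final class DeadLetterSketch {

    /** What was delivered and what was set aside as undeliverable. */
    static final class Result {
        final List<String> delivered = new ArrayList<>();
        final List<String> deadLetters = new ArrayList<>();
    }

    /**
     * Takes every event off the queue and tries to deliver it up to
     * maxAttempts times. An event that still fails goes to the dead-letter
     * list rather than back onto the queue, so it cannot block later events.
     */
    static Result drain(Queue<String> channel, Consumer<String> writer, int maxAttempts) {
        Result result = new Result();
        String event;
        while ((event = channel.poll()) != null) {
            boolean ok = false;
            for (int attempt = 1; attempt <= maxAttempts && !ok; attempt++) {
                try {
                    writer.accept(event);
                    ok = true;
                } catch (RuntimeException e) {
                    // Delivery failed; the loop retries until maxAttempts is reached.
                }
            }
            if (ok) {
                result.delivered.add(event);
            } else {
                result.deadLetters.add(event);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Queue<String> channel = new ArrayDeque<>(Arrays.asList("a", "bad", "b"));
        // Assumed failure mode for the demo: the writer rejects one specific event.
        Consumer<String> writer = ev -> {
            if (ev.equals("bad")) {
                throw new IllegalArgumentException("cannot handle event");
            }
        };
        Result r = drain(channel, writer, 3);
        System.out.println("delivered=" + r.delivered + " deadLetters=" + r.deadLetters);
    }
}
```

A real sink would do this inside a channel transaction and would write the dead letters somewhere durable; both are left out of the sketch.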
> On Wednesday, August 14, 2013 at 1:58 PM, Robert Heise wrote:
> As I continue to ramp up using Apache Flume (v1.3.0), I have observed a
> few challenges and am hoping somebody with more experience can shed some
> light.
> 1. Establishing a data pipeline is trivial, but what I have noticed is that
> any exceptions caught from the channel->sink operation invoke what appears
> to be a repeating cycle of exceptions. As an example, any events which
> cause an exception (java stacktrace) put the agent into a tailspin. There
> are no tools for managing the pipeline to identify culprit events/files,
> stopping, purging the channel, introspecting the pipeline etc. The best
> course of action is to purge everything under file-channel and restart the
> agent. I've read several posts suggesting that using regex interceptors
> could be a potential fix, but it is almost impossible to predict, in a
> production environment, what exceptions are going to occur. In my opinion,
> there has to be a declarative manner to move bad events out of the channel
> to a "dead-letter-queue" or equivalent.
> 2. I was hoping that the Spooling Directory Source would help us capture
> file metadata, but nothing ever appears in the default .flumespool
> trackerDir directory.
> 3. Maybe my use case is not the right fit for Flume, but my largest design
> constraint is that we deal with files, everything we do is based on files.
> I was hoping that the spooldir and batch control options would provide an
> intuitive way to process files arriving in a spool directory, and ultimately
> land that same data in HDFS. However, a file with 470,000 lines is
> creating over 52 million events, and because the tooling is weak, I have no
> visibility into why that many events are being created or how far the agent
> is from completing. The data flow architecture is perfect, but maybe
> Flume is best used for logs, tailing of files, etc, not necessarily
> processing files?
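For the event count in point 3: 52 million events from 470,000 lines is more than 110 events per line, so a useful first check is an independent count of how many events the file should produce. The sketch below assumes one event per newline-terminated line, which is an assumption about the deserializer in use rather than something established in this thread. ExpectedEventCount and countLines are hypothetical names, and the code is plain Java, not Flume API.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

/** Hypothetical helper; not part of Flume. */
final class ExpectedEventCount {

    /**
     * Counts the lines in the input, which is the number of events to expect
     * if (and only if) each line becomes exactly one event.
     */
    static long countLines(Reader input) {
        long lines = 0;
        try (BufferedReader reader = new BufferedReader(input)) {
            while (reader.readLine() != null) {
                lines++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Pass a file path to count a real file; otherwise use a small built-in sample.
        Reader input = args.length > 0
                ? new java.io.FileReader(args[0])
                : new StringReader("first\nsecond\nthird\n");
        System.out.println("expected events (one per line): " + countLines(input));
    }
}
```

Comparing this number with what the agent reports shows whether the extra events come from how the file is split into events or from the same events being delivered more than once.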
Pankaj Gupta | Software Engineer