Flume >> mail # user >> SpoolDir marks item as completed, when sink fails

Re: SpoolDir marks item as completed, when sink fails
Mike, so when the data is committed to the channel, and the channel is of
type "File", will the data continue to flow to the sink once the agent is
restarted?
And if only 20% of the data had reached the sink before it crashed, will a
"replay" be done to resend all of the data?

Just trying to grasp the basics....
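
To make the question concrete, here is a minimal toy model of the decoupling described in the quoted reply. It is a sketch under stated assumptions, not Flume code: the `Channel` class, `run_source`, `run_sink`, and the batch size are all hypothetical names invented for illustration, and an in-memory queue stands in for a durable channel.

```python
from collections import deque


class Channel:
    """Toy stand-in for a durable channel (hypothetical, not the Flume API)."""

    def __init__(self):
        self._events = deque()

    def put_batch(self, events):
        # Source-side commit: once this returns, the source's work is done.
        self._events.extend(events)

    def take_batch(self, size):
        # Sink-side take: events leave the queue and are held by the caller
        # until it either keeps them (commit) or hands them back (rollback).
        count = min(size, len(self._events))
        return [self._events.popleft() for _ in range(count)]

    def rollback(self, batch):
        # Return an uncommitted batch to the head of the queue, in order.
        self._events.extendleft(reversed(batch))

    def __len__(self):
        return len(self._events)


def run_source(lines, channel):
    """Commit every line to the channel, then report the file as completed."""
    channel.put_batch(lines)
    return "COMPLETED"


def run_sink(channel, deliver, batch_size=2):
    """Drain the channel in batches; roll back and stop on the first failure."""
    delivered = []
    while len(channel):
        batch = channel.take_batch(batch_size)
        try:
            deliver(batch)
        except OSError:
            channel.rollback(batch)
            break
        delivered.extend(batch)
    return delivered


# The source finishes and marks the file before any sink has run.
channel = Channel()
status = run_source([f"event-{i}" for i in range(10)], channel)

# A downstream writer that fails on its second batch (simulated crash).
calls = {"count": 0}


def flaky_deliver(batch):
    calls["count"] += 1
    if calls["count"] == 2:
        raise OSError("downstream unavailable")


first_run = run_sink(channel, flaky_deliver)       # stops at the failed batch
remaining = len(channel)                           # events still buffered
second_run = run_sink(channel, lambda batch: None) # "after restart"
```

In this toy model the file is reported as completed as soon as the source commits, and after the failure only the batches that were never committed by the sink are delivered on the second run, not the whole file. Whether a real agent behaves the same way, including possible duplicates if a sink fails partway through a write, is an assumption to check against the Flume documentation.
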
On Fri, Feb 1, 2013 at 4:56 AM, Mike Percy <[EMAIL PROTECTED]> wrote:

> Tzur, that is expected, because the data is committed by the source onto
> the channel. Sources and sinks are decoupled; they only interact via the
> channel, which buffers the data and serves to mitigate impedance mismatches.
> On Thu, Jan 31, 2013 at 2:35 PM, Tzur Turkenitz <[EMAIL PROTECTED]> wrote:
>> Hello all,
>> I am running HDP 1.2 and Flume 1.3. I have a Flume setup which includes:
>> (1) - A Load Balancer that uses the SpoolDir adapter and sends events to
>> Avro sinks
>> (2) - Agents which consume the data using an Avro source and write to
>> HDFS.
>> During testing I noticed that there's a dissonance between the Load
>> Balancer and the Consumers...
>> When the Load Balancer processes a file, it marks it as COMPLETED, even if
>> the consumer has crashed while writing to HDFS.
>> A preferred behavior would be for the Load Balancer to wait until the
>> consumer commits its transaction and reports it as successful before the
>> file is marked as COMPLETED. The current behavior does not allow me to
>> verify which files have been loaded successfully when an agent has crashed
>> and recovery is in progress.
>> Have I misconfigured my agents, or is this actually the desired behavior?
>> Kind Regards,
>> Tzur
Tzur Turkenitz

"*Facts are stubborn things, but statistics are more pliable*"
-Mark Twain