Flume user mailing list - SpoolDir marks item as completed, when sink fails


Earlier messages in this thread:
Tzur Turkenitz 2013-01-31, 22:35
Mike Percy 2013-02-01, 09:56
Tzur Turkenitz 2013-02-01, 15:44
Mike Percy 2013-02-05, 08:29
Mike Percy 2013-02-05, 08:31
Re: SpoolDir marks item as completed, when sink fails
Tzur Turkenitz 2013-02-05, 15:25
Thank you Mike, you've been a great help.
I have conducted additional tests and verified event data is not lost, as
you stated in your prior comment.

I appreciate it.

Kind Regards,
Tzur
On Tue, Feb 5, 2013 at 3:31 AM, Mike Percy <[EMAIL PROTECTED]> wrote:

> Hmm, in case I didn't answer the whole question:
>
> Yes, the file channel is durable and the data will persist across restarts.
>
> Any data written out by the sink will be removed from the channel. Since
> Flume is event oriented, the remaining events in the channel will be drained
> when the sink takes them at the next opportunity.
>
> Regards
> Mike
>
>
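
A minimal file channel definition along the lines described above might look
like the following sketch (the agent name, channel name, and directories are
placeholders, not taken from this thread):

  # Durable channel: events are persisted to disk (checkpoint + data files)
  # and survive agent restarts until a sink successfully commits a take.
  agent1.channels = fc1
  agent1.channels.fc1.type = file
  agent1.channels.fc1.checkpointDir = /var/lib/flume/fc1/checkpoint
  agent1.channels.fc1.dataDirs = /var/lib/flume/fc1/data
  agent1.channels.fc1.capacity = 1000000
  agent1.channels.fc1.transactionCapacity = 1000
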
> On Tuesday, February 5, 2013, Mike Percy wrote:
>
>> Tzur,
>> The source and sink are completely decoupled. The source will fill the
>> channel until there is no more work or the channel is full, so the data
>> sits buffered in the channel until the sink removes it.
>>
>> Hope that explains things. Let me know if anything is unclear.
>>
>> Regards,
>> Mike
>>
>> On Friday, February 1, 2013, Tzur Turkenitz wrote:
>>
>>> Mike, so when the data is committed to the channel, and the channel is of
>>> type "File", then when the agent is restarted the data will continue to
>>> flow to the sink?
>>> And if only 20% of the data was passed to the sink before it crashed, will
>>> a "Replay" be done to resend all of the data?
>>>
>>> Just trying to grasp the basics....
>>>
>>>
>>>
>>>
>>> On Fri, Feb 1, 2013 at 4:56 AM, Mike Percy <[EMAIL PROTECTED]> wrote:
>>>
>>>> Tzur, that is expected, because the data is committed by the source
>>>> onto the channel. Sources and sinks are decoupled; they only interact via
>>>> the channel, which buffers the data and serves to mitigate impedance
>>>> mismatches.
>>>>
>>>>
>>>>
>>>> On Thu, Jan 31, 2013 at 2:35 PM, Tzur Turkenitz <[EMAIL PROTECTED]> wrote:
>>>>
>>>>> Hello all,
>>>>>
>>>>> I am running HDP 1.2 and Flume 1.3. My Flume setup includes:
>>>>> (1) - a Load Balancer that uses the SpoolDir source and sends events to
>>>>> Avro sinks, and
>>>>> (2) - Agents which consume the data using an Avro source and write
>>>>> to HDFS.
>>>>>
>>>>> During testing I noticed a discrepancy between the Load Balancer and the
>>>>> Consumers...
>>>>> When the Load Balancer processes a file it marks it as COMPLETED, even if
>>>>> the consumer has crashed while writing to HDFS.
>>>>>
>>>>> A preferred behavior would be for the Load Balancer to wait until the
>>>>> consumer commits its transaction and reports success before the file is
>>>>> marked as COMPLETED. As it stands, I cannot verify which files have been
>>>>> loaded successfully when an agent has crashed and recovery is in progress.
>>>>>
>>>>> Have I misconfigured my Agents, or is this actually the desired
>>>>> behavior?
>>>>>
>>>>>
>>>>> Kind Regards,
>>>>> Tzur
>>>>>
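
To make the two-tier topology described in the original question concrete, a
rough configuration sketch might look like this (host names, ports, and paths
are illustrative placeholders, and only one collector agent is shown):

  # --- Tier 1: "Load Balancer" agent reading a spooling directory ---
  lb.sources = spool1
  lb.channels = fc1
  lb.sinks = avro1 avro2
  lb.sinkgroups = g1

  lb.sources.spool1.type = spooldir
  lb.sources.spool1.spoolDir = /data/incoming
  # The source renames a file with this suffix once it has committed all of
  # the file's events to the channel, regardless of what the sinks do later.
  lb.sources.spool1.fileSuffix = .COMPLETED
  lb.sources.spool1.channels = fc1

  lb.channels.fc1.type = file
  lb.channels.fc1.checkpointDir = /var/lib/flume/lb/checkpoint
  lb.channels.fc1.dataDirs = /var/lib/flume/lb/data

  lb.sinks.avro1.type = avro
  lb.sinks.avro1.hostname = collector1.example.com
  lb.sinks.avro1.port = 4141
  lb.sinks.avro1.channel = fc1

  lb.sinks.avro2.type = avro
  lb.sinks.avro2.hostname = collector2.example.com
  lb.sinks.avro2.port = 4141
  lb.sinks.avro2.channel = fc1

  # Load-balance across the two Avro sinks and back off from failed ones.
  lb.sinkgroups.g1.sinks = avro1 avro2
  lb.sinkgroups.g1.processor.type = load_balance
  lb.sinkgroups.g1.processor.backoff = true

  # --- Tier 2: collector agent writing to HDFS ---
  col.sources = avroIn
  col.channels = fc1
  col.sinks = hdfs1

  col.sources.avroIn.type = avro
  col.sources.avroIn.bind = 0.0.0.0
  col.sources.avroIn.port = 4141
  col.sources.avroIn.channels = fc1

  col.channels.fc1.type = file
  col.channels.fc1.checkpointDir = /var/lib/flume/col/checkpoint
  col.channels.fc1.dataDirs = /var/lib/flume/col/data

  col.sinks.hdfs1.type = hdfs
  col.sinks.hdfs1.hdfs.path = hdfs://namenode:8020/flume/events
  col.sinks.hdfs1.hdfs.fileType = DataStream
  col.sinks.hdfs1.channel = fc1
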
>>>>
>>>>
>>>
>>>
>>> --
>>> Regards,
>>> Tzur Turkenitz
>>> Vision.BI
>>> http://www.vision.bi/
>>>
>>> "*Facts are stubborn things, but statistics are more pliable*"
>>> -Mark Twain
>>>
>>
--
Regards,
Tzur Turkenitz
Vision.BI
http://www.vision.bi/

"*Facts are stubborn things, but statistics are more pliable*"
-Mark Twain