Re: HDFS Sink Question
Thanks Devin. I have looked at the source, and I can say for certain that
the connection is never re-established: there is no code that detects that
type of error.

What I was looking for from the devs was confirmation of my findings and
any workarounds besides writing my own HDFS sink.

Not having graceful recovery is a pain and may prevent us from using
Flume.
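
For anyone else hitting this, here is a rough sketch of the kind of
detect-and-reopen logic I mean. To be clear: the class, interface, and
method names below are invented for illustration and are not the actual
sink internals.

    import java.io.IOException;

    /*
     * Sketch only: on a failed append, drop the dead stream and try to
     * re-open it, instead of failing forever. All names are hypothetical.
     */
    class ReopeningWriter {
        // Stand-in for whatever abstraction wraps the HDFS output stream.
        interface HdfsWriter {
            void append(byte[] record) throws IOException;
            void close() throws IOException;
        }

        // Stand-in for however the sink opens files in the first place.
        interface WriterFactory {
            HdfsWriter open(String path) throws IOException;
        }

        private final WriterFactory factory;
        private final String path;
        private HdfsWriter writer;

        ReopeningWriter(WriterFactory factory, String path) throws IOException {
            this.factory = factory;
            this.path = path;
            this.writer = factory.open(path);
        }

        void append(byte[] record) throws IOException {
            try {
                writer.append(record);              // normal write path
            } catch (IOException e) {
                // Treat the failure as a dead connection: discard the old
                // stream and re-open. This is the step that appears to be
                // missing today.
                try { writer.close(); } catch (IOException ignored) { }
                writer = factory.open(path);        // fails until HDFS is back
                writer.append(record);              // one retry, then give up
            }
        }
    }

Even without the retry, just detecting the dead stream and re-opening on
the next attempt would avoid the permanent wedge I'm describing.
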
On Fri, Oct 4, 2013 at 9:21 AM, DSuiter RDX <[EMAIL PROTECTED]> wrote:

> David,
>
> In experimenting with the file_roll sink for local logging, I noticed that
> the file it writes to is created when the agent starts. If you start the
> agent, then remove the file, and attempt to write, no new file is created.
> Perhaps the HDFS sink is similar: when the sink starts, the destination is
> established, and if that file chain is then broken, Flume cannot gracefully
> detect and correct it. It may have something to do with how the sink looks
> for the target. I'm not a developer for Flume, but that is the behavior I
> observed with file_roll. I am working through kinks in the HDFS sink with
> remote TCP logging from rsyslog right now...maybe I will have some more
> insight for you in a few days...
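>
> For reference, the file_roll setup I was testing is just the stock sink,
> along these lines (the agent and channel names here are placeholders):
>
>     agent.sinks.local.type = file_roll
>     agent.sinks.local.sink.directory = /var/log/flume
>     agent.sinks.local.sink.rollInterval = 30
>     agent.sinks.local.channel = mem
>
> The file under sink.directory is created at agent startup, which is why
> deleting it out from under the sink leaves nothing to write to.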
>
> *Devin Suiter*
> Jr. Data Solutions Software Engineer
> 100 Sandusky Street | 2nd Floor | Pittsburgh, PA 15212
> Google Voice: 412-256-8556 | www.rdx.com
>
>
> On Fri, Oct 4, 2013 at 9:08 AM, David Sinclair <
> [EMAIL PROTECTED]> wrote:
>
>> Anyone?
>>
>> This is what I am seeing for the scenarios I asked about, but I wanted
>> confirmation from the devs on the expected behavior.
>>
>>    - HDFS isn't available before ever trying to create/write to a file -
>>      *it continually tries to create the file and finally succeeds when
>>      the cluster is available.*
>>    - HDFS becomes unavailable after already creating a file and starting
>>      to write to it - *the writer loses the connection, and even after
>>      the cluster is available again it never re-establishes it. Data loss
>>      occurs since it never recovers.*
>>    - HDFS is unavailable when trying to close a file - *suffers from the
>>      same problems as above.*
>>
>> On Tue, Oct 1, 2013 at 11:04 AM, David Sinclair <
>> [EMAIL PROTECTED]> wrote:
>>
>>> Hi all,
>>>
>>> I have created an AMQP Source that is being used to feed an HDFS Sink.
>>> Everything is working as expected, but I wanted to try out some error
>>> scenarios.
>>>
>>> After creating a file in HDFS and starting to write to it, I shut down
>>> HDFS. I saw the errors in the log as I would expect, and after the
>>> configured roll time the sink tried to close the file. Since HDFS wasn't
>>> running, it wasn't able to do so. I restarted HDFS in the hope that it
>>> would try the close again, but it did not.
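>>>
>>> For reference, the relevant part of my config is roughly this (the AMQP
>>> source type is my custom class, and the names/paths are illustrative,
>>> not exact):
>>>
>>>     agent.sources.amqp.type = com.example.flume.AmqpSource
>>>     agent.sources.amqp.channels = mem
>>>     agent.sinks.h1.type = hdfs
>>>     agent.sinks.h1.channel = mem
>>>     agent.sinks.h1.hdfs.path = hdfs://namenode:8020/flume/events
>>>     agent.sinks.h1.hdfs.rollInterval = 300
>>>
>>> so after hdfs.rollInterval seconds the sink tries to close the file, and
>>> that close is what fails while HDFS is down.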
>>>
>>> Can someone tell me expected behavior under the following scenarios?
>>>
>>>
>>>    - HDFS isn't available before ever trying to create/write to a file
>>>    - HDFS becomes unavailable after already creating a file and
>>>    starting to write to it
>>>    - HDFS is unavailable when trying to close a file
>>>
>>> I'd also be happy to contribute the AMQP source. I wrote the old version
>>> for the original Flume:
>>>
>>> https://github.com/stampy88/flume-amqp-plugin/
>>>
>>> Let me know if you'd be interested and thanks for the answers.
>>>
>>> dave
>>>
>>
>>
>