Flume, mail # user - HDFS Sink Question


Re: HDFS Sink Question
DSuiter RDX 2013-10-04, 13:21
David,

In experimenting with the file_roll sink for local logging, I noticed that
the file it writes to is created when the agent starts. If you start the
agent, then remove the file and attempt to write, no new file is created.
Perhaps the HDFS sink is similar: when the sink starts, the destination is
established, and if that file chain is later broken, Flume cannot
gracefully detect and correct it. It may have something to do with how the
sink looks for the target. I'm not a Flume developer, but that is the
behavior I observed with file_roll. I'm working through kinks in the HDFS
sink with remote TCP logging from rsyslog right now... maybe I will have
some more insight for you in a few days.
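A minimal agent configuration that would reproduce the file_roll behavior
described above might look like the following (the agent/component names
a1, r1, c1, k1 and the netcat source are illustrative; only the file_roll
sink and its directory matter here):

    # Illustrative single-node agent: netcat source -> memory channel -> file_roll sink
    a1.sources = r1
    a1.channels = c1
    a1.sinks = k1

    a1.sources.r1.type = netcat
    a1.sources.r1.bind = localhost
    a1.sources.r1.port = 44444
    a1.sources.r1.channels = c1

    a1.channels.c1.type = memory

    a1.sinks.k1.type = file_roll
    a1.sinks.k1.channel = c1
    # Output files are created in this directory when the agent starts;
    # deleting the current file out from under the sink reproduces the
    # "no new file created" behavior described above
    a1.sinks.k1.sink.directory = /var/log/flume
    a1.sinks.k1.sink.rollInterval = 30

Starting the agent, deleting the current file under /var/log/flume, and
then sending events through the netcat source should show writes failing
without a replacement file being created.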

*Devin Suiter*
Jr. Data Solutions Software Engineer
100 Sandusky Street | 2nd Floor | Pittsburgh, PA 15212
Google Voice: 412-256-8556 | www.rdx.com
On Fri, Oct 4, 2013 at 9:08 AM, David Sinclair <
[EMAIL PROTECTED]> wrote:

> Anyone?
>
> This is what I am seeing for the scenarios I asked about, but I wanted
> confirmation from the devs on the expected behavior.
>
>    - HDFS isn't available before ever trying to create/write to a file -
>    *continually tries to create the file and finally succeeds once the
>    cluster is available*
>    - HDFS becomes unavailable after already creating a file and starting
>    to write to it - *the writer loses the connection, and even after the
>    cluster is available again it never re-establishes the connection.
>    Data loss occurs since it never recovers*
>    - HDFS is unavailable when trying to close a file - *suffers from the
>    same problems as above*
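For reference, later Flume releases added close-retry settings to the HDFS
sink that bear directly on the last two scenarios: hdfs.closeTries and
hdfs.retryInterval (worth verifying against the version you are running).
A sketch of a sink section using them, with illustrative paths and names:

    a1.sinks.k1.type = hdfs
    a1.sinks.k1.channel = c1
    a1.sinks.k1.hdfs.path = hdfs://namenode:8020/flume/events
    a1.sinks.k1.hdfs.fileType = DataStream
    # Roll purely on time, every 5 minutes
    a1.sinks.k1.hdfs.rollInterval = 300
    a1.sinks.k1.hdfs.rollSize = 0
    a1.sinks.k1.hdfs.rollCount = 0
    # closeTries = 0 means keep retrying the close/rename until it
    # succeeds, at retryInterval-second intervals, so a close that
    # fails while HDFS is down is retried once the cluster returns
    a1.sinks.k1.hdfs.closeTries = 0
    a1.sinks.k1.hdfs.retryInterval = 180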
>
> On Tue, Oct 1, 2013 at 11:04 AM, David Sinclair <
> [EMAIL PROTECTED]> wrote:
>
>> Hi all,
>>
>> I have created an AMQP Source that is being used to feed an HDFS Sink.
>> Everything is working as expected, but I wanted to try out some error
>> scenarios.
>>
>> After creating a file in HDFS and starting to write to it, I shut down
>> HDFS. I saw the errors in the log as I would expect, and after the
>> configured roll time the sink tried to close the file. Since HDFS wasn't
>> running, it wasn't able to do so. I restarted HDFS in the hope that it
>> would try the close again, but it did not.
>>
>> Can someone tell me the expected behavior under the following scenarios?
>>
>>
>>    - HDFS isn't available before ever trying to create/write to a file
>>    - HDFS becomes unavailable after already creating a file and starting
>>    to write to it
>>    - HDFS is unavailable when trying to close a file
>>
>> I'd also be happy to contribute the AMQP source. I wrote the old version
>> for the original Flume:
>>
>> https://github.com/stampy88/flume-amqp-plugin/
>>
>> Let me know if you'd be interested and thanks for the answers.
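A contributed source like this would be wired into an agent by its fully
qualified class name; the class name and properties below are hypothetical
placeholders:

    a1.sources = r1
    # Hypothetical class and settings for an AMQP source plugin
    a1.sources.r1.type = com.example.flume.AmqpSource
    a1.sources.r1.host = amqp.example.com
    a1.sources.r1.port = 5672
    a1.sources.r1.queueName = flume.events
    a1.sources.r1.channels = c1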
>>
>> dave
>>
>
>