Flume >> mail # user >> HDFS Sink Question


Re: HDFS Sink Question
Thanks Devin. I have looked at the source, and I can say with certainty
that the connection is never re-established: there is no code that
detects that type of error.

What I was looking for from the devs was confirmation of my findings and
any workarounds besides writing my own HDFS sink.

The lack of graceful recovery is a pain and may prevent us from using
Flume.
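One workaround we are considering is wrapping the sink's create/close calls in a retry loop that re-attempts after the cluster comes back. This is a hypothetical sketch, not Flume code; `withRetries` and the simulated failure are my own illustration of the idea:

```java
import java.util.concurrent.Callable;

// Hypothetical sketch of retry logic one might bolt onto a custom HDFS
// sink's create/close path. Stock Flume does not do this; that is the
// gap described above.
public class RetryDemo {

    // Retry op up to maxAttempts times, sleeping between failures.
    static <T> T withRetries(Callable<T> op, int maxAttempts, long sleepMillis)
            throws Exception {
        Exception last = null;
        for (int i = 0; i < maxAttempts; i++) {
            try {
                return op.call();
            } catch (Exception e) {
                last = e;              // e.g. an IOException while HDFS is down
                Thread.sleep(sleepMillis);
            }
        }
        throw last;                    // give up after maxAttempts
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Simulated operation: fails twice (cluster "down"), then succeeds.
        String result = withRetries(() -> {
            if (++calls[0] < 3) throw new java.io.IOException("connection lost");
            return "reopened";
        }, 5, 10L);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

A real sink would also need to distinguish retryable connection errors from permanent ones, and bound the retries so a dead cluster doesn't block the channel forever.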
On Fri, Oct 4, 2013 at 9:21 AM, DSuiter RDX <[EMAIL PROTECTED]> wrote:

> David,
>
> In experimenting with the file_roll sink for local logging, I noticed that
> the file it writes to is created when the agent starts. If you start the
> agent, then remove the file, and attempt to write, no new file is
> created. Perhaps the HDFS sink is similar: when the sink starts, the
> destination is established, and if that file chain is broken, Flume
> cannot gracefully detect and correct it. It may have something to do with
> how the sink looks for the target. I'm not a developer for Flume, but
> that is my observed behavior with file_roll. I am working through kinks in
> the HDFS sink with remote TCP logging from rsyslog right now...maybe I will
> have some more insight for you in a few days...
>
> *Devin Suiter*
> Jr. Data Solutions Software Engineer
> 100 Sandusky Street | 2nd Floor | Pittsburgh, PA 15212
> Google Voice: 412-256-8556 | www.rdx.com
>
>
> On Fri, Oct 4, 2013 at 9:08 AM, David Sinclair <
> [EMAIL PROTECTED]> wrote:
>
>> Anyone?
>>
>> This is what I am seeing for the scenarios I asked, but wanted
>> confirmation from devs on expected behavior.
>>
>>    - HDFS isn't available before ever trying to create/write to a file -
>>    continually tries to create the file and finally succeeds when the
>>    cluster is available.
>>    - HDFS becomes unavailable after already creating a file and starting
>>    to write to it - the writer loses the connection, but even after the
>>    cluster is available again it never re-establishes a connection.
>>    Data loss occurs since it never recovers.
>>    - HDFS is unavailable when trying to close a file - suffers from the
>>    same problems as above.
>>
>>
>>
>>
>> On Tue, Oct 1, 2013 at 11:04 AM, David Sinclair <
>> [EMAIL PROTECTED]> wrote:
>>
>>> Hi all,
>>>
>>> I have created an AMQP Source that is being used to feed an HDFS Sink.
>>> Everything is working as expected, but I wanted to try out some error
>>> scenarios.
>>>
>>> After creating a file in HDFS and starting to write to it, I shut down
>>> HDFS. I saw the errors in the log as I would expect, and after the
>>> configured roll time it tried to close the file. Since HDFS wasn't running,
>>> it wasn't able to do so. I restarted HDFS in the hope that it would try the
>>> close again, but it did not.
>>>
>>> Can someone tell me expected behavior under the following scenarios?
>>>
>>>
>>>    - HDFS isn't available before ever trying to create/write to a file
>>>    - HDFS becomes unavailable after already creating a file and
>>>    starting to write to it
>>>    - HDFS is unavailable when trying to close a file
>>>
>>> I'd also be happy to contribute the AMQP source. I wrote the old version
>>> for the original Flume:
>>>
>>> https://github.com/stampy88/flume-amqp-plugin/
>>>
>>> Let me know if you'd be interested and thanks for the answers.
>>>
>>> dave
>>>
>>
>>
>
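For context, a minimal HDFS sink configuration of the kind discussed in this thread might look like the following. The agent, sink, and channel names (`a1`, `k1`, `c1`) and the path are placeholders; the `hdfs.*` roll settings are the standard Flume 1.x HDFS sink properties:

```properties
# Hypothetical agent a1 writing events to HDFS.
a1.sinks.k1.type = hdfs
a1.sinks.k1.channel = c1
a1.sinks.k1.hdfs.path = hdfs://namenode:8020/flume/events
a1.sinks.k1.hdfs.fileType = DataStream
# Roll (close and reopen) the file every 300 seconds; this is the
# "configured roll time" after which the sink attempts the close
# that fails when HDFS is down.
a1.sinks.k1.hdfs.rollInterval = 300
a1.sinks.k1.hdfs.rollSize = 0
a1.sinks.k1.hdfs.rollCount = 0
```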