Flume >> mail # user >> HDFS Sink Question


David Sinclair 2013-10-01, 15:04
David Sinclair 2013-10-04, 13:08
DSuiter RDX 2013-10-04, 13:21
David Sinclair 2013-10-04, 14:42
DSuiter RDX 2013-10-04, 15:09
Re: HDFS Sink Question
Can you file the issues you found in a JIRA here: https://issues.apache.org/jira/browse/FLUME? If this is a real issue, we should fix it. Ideally the sink should reconnect to a broken HDFS, though probably only after the initial connection has succeeded. I am not sure what happens if that initial connection fails.
 

Thanks,
Hari
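
For context, the file-channel setup suggested later in the thread can be sketched roughly as below. This is a minimal example with hypothetical paths and hostnames, not a configuration from the thread, and the custom AMQP source's own settings are omitted since it is a third-party plugin.

```properties
# Sketch: HDFS sink fed through a file channel, so events queued in the
# channel survive an HDFS outage (and an agent restart).
agent.sources = amqp-src
agent.channels = fc
agent.sinks = hdfs-sink

# AMQP source settings omitted (custom plugin); it just feeds the channel.
agent.sources.amqp-src.channels = fc

# File channel: events are checkpointed to local disk until the sink
# successfully delivers them.
agent.channels.fc.type = file
agent.channels.fc.checkpointDir = /var/lib/flume/checkpoint
agent.channels.fc.dataDirs = /var/lib/flume/data

agent.sinks.hdfs-sink.type = hdfs
agent.sinks.hdfs-sink.channel = fc
agent.sinks.hdfs-sink.hdfs.path = hdfs://namenode:8020/flume/events/%Y-%m-%d
agent.sinks.hdfs-sink.hdfs.fileType = DataStream
agent.sinks.hdfs-sink.hdfs.rollInterval = 300
```

Note this only protects events still sitting in the channel; per the findings quoted below, a writer that is already stuck after an HDFS outage still needs the agent restarted.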
On Friday, October 4, 2013 at 8:09 AM, DSuiter RDX wrote:

> I can see that being an issue - hopefully your HDFS never hiccups, but if it does, or if you need to stop it, it seems like restarting the agent is the only way to recover...
>
> As a workaround, you may be able to set up a file channel, and then maybe some kind of trigger script to restart them if the HDFS service bounces? Just throwing spaghetti there...
>
> Have you explored Kafka as an alternative? I haven't gone deeply into it, but I know some people have found it to be better for their design than Flume.
>
> Well, hopefully you get the answers you need. If you rewrite the HDFS sink with this built-in, I'm sure the project will be interested!
>
> Devin Suiter
> Jr. Data Solutions Software Engineer
>
> 100 Sandusky Street | 2nd Floor | Pittsburgh, PA 15212
> Google Voice: 412-256-8556 | www.rdx.com
>
> On Fri, Oct 4, 2013 at 10:42 AM, David Sinclair <[EMAIL PROTECTED]> wrote:
> > Thanks Devin. I have looked at the source, and I can say for certain that the connection is never re-established, because there is no code that detects that type of error.
> >
> > What I was looking for from the devs was confirmation of my findings and any workarounds besides writing my own HDFS Sink.
> >
> > Not having this recover gracefully is a pain and may prevent us from using Flume.
> >
> >
> > On Fri, Oct 4, 2013 at 9:21 AM, DSuiter RDX <[EMAIL PROTECTED]> wrote:
> > > David,
> > >
> > > In experimenting with the file_roll sink for local logging, I noticed that the file it writes to is created when the agent starts. If you start the agent, then remove the file, and attempt to write, no new file is created. Perhaps the HDFS sink is similar: when the sink starts, the destination is established, and if that file chain is then broken, Flume cannot gracefully detect and correct it. It may have something to do with how the sink is looking for the target? I'm not a developer for Flume, but that is my observed behavior on file_roll. I am working through kinks in the hdfs sink with remote TCP logging from rsyslog right now... maybe I will have some more insight for you in a few days...
> > >
> > > Devin Suiter
> > > Jr. Data Solutions Software Engineer
> > >
> > > 100 Sandusky Street | 2nd Floor | Pittsburgh, PA 15212
> > > Google Voice: 412-256-8556 | www.rdx.com
> > >
> > >
> > > On Fri, Oct 4, 2013 at 9:08 AM, David Sinclair <[EMAIL PROTECTED]> wrote:
> > > > Anyone?
> > > >
> > > > This is what I am seeing for the scenarios I asked about, but I wanted confirmation from the devs on the expected behavior.
> > > > HDFS isn't available before the sink ever tries to create/write a file - the sink continually retries creating the file and finally succeeds once the cluster is available.
> > > > HDFS becomes unavailable after a file has already been created and written to - the writer loses the connection, and even after the cluster is available again it never re-establishes a connection. Data loss occurs since it never recovers.
> > > > HDFS is unavailable when trying to close a file - suffers from the same problems as above.
> > > >
> > > >
> > > >
> > > >
> > > > On Tue, Oct 1, 2013 at 11:04 AM, David Sinclair <[EMAIL PROTECTED]> wrote:
> > > > > Hi all,
> > > > >
> > > > > I have created an AMQP Source that is being used to feed an HDFS Sink. Everything is working as expected, but I wanted to try out some error scenarios.  
> > > > >
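
As a footnote to the workaround floated earlier in the thread (a trigger script that bounces the agent once HDFS recovers), a rough sketch might look like the following. The check command, restart command, and polling interval are all placeholders, not anything Flume ships with; adapt them to your deployment.

```shell
#!/usr/bin/env bash
# Watchdog sketch: restart the Flume agent each time HDFS transitions
# from down back to up, so a stuck HDFS sink writer gets a fresh start.
# HDFS_CHECK / RESTART_CMD / INTERVAL are hypothetical defaults.

HDFS_CHECK="${HDFS_CHECK:-hdfs dfs -test -d /}"          # succeeds iff HDFS is reachable
RESTART_CMD="${RESTART_CMD:-service flume-agent restart}" # hypothetical service name
INTERVAL="${INTERVAL:-30}"                                # seconds between checks

hdfs_up() { eval "$HDFS_CHECK" >/dev/null 2>&1; }

# Poll forever (or for $1 iterations, which makes the loop testable).
run_watchdog() {
    local max="${1:-0}" count=0 was_down=0
    while [ "$max" -eq 0 ] || [ "$count" -lt "$max" ]; do
        if hdfs_up; then
            if [ "$was_down" -eq 1 ]; then
                # HDFS came back: bounce the agent so the sink reconnects.
                eval "$RESTART_CMD"
                was_down=0
            fi
        else
            was_down=1
        fi
        count=$((count + 1))
        sleep "$INTERVAL"
    done
}

# Run as a daemon only when explicitly asked to.
if [ "${WATCHDOG_MAIN:-0}" -eq 1 ]; then
    run_watchdog
fi
```

This is spaghetti-throwing of the same kind as in the thread: it does not prevent the data loss David describes for events already handed to a dead writer, it only shortens the window during which the agent is wedged.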