Flume, mail # dev - Fwd: HDFSsink failover error
Connor Woodson 2013-01-15, 00:28
Forwarding from the user list. The bug here is that the HDFSEventSink does
not work inside a FailoverSinkProcessor. When there is an IOException, the
HDFSEventSink returns Status.BACKOFF; however, the failover processor only
marks a sink as failed if it throws an exception. The next process call to
the HDFSEventSink might succeed, but if it keeps failing, the processor will
never roll over to the backup sink. The solution I proposed (throwing an
exception) isn't exactly elegant, but I can't think of a better way to go
about it.
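
For context, the setup where this bites is a failover sink group with two
HDFS sinks. A minimal configuration sketch of that arrangement follows; the
agent and group names here are made up for illustration, and only the sink
names match the ones in the log further down:

    # Hypothetical agent/group names; hdfs-sink1 / hdfs-sink2 match the log below.
    agent.sinkgroups = g1
    agent.sinkgroups.g1.sinks = hdfs-sink1 hdfs-sink2
    agent.sinkgroups.g1.processor.type = failover
    # Higher priority wins; hdfs-sink2 is the backup.
    agent.sinkgroups.g1.processor.priority.hdfs-sink1 = 10
    agent.sinkgroups.g1.processor.priority.hdfs-sink2 = 5
    # Maximum backoff (ms) applied to a sink the processor considers failed.
    agent.sinkgroups.g1.processor.maxpenalty = 10000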

- Connor

---------- Forwarded message ----------
From: Connor Woodson <[EMAIL PROTECTED]>
Date: Mon, Jan 14, 2013 at 4:25 PM
Subject: Re: HDFSsink failover error
To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>, Rahul Ravindran <
[EMAIL PROTECTED]>
Oh alright, found it. What is happening is that the HDFS sink does not
throw an exception for this write error, but instead returns
Status.BACKOFF, and so the failover processor doesn't think the sink has
failed.

(What is strange is that the processor does handle a BACKOFF status when
retrying already-failed sinks, but not for the active sink.)
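
To make that concrete, the active-sink loop in FailoverSinkProcessor.process()
behaves roughly like the sketch below (a simplified paraphrase from memory,
not the verbatim Flume source; field and helper names are approximate):

    // Simplified paraphrase of FailoverSinkProcessor.process(); not verbatim source.
    public Status process() throws EventDeliveryException {
      // Failed sinks whose penalty has expired are retried separately; if one of
      // them returns something other than READY (e.g. BACKOFF) it simply stays
      // on the failed list -- that is the "backoff for failed sinks" handling.
      Status ret;
      while (activeSink != null) {
        try {
          ret = activeSink.process();
          return ret;   // a BACKOFF from the active sink is passed straight
                        // through and never treated as a failure
        } catch (Exception e) {
          // Only a thrown exception demotes the active sink and promotes the
          // next-highest-priority sink in the group.
          activeSink = moveActiveToDeadAndGetNext();
        }
      }
      throw new EventDeliveryException("All sinks failed to process, "
          + "nothing left to failover to");
    }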

So until that is fixed upstream, there isn't a clean workaround. The best
solution I can offer is to get the source code for Flume (either the latest
trunk or the 1.3.1 tag) and make the following change:

In the process() method of HDFSEventSink, in the catch block for the
IOException, change:

LOG.warn("HDFS IO error", eIO);
return Status.BACKOFF;
to:

LOG.warn("HDFS IO error", eIO);
throw eIO;
(Line 457 in 1.3.1 or line 454 in trunk)
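
One wrinkle with that two-line change: process() is only declared to throw
EventDeliveryException, so rethrowing the raw IOException will not compile;
it has to be wrapped. A sketch of how the modified catch block might look
(the transaction.rollback() line reflects the surrounding code as I recall
it, so treat the exact context as an assumption):

    } catch (IOException eIO) {
      transaction.rollback();          // present in the surrounding method, as I recall
      LOG.warn("HDFS IO error", eIO);
      // was: return Status.BACKOFF;
      throw new EventDeliveryException("HDFS IO error", eIO);
    }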

What this change ends up doing: if you ever use the sink outside of a
failover processor, a write error will now throw an exception instead of
backing off, and the sink will probably stop - so this change only makes
the sink usable within the failover sink processor. Optionally, you could
make a copy of HDFSEventSink (call it FailoverHDFSEventSink if you want)
and put that change in the copy, so that you keep both versions of the sink.
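
If you would rather not copy the whole class, a thin subclass that turns a
BACKOFF into an exception gets much the same effect without touching the
Flume source. A rough sketch only; the class name and message are
hypothetical and nothing like this ships with Flume:

    import org.apache.flume.EventDeliveryException;
    import org.apache.flume.Sink.Status;
    import org.apache.flume.sink.hdfs.HDFSEventSink;

    // Hypothetical wrapper: converts a BACKOFF from the HDFS sink into an
    // exception so that FailoverSinkProcessor treats it as a failure and
    // fails over to the next sink in the group.
    public class FailoverHDFSEventSink extends HDFSEventSink {
      @Override
      public Status process() throws EventDeliveryException {
        Status status = super.process();
        if (status == Status.BACKOFF) {
          throw new EventDeliveryException(
              "HDFS sink backed off; rethrowing so the failover processor "
              + "moves on to the backup sink");
        }
        return status;
      }
    }

You would then set that sink's type in the agent configuration to the
fully-qualified class name of the wrapper instead of hdfs.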

(If you want instructions on compiling Flume after this change, look for
the thread 'custome serializer'.)

Unfortunate issue, but I'm sure it will be fixed in 1.4.

- Connor

On Mon, Jan 14, 2013 at 4:00 PM, Rahul Ravindran <[EMAIL PROTECTED]> wrote:

> Here is the entire log file after I restart flume
>
>   ------------------------------
> *From:* Connor Woodson <[EMAIL PROTECTED]>
> *To:* "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>; Rahul Ravindran <
> [EMAIL PROTECTED]>
> *Sent:* Monday, January 14, 2013 3:51 PM
>
> *Subject:* Re: HDFSsink failover error
>
> Can you look at the full log file and post the above section as well as
> 5-10 lines above/below it (you don't have to post that stack trace if you
> don't want)? Because that error, while it should definitely be logged,
> should be followed by some error lines giving context as to what is going
> on. And if that is the end of the log file then...well, that just shouldn't
> happen, as there are several different places that would have produced log
> messages as that exception propagates.
>
> - Connor
>
> On Mon, Jan 14, 2013 at 3:13 PM, Rahul Ravindran <[EMAIL PROTECTED]> wrote:
>
>  The writes to the backup were successful when I attempted to write to it
> directly but not via the failover sink processor. I did not see the warning
> that you mentioned about "Sink hdfs-sink1 failed".
>
> The full log trace is below:
>
> 14 Jan 2013 22:48:24,727 INFO  [hdfs-hdfs-sink2-call-runner-1]
> (org.apache.flume.sink.hdfs.BucketWriter.doOpen:208)  - Creating
> hdfs://ip-10-4-71-187.ec2.internal/user/br/shim/eventstream/event/host102//event.1358203448551.tmp
> 14 Jan 2013 22:48:24,739 WARN
>  [SinkRunner-PollingRunner-FailoverSinkProcessor]
> (org.apache.flume.sink.hdfs.HDFSEventSink.process:456)  - HDFS IO error
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException):
> Operation category WRITE is not supported in state standby
>         at
> org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.checkOperation(StandbyState.java:87)