Flume >> mail # user >> HDFSsink failover error


Re: HDFSsink failover error
This is https://issues.apache.org/jira/browse/FLUME-1779

On Mon, Jan 14, 2013 at 4:25 PM, Connor Woodson <[EMAIL PROTECTED]> wrote:
> Oh alright, found it. What is happening is that the HDFS sink does not throw
> an exception for this write error, but instead returns a Status.BACKOFF, and
> as such the failover processor doesn't think this sink failed.
>
> (What is strange is that the processor deals with the backoff message for
> failed sinks, but not active sinks).
>
> So until that's fixed there isn't a clean workaround. The best solution
> I can offer is to get the source code for Flume (either the latest trunk or
> the 1.3.1 tag) and make the following change:
>
> In the process method of the HDFSEventSink, in the catch-statements, change:
>
> LOG.warn("HDFS IO error", eIO);
> return Status.BACKOFF;
>
>
> to:
>
> LOG.warn("HDFS IO error", eIO);
> throw eIO;
>
>
> (Line 457 in 1.3.1 or line 454 in trunk)
>
> The effect of this change is that if you ever use the sink outside of a
> failover processor, a write error will make the sink throw an exception and
> probably stop - so the modified sink will only work within the failover sink
> processor. Optionally, you could make a copy of the HDFSEventSink (call it
> FailoverHDFSEventSink if you want) and put that change in it, so that you
> can keep both versions of the sink.
>
> (if you want instructions on compiling Flume after this change, look for the
> thread 'custome serializer')
>
> Unfortunate issue, but it will be fixed in 1.4 I'm sure.
>
> - Connor
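
The failure mode described above (the sink reporting a write error as a BACKOFF status rather than throwing, so the failover processor never demotes it) can be sketched with a minimal, self-contained simulation. The class and method names below are illustrative only, not actual Flume APIs:

```java
// Hypothetical, simplified model of why a BACKOFF return does not
// trigger failover: the processor only demotes a sink when process()
// throws. Names are illustrative, not actual Flume classes.
public class FailoverSketch {

    enum Status { READY, BACKOFF }

    interface Sink {
        Status process() throws Exception;
    }

    // Mimics pre-FLUME-1779 HDFSEventSink: a write error is swallowed
    // and reported as BACKOFF instead of an exception.
    static class BackoffSink implements Sink {
        public Status process() {
            return Status.BACKOFF; // write failed, but no exception
        }
    }

    // Mimics the patched sink: the write error propagates as an exception.
    static class FailingSink implements Sink {
        public Status process() throws Exception {
            throw new Exception("HDFS IO error");
        }
    }

    // The processor treats any non-throwing return, including BACKOFF,
    // as "sink still healthy" and only fails over on an exception.
    static String runOnce(Sink primary) {
        try {
            primary.process();
            return "primary kept active";   // even on BACKOFF
        } catch (Exception e) {
            return "failed over to backup"; // only on exception
        }
    }

    public static void main(String[] args) {
        System.out.println(runOnce(new BackoffSink()));  // primary kept active
        System.out.println(runOnce(new FailingSink()));  // failed over to backup
    }
}
```

This is why the one-line change from `return Status.BACKOFF;` to throwing the exception is enough to make the failover processor notice the failure.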
>
> On Mon, Jan 14, 2013 at 4:00 PM, Rahul Ravindran <[EMAIL PROTECTED]> wrote:
>>
>> Here is the entire log file after I restart flume
>>
>> ________________________________
>> From: Connor Woodson <[EMAIL PROTECTED]>
>> To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>; Rahul Ravindran
>> <[EMAIL PROTECTED]>
>> Sent: Monday, January 14, 2013 3:51 PM
>>
>> Subject: Re: HDFSsink failover error
>>
>> Can you look at the full log file and post the above section as well as
>> 5-10 lines above/below it (you don't have to post that stack trace if you
>> don't want)? Because that error, while it should definitely be logged,
>> should be followed by some error lines giving context as to what is going
>> on. And if that is the end of the log file then...well, that just shouldn't
>> happen, as there are several different places that would have produced log
>> messages as that exception propagates.
>>
>> - Connor
>>
>> On Mon, Jan 14, 2013 at 3:13 PM, Rahul Ravindran <[EMAIL PROTECTED]>
>> wrote:
>>
>> The writes to the backup were successful when I attempted to write to it
>> directly but not via the failover sink processor. I did not see the warning
>> that you mentioned about "Sink hdfs-sink1failed".
>>
>> The full log trace is below:
>>
>> 14 Jan 2013 22:48:24,727 INFO  [hdfs-hdfs-sink2-call-runner-1]
>> (org.apache.flume.sink.hdfs.BucketWriter.doOpen:208)  - Creating
>> hdfs://ip-10-4-71-187.ec2.internal/user/br/shim/eventstream/event/host102//event.1358203448551.tmp
>> 14 Jan 2013 22:48:24,739 WARN
>> [SinkRunner-PollingRunner-FailoverSinkProcessor]
>> (org.apache.flume.sink.hdfs.HDFSEventSink.process:456)  - HDFS IO error
>>
>> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException):
>> Operation category WRITE is not supported in state standby
>>         at
>> org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.checkOperation(StandbyState.java:87)
>>         at
>> org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.checkOperation(NameNode.java:1379)
>>         at
>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkOperation(FSNamesystem.java:762)
>>         at
>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInt(FSNamesystem.java:1688)
>>         at
>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:1669)
>>         at
>> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:409)
