
Flume, mail # dev - Re: HDFSsink failover error


Re: HDFSsink failover error
Brock Noland 2013-01-15, 00:30
Hi,

Yeah I think this is:

https://issues.apache.org/jira/browse/FLUME-1779

Brock
On Mon, Jan 14, 2013 at 4:28 PM, Connor Woodson <[EMAIL PROTECTED]> wrote:
> Forwarding from the user list. The bug here is that the HDFSEventSink does
> not work inside a FailoverSinkProcessor. What is going on is that when there
> is an IOException, the HDFSEventSink returns Status.BACKOFF; however, in the
> failover processor a sink is only marked as failed if it throws an exception.
> The next process call to the HDFSEventSink may write correctly; but if it
> can't, the processor will never fail over to the backup. The solution I
> proposed (throwing an exception) isn't exactly elegant, but I can't think of
> a better way to go about it.
>
> - Connor
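The failure mode described above can be pictured with a minimal sketch of a
failover-style processor loop. This is illustrative Java only, not the actual
org.apache.flume.sink.FailoverSinkProcessor source; the class name, the queue
fields, and the priority handling are simplified assumptions. It shows why a
Status.BACKOFF return from the active sink never lets the backup sink take over:

    import java.util.ArrayDeque;
    import java.util.Queue;

    import org.apache.flume.EventDeliveryException;
    import org.apache.flume.Sink;
    import org.apache.flume.Sink.Status;

    // Illustrative sketch only -- NOT the real FailoverSinkProcessor code.
    public class FailoverSketch {
      // Live sinks ordered by priority, failed sinks parked for later retry
      // (both fields are simplifications of what the real processor tracks).
      private final Queue<Sink> liveSinks = new ArrayDeque<Sink>();
      private final Queue<Sink> failedSinks = new ArrayDeque<Sink>();

      public Status process() throws EventDeliveryException {
        Sink active = liveSinks.peek();
        while (active != null) {
          try {
            // If HDFSEventSink swallows the IOException and returns
            // Status.BACKOFF, this call returns normally and the sink still
            // looks healthy, so the backup sink is never consulted.
            return active.process();
          } catch (Exception e) {
            // Only a thrown exception demotes the sink and lets the
            // next-priority sink take over.
            liveSinks.remove(active);
            failedSinks.add(active);
            active = liveSinks.peek();
          }
        }
        throw new EventDeliveryException("All sinks failed to process the event");
      }
    }

With the stock 1.3.x HDFS sink, the catch branch above is simply never reached
for HDFS IO errors, which is the behavior tracked in FLUME-1779.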
>
> ---------- Forwarded message ----------
> From: Connor Woodson <[EMAIL PROTECTED]>
> Date: Mon, Jan 14, 2013 at 4:25 PM
> Subject: Re: HDFSsink failover error
> To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>, Rahul Ravindran <
> [EMAIL PROTECTED]>
>
>
> Oh alright, found it. What is happening is that the HDFS sink does not
> throw an exception for this write error, but instead returns
> Status.BACKOFF, so the failover processor doesn't think the sink failed.
>
> (What is strange is that the processor does handle BACKOFF for sinks that
> have already failed, but not for the currently active sink.)
>
> So until that bug is fixed there isn't a clean workaround. The best solution
> I can offer is to get the Flume source code (either trunk or the 1.3.1 tag)
> and make the following change:
>
> In the process method of the HDFSEventSink, in the IOException catch block, change:
>
> LOG.warn("HDFS IO error", eIO);
> return Status.BACKOFF;
>
>
> to:
>
> LOG.warn("HDFS IO error", eIO);
> throw new EventDeliveryException("HDFS IO error", eIO);
>
>
> (Line 457 in 1.3.1 or line 454 in trunk)
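For orientation, the surrounding error handling in HDFSEventSink.process()
looks roughly like the fragment below after the change (a sketch of the 1.3.x
code path, abbreviated and not guaranteed to match the file line for line).
Because process() only declares EventDeliveryException, the checked IOException
has to be wrapped rather than rethrown directly:

    // Fragment of HDFSEventSink.process(), roughly as in the 1.3.x sources;
    // only the error handling is shown.
    } catch (IOException eIO) {
      transaction.rollback();
      LOG.warn("HDFS IO error", eIO);
      // Was: return Status.BACKOFF;  -- the failover processor never saw a failure.
      // Wrap the checked IOException so it propagates to the sink processor.
      throw new EventDeliveryException("HDFS IO error", eIO);
    } catch (Throwable th) {
      transaction.rollback();
      LOG.error("process failed", th);
      throw new EventDeliveryException(th);
    } finally {
      transaction.close();
    }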
>
> What this change means is that if you ever use the sink outside of a
> failover processor, a write error will make the sink throw an exception and
> it will probably stop; in other words, the change only makes the sink usable
> within the failover sink processor. Optionally, you could make a copy of the
> HDFSEventSink (call it FailoverHDFSEventSink if you want) and put the change
> in the copy, so that you keep both versions of the sink.
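If you take the copy-the-class route, the wiring could look roughly like the
properties below. This is a hedged example: the agent, channel, and path names
and the com.example package are made up, and FailoverHDFSEventSink is the
hypothetical copy mentioned above; the one mechanism relied on is that Flume
accepts a fully qualified class name as a sink's type:

    # Hypothetical failover setup: patched HDFS sink as primary, stock HDFS sink as backup.
    agent.sinks = hdfs-sink1 hdfs-sink2
    agent.sinkgroups = g1
    agent.sinkgroups.g1.sinks = hdfs-sink1 hdfs-sink2
    agent.sinkgroups.g1.processor.type = failover
    agent.sinkgroups.g1.processor.priority.hdfs-sink1 = 10
    agent.sinkgroups.g1.processor.priority.hdfs-sink2 = 5
    agent.sinkgroups.g1.processor.maxpenalty = 10000

    # A custom sink is referenced by its fully qualified class name.
    agent.sinks.hdfs-sink1.type = com.example.flume.sink.FailoverHDFSEventSink
    agent.sinks.hdfs-sink1.channel = ch1
    agent.sinks.hdfs-sink1.hdfs.path = hdfs://namenode/flume/events

    # The backup stays on the unmodified HDFS sink.
    agent.sinks.hdfs-sink2.type = hdfs
    agent.sinks.hdfs-sink2.channel = ch1
    agent.sinks.hdfs-sink2.hdfs.path = hdfs://namenode/flume/events-backup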
>
> (if you want instructions on compiling Flume after this change, look for
> the thread 'custome serializer')
>
> Unfortunate issue, but it will be fixed in 1.4 I'm sure.
>
> - Connor
>
> On Mon, Jan 14, 2013 at 4:00 PM, Rahul Ravindran <[EMAIL PROTECTED]> wrote:
>
>> Here is the entire log file after I restart flume
>>
>>   ------------------------------
>> *From:* Connor Woodson <[EMAIL PROTECTED]>
>> *To:* "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>; Rahul Ravindran <
>> [EMAIL PROTECTED]>
>> *Sent:* Monday, January 14, 2013 3:51 PM
>>
>> *Subject:* Re: HDFSsink failover error
>>
>> Can you look at the full log file and post the above section as well as
>> 5-10 lines above/below it (you don't have to post that stack trace if you
>> don't want)? Because that error, while it should definitely be logged,
>> should be followed by some error lines giving context as to what is going
>> on. And if that is the end of the log file then...well, that just shouldn't
>> happen, as there are several different places that would have produced log
>> messages as that exception propagates.
>>
>> - Connor
>>
>> On Mon, Jan 14, 2013 at 3:13 PM, Rahul Ravindran <[EMAIL PROTECTED]> wrote:
>>
>> The writes to the backup were successful when I attempted to write to it
>> directly, but not via the failover sink processor. I did not see the warning
>> you mentioned about "Sink hdfs-sink1 failed".
>>
>> The full log trace is below:
>>
>> 14 Jan 2013 22:48:24,727 INFO  [hdfs-hdfs-sink2-call-runner-1]
>> (org.apache.flume.sink.hdfs.BucketWriter.doOpen:208)  - Creating
>> hdfs://ip-10-4-71-187.ec2.internal/user/br/shim/eventstream/event/host102//event.1358203448551.tmp
>> 14 Jan 2013 22:48:24,739 WARN
>>  [SinkRunner-PollingRunner-FailoverSinkProcessor]
