Re: HDFSsink failover error
Brock Noland 2013-01-15, 00:30
Hi,
Yeah I think this is: https://issues.apache.org/jira/browse/FLUME-1779

Brock

On Mon, Jan 14, 2013 at 4:28 PM, Connor Woodson <[EMAIL PROTECTED]> wrote:
> Forwarding from the user list. The bug here is that the HDFSEventSink will
> not work in a FailoverSinkProcessor. What is going on is that when there is
> an IOException, the HDFSEventSink will return Status.BACKOFF; however, in
> the failover processor, a sink is only failed if it throws an exception. So
> maybe the next process call to the HDFSEventSink will write correctly;
> however, if it can't, it will never roll over. The solution I proposed
> (throwing an exception) isn't exactly the most elegant, but I can't think
> of a better way to go about it.
>
> - Connor
>
> ---------- Forwarded message ----------
> From: Connor Woodson <[EMAIL PROTECTED]>
> Date: Mon, Jan 14, 2013 at 4:25 PM
> Subject: Re: HDFSsink failover error
> To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>, Rahul Ravindran <[EMAIL PROTECTED]>
>
> Oh alright, found it. What is happening is that the HDFS sink does not
> throw an exception for this write error, but instead returns a
> Status.BACKOFF, and as such the failover processor doesn't think this sink
> failed.
>
> (What is strange is that the processor deals with the backoff message for
> failed sinks, but not active sinks).
>
> So until that's fixed there isn't a clean way to fix this.
> The best solution I can offer is to get the source code for Flume (either
> the latest one or the 1.3.1 tag), and make the following change:
>
> In the process method of the HDFSEventSink, in the catch-statements, change:
>
> LOG.warn("HDFS IO error", eIO);
> return Status.BACKOFF;
>
> to:
>
> LOG.warn("HDFS IO error", eIO);
> throw eIO;
>
> (Line 457 in 1.3.1 or line 454 in trunk)
>
> What this will end up doing is if you ever use the sink outside of a
> failover processor, when there is a write error then the sink will throw an
> exception and it will probably stop - so, this change will only make it
> able to work within the failover sink processor. Optionally, you could make
> a copy of the HDFSEventSink (call it FailoverHDFSEventSink if you want) and
> put that change in it, so that you can have both versions of the sink.
>
> (if you want instructions on compiling Flume after this change, look for
> the thread 'custome serializer')
>
> Unfortunate issue, but it will be fixed in 1.4 I'm sure.
>
> - Connor
>
> On Mon, Jan 14, 2013 at 4:00 PM, Rahul Ravindran <[EMAIL PROTECTED]> wrote:
>
>> Here is the entire log file after I restart flume
>>
>> ------------------------------
>> *From:* Connor Woodson <[EMAIL PROTECTED]>
>> *To:* "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>; Rahul Ravindran <[EMAIL PROTECTED]>
>> *Sent:* Monday, January 14, 2013 3:51 PM
>>
>> *Subject:* Re: HDFSsink failover error
>>
>> Can you look at the full log file and post the above section as well as
>> 5-10 lines above/below it (you don't have to post that stack trace if you
>> don't want)? Because that error, while it should definitely be logged,
>> should be followed by some error lines giving context as to what is going
>> on.
>> And if that is the end of the log file then...well, that just shouldn't
>> happen, as there are several different places that would have produced log
>> messages as that exception propagates
>>
>> - Connor
>>
>> On Mon, Jan 14, 2013 at 3:13 PM, Rahul Ravindran <[EMAIL PROTECTED]> wrote:
>>
>> The writes to the backup were successful when I attempted to write to it
>> directly but not via the failover sink processor. I did not see the warning
>> that you mentioned about "Sink hdfs-sink1failed".
>>
>> The full log trace is below:
>>
>> 14 Jan 2013 22:48:24,727 INFO [hdfs-hdfs-sink2-call-runner-1]
>> (org.apache.flume.sink.hdfs.BucketWriter.doOpen:208) - Creating
>> hdfs://ip-10-4-71-187.ec2.internal/user/br/shim/eventstream/event/host102//event.1358203448551.tmp
>> 14 Jan 2013 22:48:24,739 WARN
>> [SinkRunner-PollingRunner-FailoverSinkProcessor]

Apache MRUnit - Unit testing MapReduce - http://incubator.apache.org/mrunit/
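
For anyone reading this thread later, the failure mode Connor describes can be sketched in a few lines of Java. This is a simplified stand-in and not Flume's actual code: the class FailoverDemo, its nested Sink and Status types, and the process method are hypothetical names that only mimic the shape of the real interfaces. The one behaviour it models is the one stated above: the processor moves to the backup sink only when the active sink throws, so a returned BACKOFF leaves the failing sink in place.

```java
import java.io.IOException;

public class FailoverDemo {
    // Hypothetical stand-in for the sink status values mentioned in the thread.
    enum Status { READY, BACKOFF }

    // Hypothetical stand-in for a sink; NOT the real org.apache.flume.Sink.
    interface Sink {
        Status process() throws Exception;
    }

    // Simplified failover logic: the backup is tried only if the primary throws.
    // Returns the name of the sink that ended up handling the call.
    static String process(Sink primary, Sink backup) {
        try {
            primary.process(); // a BACKOFF return is not treated as a failure
            return "primary";
        } catch (Exception e) {
            // Only an exception marks the primary as failed.
        }
        try {
            backup.process();
            return "backup";
        } catch (Exception e) {
            return "none";
        }
    }

    public static void main(String[] args) {
        Sink healthy = () -> Status.READY;

        // Behaviour reported in the thread: the write error is swallowed and BACKOFF returned.
        Sink returnsBackoff = () -> Status.BACKOFF;

        // Behaviour after the proposed change: the IOException is rethrown.
        Sink throwsOnError = () -> { throw new IOException("HDFS IO error"); };

        System.out.println(process(returnsBackoff, healthy)); // prints "primary"
        System.out.println(process(throwsOnError, healthy));  // prints "backup"
    }
}
```

With the BACKOFF-returning sink the call never reaches the backup, which matches the symptom reported in the thread; with the throwing sink it does, which is why the proposed change works inside a failover processor.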