HBase, mail # user - Region server shutting down due to HDFS error


Re: Region server shutting down due to HDFS error
Eran Kutner 2012-03-28, 20:06
Hmm... I couldn't find it either, so I looked at the history of that
file and, sure enough, a few check-ins back it had that message.
I have no idea how something like this could happen. I know I had some
merge issues when I first got the latest version and built that project, but
I then reverted all local changes and rebuilt. The only thing I can
imagine is that the previously compiled class file was not modified and it
was the one that got included in the JAR, although I don't really know how
that could happen.

-eran
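
One way to test the stale-class theory is to look for the log phrase inside the built JAR itself rather than in the source tree. The following is a minimal sketch, not a definitive check: it assumes the phrase is an ASCII string literal (so it is stored byte-for-byte in the class file's constant pool) and is not assembled from pieces at runtime. The function name `classes_containing` and the JAR path in the usage note are hypothetical.

```python
import zipfile


def classes_containing(jar_path, phrase):
    """Return the names of .class entries in the JAR whose bytes contain phrase."""
    needle = phrase.encode("utf-8")
    hits = []
    # A JAR is a ZIP archive, so the standard zipfile module can read it.
    with zipfile.ZipFile(jar_path) as jar:
        for name in jar.namelist():
            # Only compiled classes are of interest; skip resources and manifests.
            if name.endswith(".class") and needle in jar.read(name):
                hits.append(name)
    return hits
```

For example, `classes_containing("path/to/hbase.jar", "getting node's version in CLOSI")` would list any compiled class that still carries the old message. A non-empty result for a phrase that is absent from the current source would point to a class file left over from an earlier build.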

On Wed, Mar 28, 2012 at 18:53, Ted Yu <[EMAIL PROTECTED]> wrote:

> Eran:
> The error indicates some ZooKeeper-related issue.
> Do you see a KeeperException after the error log?
>
> I searched the 90 codebase but couldn't find the exact log phrase:
>
> zhihyu$ find src/main -name '*.java' -exec grep "getting node's version in CLOSI" {} \; -print
> zhihyu$ find src/main -name '*.java' -exec grep 'Error getting ' {} \; -print
>
> Cheers
>
> On Wed, Mar 28, 2012 at 9:45 AM, Eran Kutner <[EMAIL PROTECTED]> wrote:
>
> > I don't see any prior HDFS issues in the 15 minutes before this exception.
> > The logs on the datanode reported as problematic are clean as well.
> > However, I now see the log is full of errors like this:
> > 2012-03-28 00:15:05,358 DEBUG org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler: Processing close of gs_users,731481|Sn쒪㝨眳ԫ䂣⫰==,1331226388691.29929cb2200b3541ead85e17b836ade5.
> > 2012-03-28 00:15:05,359 WARN org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler: Error getting node's version in CLOSING state, aborting close of gs_users,731481|Sn쒪㝨眳ԫ䂣⫰==,1331226388691.29929cb2200b3541ead85e17b836ade5.
> >
> > -eran
> >
> >
> >
> > On Wed, Mar 28, 2012 at 18:38, Jean-Daniel Cryans <[EMAIL PROTECTED]> wrote:
> >
> > > Any chance we can see what happened before that too? Usually you
> > > should see a lot more HDFS spam before getting the message that all the
> > > datanodes are bad.
> > >
> > > J-D
> > >
> > > On Wed, Mar 28, 2012 at 4:28 AM, Eran Kutner <[EMAIL PROTECTED]> wrote:
> > > > Hi,
> > > >
> > > > We have region servers sporadically stopping under load, supposedly due to
> > > > errors writing to HDFS. Things like:
> > > >
> > > > 2012-03-28 00:37:11,210 WARN org.apache.hadoop.hdfs.DFSClient: Error while syncing
> > > > java.io.IOException: All datanodes 10.1.104.10:50010 are bad. Aborting..
> > > >
> > > > It's happening with a different region server and data node every time, so
> > > > it's not a problem with one specific server and there doesn't seem to be
> > > > anything really wrong with either of them. I've already increased the file
> > > > descriptor limit, datanode xceivers and data node handler count. Any idea
> > > > what can be causing these errors?
> > > >
> > > >
> > > > A more complete log is here: http://pastebin.com/wC90xU2x
> > > >
> > > > Thanks.
> > > >
> > > > -eran
> > >
> >
>
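
The thread repeatedly asks what was logged shortly before the "All datanodes ... are bad" error. Below is a rough sketch of how that window could be pulled out of a region server log. It assumes the `timestamp LEVEL logger: message` layout seen in the lines quoted above and treats any line without a timestamp (such as the `java.io.IOException` line) as a continuation of the previous record. The names `parse_log` and `problems_before` are hypothetical helpers, not part of HBase or Hadoop.

```python
import re
from datetime import datetime, timedelta

# Layout assumed from the quoted logs: "2012-03-28 00:37:11,210 WARN logger: message"
LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) (\w+) +(\S+): (.*)$"
)


def parse_log(lines):
    """Turn raw log lines into [timestamp, level, logger, message] records."""
    records = []
    for line in lines:
        m = LINE_RE.match(line)
        if m:
            ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S,%f")
            records.append([ts, m.group(2), m.group(3), m.group(4)])
        elif records:
            # No timestamp: attach the line (e.g. an exception) to the previous record.
            records[-1][3] += "\n" + line.rstrip("\n")
    return records


def problems_before(lines, marker, minutes=15):
    """Return WARN/ERROR/FATAL records logged in the `minutes` strictly before
    the first record whose message contains `marker`."""
    records = parse_log(lines)
    hit = next((r for r in records if marker in r[3]), None)
    if hit is None:
        return []
    start = hit[0] - timedelta(minutes=minutes)
    return [
        r for r in records
        if start <= r[0] < hit[0] and r[1] in ("WARN", "ERROR", "FATAL")
    ]
```

Calling `problems_before(open(log_path), "All datanodes")` with a 15-minute window mirrors the check described in the thread; widening `minutes` shows whether earlier warnings, such as the CloseRegionHandler one, fall just outside it.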