HBase >> mail # user >> Region server shutting down due to HDFS error


Re: Region server shutting down due to HDFS error
Freudian slip :)

-eran

On Thu, Apr 5, 2012 at 16:52, Ted Yu <[EMAIL PROTECTED]> wrote:

> Thanks for writing back.
>
> I guess you meant 'things are now operating well', below :-)
>
> On Thu, Apr 5, 2012 at 6:25 AM, Eran Kutner <[EMAIL PROTECTED]> wrote:
>
> > As promised I'm writing back to update the list.
> > Seems that after upgrading to cdh3u3 of the hadoop cluster and zookeeper
> > ensemble (hadoop alone wasn't enough) things are no operating well with no
> > HDFS errors in the logs. I've also set
> > hbase.regionserver.logroll.errors.tolerated to 3 just in case. Now that the
> > log is clean a new exception shows up but I'll open a separate thread about
> > it.
> >
> > Thanks everyone.
> >
> > -eran
> >
> >
> >
> > On Wed, Mar 28, 2012 at 23:06, Eran Kutner <[EMAIL PROTECTED]> wrote:
> >
> > > hmmm... I couldn't find it either, so I've looked at the history of that
> > > file and sure enough a few check-ins back it had that message.
> > > I have no idea how something like this could happen. I know I had some
> > > merge issues when I first got the latest version and built that project, but
> > > I then reverted all local changes and rebuilt. The only thing I can
> > > imagine is that the previously compiled class file was not modified and it
> > > was the one that got included in the JAR, although I don't really know how
> > > that could happen.
> > >
> > > -eran
> > >
> > >
> > >
> > > On Wed, Mar 28, 2012 at 18:53, Ted Yu <[EMAIL PROTECTED]> wrote:
> > >
> > >> Eran:
> > >> The error indicated some zookeeper related issue.
> > >> Do you see KeeperException after the Error log ?
> > >>
> > >> I searched 90 codebase but couldn't find the exact log phrase:
> > >>
> > >> zhihyu$ find src/main -name '*.java' -exec grep "getting node's version in CLOSI" {} \; -print
> > >> zhihyu$ find src/main -name '*.java' -exec grep 'Error getting ' {} \; -print
> > >>
> > >> Cheers
> > >>
> > >> On Wed, Mar 28, 2012 at 9:45 AM, Eran Kutner <[EMAIL PROTECTED]> wrote:
> > >>
> > >> > I don't see any prior HDFS issues in the 15 minutes before this exception.
> > >> > The logs on the datanode reported as problematic are clean as well.
> > >> > However, I now see the log is full of errors like this:
> > >> > 2012-03-28 00:15:05,358 DEBUG org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler: Processing close of gs_users,731481|Sn쒪㝨眳ԫ䂣⫰==,1331226388691.29929cb2200b3541ead85e17b836ade5.
> > >> > 2012-03-28 00:15:05,359 WARN org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler: Error getting node's version in CLOSING state, aborting close of gs_users,731481|Sn쒪㝨眳ԫ䂣⫰==,1331226388691.29929cb2200b3541ead85e17b836ade5.
> > >> >
> > >> > -eran
> > >> >
> > >> >
> > >> >
> > >> > On Wed, Mar 28, 2012 at 18:38, Jean-Daniel Cryans <[EMAIL PROTECTED]> wrote:
> > >> >
> > >> > > Any chance we can see what happened before that too? Usually you
> > >> > > should see a lot more HDFS spam before getting that all the datanodes
> > >> > > are bad.
> > >> > >
> > >> > > J-D
> > >> > >
> > >> > > On Wed, Mar 28, 2012 at 4:28 AM, Eran Kutner <[EMAIL PROTECTED]> wrote:
> > >> > > > Hi,
> > >> > > >
> > >> > > > We have region servers sporadically stopping under load, supposedly due to
> > >> > > > errors writing to HDFS. Things like:
> > >> > > >
> > >> > > > 2012-03-28 00:37:11,210 WARN org.apache.hadoop.hdfs.DFSClient: Error while syncing
> > >> > > > java.io.IOException: All datanodes 10.1.104.10:50010 are bad. Aborting..
> > >> > > >
> > >> > > > It's happening with a different region server and data node every time, so
> > >> > > > it's not a problem with one specific server and there doesn't seem to be
> > >> > > > anything really wrong with either of them. I've already increased the
> > >>
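
For anyone chasing the same symptoms, here is a rough sketch of how one might count the two messages quoted in this thread (the DFSClient "All datanodes ... are bad" error and the CloseRegionHandler "Error getting node's version in CLOSING state" warning) in a region server log. The pattern names and the count_errors helper are made up for illustration and are not part of HBase or Hadoop. It assumes each message sits on a single line of the log, which may not hold for every log layout.

```python
import re
from collections import Counter

# Patterns taken from the log excerpts quoted in this thread.
# The dictionary keys are hypothetical labels, not HBase identifiers.
PATTERNS = {
    "all_datanodes_bad": re.compile(r"All datanodes .* are bad"),
    "closing_version_error": re.compile(
        r"Error getting node's version in CLOSING state"
    ),
}


def count_errors(lines):
    """Count how many lines match each pattern (illustrative helper)."""
    counts = Counter()
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts
```

To run it against a real log, one could open the file with errors="replace" (region names may contain binary bytes, as in the excerpt above) and pass the file object straight to count_errors.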