NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
HBase >> mail # user >> Region server shutting down due to HDFS error


Eran Kutner 2012-03-28, 11:28
Jean-Daniel Cryans 2012-03-28, 16:38
Eran Kutner 2012-03-28, 16:45
Ted Yu 2012-03-28, 16:53
Eran Kutner 2012-03-28, 20:06

Re: Region server shutting down due to HDFS error
As promised, I'm writing back to update the list.
It seems that after upgrading both the Hadoop cluster and the ZooKeeper
ensemble to CDH3u3 (upgrading Hadoop alone wasn't enough), things are now
operating well, with no HDFS errors in the logs. I've also set
hbase.regionserver.logroll.errors.tolerated to 3, just in case. Now that the
log is clean, a new exception shows up, but I'll open a separate thread about
it.
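The thread does not say how that setting was applied. Assuming it went into hbase-site.xml (the usual Hadoop-style configuration file made of property entries with a name and a value), a minimal Python sketch like the one below can confirm that a given config file really carries the value before region servers are restarted. The helper name get_property and the sample file contents are hypothetical, for illustration only.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def get_property(config_xml: str, name: str) -> Optional[str]:
    """Return the <value> of the <property> called `name` in a Hadoop-style
    configuration document (e.g. hbase-site.xml or hdfs-site.xml).

    Returns None when the file does not set the property at all, in which
    case the built-in default would apply instead.
    """
    root = ET.fromstring(config_xml)
    # Properties are direct children of the <configuration> root element.
    for prop in root.findall("property"):
        if (prop.findtext("name") or "").strip() == name:
            return (prop.findtext("value") or "").strip()
    return None


# Hypothetical hbase-site.xml content carrying the setting from this thread.
SAMPLE_HBASE_SITE = """<?xml version="1.0"?>
<configuration>
  <property>
    <name>hbase.regionserver.logroll.errors.tolerated</name>
    <value>3</value>
  </property>
</configuration>"""

print(get_property(SAMPLE_HBASE_SITE, "hbase.regionserver.logroll.errors.tolerated"))
```

The same helper can read hdfs-site.xml for the DataNode settings mentioned in the original report; on CDH3-era Hadoop those are dfs.datanode.max.xcievers (the property name really is spelled that way) and dfs.datanode.handler.count.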

Thanks everyone.

-eran

On Wed, Mar 28, 2012 at 23:06, Eran Kutner <[EMAIL PROTECTED]> wrote:

> Hmm... I couldn't find it either, so I looked at the history of that
> file, and sure enough, a few check-ins back it had that message.
> I have no idea how something like this could happen. I know I had some
> merge issues when I first got the latest version and built that project, but
> I then reverted all local changes and rebuilt. The only thing I can
> imagine is that the previously compiled class file was not replaced and it
> was the one that got included in the JAR, although I don't really know how
> that could happen.
>
> -eran
>
>
>
> On Wed, Mar 28, 2012 at 18:53, Ted Yu <[EMAIL PROTECTED]> wrote:
>
>> Eran:
>> The error indicates a ZooKeeper-related issue.
>> Do you see a KeeperException after the error log?
>>
>> I searched the 0.90 codebase but couldn't find the exact log phrase:
>>
>> zhihyu$ find src/main -name '*.java' -exec grep "getting node's version in CLOSI" {} \; -print
>> zhihyu$ find src/main -name '*.java' -exec grep 'Error getting ' {} \; -print
>>
>> Cheers
>>
>> On Wed, Mar 28, 2012 at 9:45 AM, Eran Kutner <[EMAIL PROTECTED]> wrote:
>>
>> > I don't see any prior HDFS issues in the 15 minutes before this exception.
>> > The logs on the datanode reported as problematic are clean as well.
>> > However, I now see the log is full of errors like this:
>> > 2012-03-28 00:15:05,358 DEBUG org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler: Processing close of gs_users,731481|Sn쒪㝨眳ԫ䂣⫰==,1331226388691.29929cb2200b3541ead85e17b836ade5.
>> > 2012-03-28 00:15:05,359 WARN org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler: Error getting node's version in CLOSING state, aborting close of gs_users,731481|Sn쒪㝨眳ԫ䂣⫰==,1331226388691.29929cb2200b3541ead85e17b836ade5.
>> >
>> > -eran
>> >
>> >
>> >
>> > On Wed, Mar 28, 2012 at 18:38, Jean-Daniel Cryans <[EMAIL PROTECTED]> wrote:
>> >
>> > > Any chance we can see what happened before that too? Usually you
>> > > should see a lot more HDFS spam before the error saying that all the
>> > > datanodes are bad.
>> > >
>> > > J-D
>> > >
>> > > On Wed, Mar 28, 2012 at 4:28 AM, Eran Kutner <[EMAIL PROTECTED]> wrote:
>> > > > Hi,
>> > > >
>> > > > We have region servers sporadically stopping under load, supposedly
>> > > > due to errors writing to HDFS. Things like:
>> > > >
>> > > > 2012-03-28 00:37:11,210 WARN org.apache.hadoop.hdfs.DFSClient: Error while syncing
>> > > > java.io.IOException: All datanodes 10.1.104.10:50010 are bad. Aborting..
>> > > >
>> > > > It's happening with a different region server and DataNode every time,
>> > > > so it's not a problem with one specific server, and there doesn't seem
>> > > > to be anything really wrong with either of them. I've already increased
>> > > > the file descriptor limit, the DataNode xceiver count, and the DataNode
>> > > > handler count. Any idea what could be causing these errors?
>> > > >
>> > > >
>> > > > A more complete log is here: http://pastebin.com/wC90xU2x
>> > > >
>> > > > Thanks.
>> > > >
>> > > > -eran
>> > >
>> >
>>
>
>
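Ted's grep and Eran's theory (a stale compiled class ending up in the JAR) can both be checked from one script. The minimal Python sketch below assumes a source checkout and the built JAR are available locally; the function names find_in_sources and find_in_jar are hypothetical. It looks for a log phrase in the .java sources and in the .class entries of the JAR. If the phrase shows up in the JAR but not in the sources, the JAR was not built from the current checkout.

```python
import os
import zipfile
from typing import List


def find_in_sources(src_root: str, phrase: str) -> List[str]:
    """Return paths of .java files under src_root whose text contains phrase
    (roughly what `find src/main -name '*.java' -exec grep ...` does)."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(src_root):
        for fname in filenames:
            if fname.endswith(".java"):
                path = os.path.join(dirpath, fname)
                with open(path, encoding="utf-8", errors="replace") as f:
                    if phrase in f.read():
                        hits.append(path)
    return sorted(hits)


def find_in_jar(jar_path: str, phrase: str) -> List[str]:
    """Return the .class entries in jar_path whose raw bytes contain phrase.

    String literals live in the class file's constant pool as modified
    UTF-8, which matches plain UTF-8 for ASCII text, so an ASCII phrase can
    be matched as bytes. Messages assembled by runtime string concatenation
    are stored as separate literal pieces, so search for a short fragment.
    """
    needle = phrase.encode("utf-8")
    hits = []
    with zipfile.ZipFile(jar_path) as jar:
        for entry in jar.namelist():
            # jar.read() returns the decompressed bytes of the entry.
            if entry.endswith(".class") and needle in jar.read(entry):
                hits.append(entry)
    return sorted(hits)
```

For example, find_in_sources("src/main", "Error getting ") followed by find_in_jar("path/to/hbase.jar", "Error getting ") (the JAR path is a placeholder) would show whether the message exists only in the compiled artifact.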
Ted Yu 2012-04-05, 13:52
Eran Kutner 2012-04-05, 14:35
Jean-Daniel Cryans 2012-03-28, 16:48
Jimmy Xiang 2012-03-28, 14:17
Eran Kutner 2012-03-28, 15:09
Harsh J 2012-03-28, 15:21
Eran Kutner 2012-03-28, 15:25
Stack 2012-03-28, 15:20