HBase, mail # user - Cluster crash


Re: Cluster crash
Eran Kutner 2011-04-11, 18:22
There wasn't an attachment, I pasted all the lines from all the NN logs that
contain that particular block number inline.

As for CPU/IO: first, nothing else is running on those servers; second,
CPU utilization on the slaves at peak load was around 40% and disk IO
utilization was below 20%. That's the strange thing about it (I have another
thread going about the performance): I could not identify any bottleneck,
and yet the performance was relatively low compared to the numbers I see
quoted for HBase in other places.

The first line of the NN log says:
BLOCK* NameSystem.allocateBlock: /hbase/.logs/hadoop1-s01.farm-ny.gigya.com,60020,1302185988579/hadoop1-s01.farm-ny.gigya.com%3A60020.1302434963279.blk_1213779416283711358_54194
So it looks like the file name is:
/hbase/.logs/hadoop1-s01.farm-ny.gigya.com,60020,1302185988579/hadoop1-s01.farm-ny.gigya.com%3A60020.1302434963279

Is there a better way to associate a file with a block?
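
One way to make this association less manual is to scan the NN log for allocateBlock entries and build a block-to-file map. The following is only a minimal sketch: it assumes each log entry sits on a single line in the "<path>. blk_<id>_<genstamp>" shape shown in this thread, and the function name block_to_file is made up for illustration, not part of Hadoop.

```python
import re

# Assumed entry shape (taken from the NN log lines in this thread):
#   ... BLOCK* NameSystem.allocateBlock: <path>. blk_<id>_<genstamp>
# Block ids are signed longs, so allow an optional minus sign.
ALLOCATE_RE = re.compile(
    r"NameSystem\.allocateBlock:\s*"
    r"(?P<path>\S+?)\.\s*"                   # file path, then the "." separator
    r"(?P<block>blk_-?\d+)_(?P<genstamp>\d+)"
)


def block_to_file(log_lines):
    """Map each allocated block id (without generation stamp) to its file path.

    Hypothetical helper: expects an iterable of unwrapped NameNode log lines.
    """
    mapping = {}
    for line in log_lines:
        match = ALLOCATE_RE.search(line)
        if match:
            mapping[match.group("block")] = match.group("path")
    return mapping


if __name__ == "__main__":
    sample = [
        "2011-04-10 07:29:23,835 INFO org.apache.hadoop.hdfs.StateChange: "
        "BLOCK* NameSystem.allocateBlock: "
        "/hbase/.logs/hadoop1-s01.farm-ny.gigya.com,60020,1302185988579/"
        "hadoop1-s01.farm-ny.gigya.com%3A60020.1302434963279. "
        "blk_1213779416283711358_54194",
    ]
    for block, path in block_to_file(sample).items():
        print(block, "->", path)
```

The generation stamp is left out of the key on purpose, since block recovery changes it (54194 becomes 54249 in the log below) while the block id stays the same. Going the other direction, from a file to its blocks, hadoop fsck <path> -files -blocks should list them, as long as the file still exists.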

-eran

On Mon, Apr 11, 2011 at 21:00, Stack <[EMAIL PROTECTED]> wrote:

> On Sun, Apr 10, 2011 at 11:30 PM, Eran Kutner <[EMAIL PROTECTED]> wrote:
> > Hi St.Ack and J-D,
> > Thanks for looking into this.
> >
> > It could definitely be a configuration problem, but I seriously doubt it
> > is a network or infrastructure problem. It's infrastructure we operate
> > ourselves (not a cloud), and we have a lot of other services
> > running on it without any problem.
>
> Services that could be stealing I/O and CPU from the HBase cluster?  Is
> that possible?
>
> > Note that HBase is complaining
> > about multiple data nodes (10.1.104.1, 10.1.104.2, 10.1.104.5). I
> > attached the logs from just one of them, but it's more or less the same
> > on all. Please see the NN log for the same block below.
> >
>
> An attachment? Did it come through?  Perhaps pastebin it and then add a link
> here?
>
>
> > We are using Hadoop 0.20.2-CDH3B4
> > and Hbase Version 0.90.2-SNAPSHOT, rUnknown, Wed Mar 23 06:09:51 EDT
> > 2011 (I built that from the 0.90.2 branch to try to fix a problem with
> > replication)
> >
>
>
>
>
> > </configuration>
> >
>
> Your configuration looks fine.
>
> Can you associate the block with a file?  I don't see the association
> in the below.  I see us trying to delete the block (would like to know
> why -- file deletion?) and then it does exist on .2 for whatever
> reason.
>
> St.Ack
>
> >
> > This is the NN log for the same block:
> > 2011-04-10 07:29:23,835 INFO org.apache.hadoop.hdfs.StateChange:
> > BLOCK* NameSystem.allocateBlock:
> > /hbase/.logs/hadoop1-s01.farm-ny.gigya.com,60020,1302185988579/hadoop1-s01.farm-ny.gigya.com%3A60020.1302434963279.
> > blk_1213779416283711358_54194
> > 2011-04-10 10:12:55,749 INFO org.apache.hadoop.hdfs.StateChange:
> > BLOCK* blk_1213779416283711358_54194 recovery started,
> > primary=10.1.104.1:50010
> > 2011-04-10 10:12:58,292 INFO org.apache.hadoop.hdfs.StateChange:
> > BLOCK* NameSystem.addStoredBlock: Targets updated: block
> > blk_1213779416283711358_54249 on 10.1.104.1:50010 is added as a target
> > for block blk_1213779416283711358_54194 with size 162696
> > 2011-04-10 10:12:58,293 INFO org.apache.hadoop.hdfs.StateChange:
> > BLOCK* NameSystem.addStoredBlock: Targets updated: block
> > blk_1213779416283711358_54249 on 10.1.104.5:50010 is added as a target
> > for block blk_1213779416283711358_54194 with size 162696
> > 2011-04-10 10:12:58,294 INFO org.apache.hadoop.hdfs.StateChange:
> > BLOCK* NameSystem.addStoredBlock: Targets updated: block
> > blk_1213779416283711358_54249 on 10.1.104.2:50010 is added as a target
> > for block blk_1213779416283711358_54194 with size 162696
> > 2011-04-10 10:12:58,295 INFO
> > org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
> > commitBlockSynchronization(lastblock=blk_1213779416283711358_54194,
> > newgenerationstamp=54249, newlength=162696,
> > newtargets=[10.1.104.1:50010, 10.1.104.5:50010, 10.1.104.2:50010],
> > closeFile=true, deleteBlock=false)
> > 2011-04-10 10:12:58,340 INFO
> > org.apache.hadoop.hdfs.server.namenode.FSNamesystem: