HBase >> mail # user >> Cluster crash


Re: Cluster crash
There wasn't an attachment; I pasted inline all the lines from all the NN
logs that contain that particular block number.

As for CPU/IO: first, there is nothing else running on those servers; second,
CPU utilization on the slaves at peak load was around 40% and disk IO
utilization was less than 20%. That's the strange thing about it (I have another
thread going about the performance): there is no bottleneck I could identify,
and yet the performance was relatively low compared to the numbers I see
quoted for HBase in other places.

The first line of the NN log says:
BLOCK* NameSystem.allocateBlock: /hbase/.logs/hadoop1-s01.farm-ny.gigya.com,60020,1302185988579/hadoop1-s01.farm-ny.gigya.com%3A60020.1302434963279.blk_1213779416283711358_54194
So it looks like the file name is: /hbase/.logs/hadoop1-s01.farm-ny.gigya.com,60020,1302185988579/hadoop1-s01.farm-ny.gigya.com%3A60020.1302434963279
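
The manual step above (reading the file path off the allocateBlock line) could be sketched in Python roughly as follows. This is only an illustration under assumptions: the log entry has been unwrapped onto a single line, the regular expression is my own guess at the format based on the one line shown here, and the function name `split_allocate_line` is hypothetical, not part of Hadoop.

```python
import re

# Assumed shape of the entry, based only on the line quoted in this message:
#   BLOCK* NameSystem.allocateBlock: <hdfs path>. blk_<id>_<generation stamp>
# The whitespace between the trailing "." and "blk_" is treated as optional.
ALLOCATE_RE = re.compile(
    r"BLOCK\* NameSystem\.allocateBlock: "
    r"(?P<path>/\S+?)\.\s*(?P<block>blk_-?\d+_\d+)"
)


def split_allocate_line(log_line):
    """Return (file_path, block_name) for an allocateBlock entry, else None."""
    match = ALLOCATE_RE.search(log_line)
    if match is None:
        return None
    # The path is returned as logged; "%3A" is left encoded on purpose.
    return match.group("path"), match.group("block")


if __name__ == "__main__":
    line = (
        "2011-04-10 07:29:23,835 INFO org.apache.hadoop.hdfs.StateChange: "
        "BLOCK* NameSystem.allocateBlock: "
        "/hbase/.logs/hadoop1-s01.farm-ny.gigya.com,60020,1302185988579/"
        "hadoop1-s01.farm-ny.gigya.com%3A60020.1302434963279. "
        "blk_1213779416283711358_54194"
    )
    print(split_allocate_line(line))
```

This only works for blocks whose allocateBlock entry is still in the NN log, so it does not replace a proper block-to-file lookup.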

Is there a better way to associate a file with a block?

-eran

On Mon, Apr 11, 2011 at 21:00, Stack <[EMAIL PROTECTED]> wrote:

> On Sun, Apr 10, 2011 at 11:30 PM, Eran Kutner <[EMAIL PROTECTED]> wrote:
> > Hi St.Ack and J-D,
> > Thanks for looking into this.
> >
> > It can definitely be a configuration problem, but I seriously doubt it
> > is a network or infrastructure problem. We operate the infrastructure
> > ourselves (not a cloud) and we have a lot of other services
> > running on it without any problem.
>
> Services that could be stealing i/o and cpu from hbase cluster?  Is
> that possible?
>
> > Note that Hbase is complaining
> > about multiple data nodes (10.1.104.1, 10.1.104.2, 10.1.104.5), I
> > attached the logs from just one of them but it's more or less the same
> > on all. Please see the NN log for the same block below.
> >
>
> An attachment? Did it come through?  Perhaps pastebin it and then add link
> here?
>
>
> > We are using Hadoop 0.20.2-CDH3B4
> > and Hbase Version 0.90.2-SNAPSHOT, rUnknown, Wed Mar 23 06:09:51 EDT
> > 2011 (I built that from the 0.90.2 branch to try to fix a problem with
> > replication)
> >
>
>
>
>
> > </configuration>
> >
>
> Your configuration looks fine.
>
> Can you associate the block with a file?  I don't see the association
> in the below.  I see us trying to delete the block (would like to know
> why -- file deletion?) and then it does exist on .2 for whatever
> reason.
>
> St.Ack
>
> >
> > This is the NN log for the same block:
> > 2011-04-10 07:29:23,835 INFO org.apache.hadoop.hdfs.StateChange:
> > BLOCK* NameSystem.allocateBlock:
> > /hbase/.logs/hadoop1-s01.farm-ny.gigya.com,60020,1302185988579/
> > hadoop1-s01.farm-ny.gigya.com%3A60020.1302434963279.
> > blk_1213779416283711358_54194
> > 2011-04-10 10:12:55,749 INFO org.apache.hadoop.hdfs.StateChange:
> > BLOCK* blk_1213779416283711358_54194 recovery started,
> > primary=10.1.104.1:50010
> > 2011-04-10 10:12:58,292 INFO org.apache.hadoop.hdfs.StateChange:
> > BLOCK* NameSystem.addStoredBlock: Targets updated: block
> > blk_1213779416283711358_54249 on 10.1.104.1:50010 is added as a target
> > for block blk_1213779416283711358_54194 with size 162696
> > 2011-04-10 10:12:58,293 INFO org.apache.hadoop.hdfs.StateChange:
> > BLOCK* NameSystem.addStoredBlock: Targets updated: block
> > blk_1213779416283711358_54249 on 10.1.104.5:50010 is added as a target
> > for block blk_1213779416283711358_54194 with size 162696
> > 2011-04-10 10:12:58,294 INFO org.apache.hadoop.hdfs.StateChange:
> > BLOCK* NameSystem.addStoredBlock: Targets updated: block
> > blk_1213779416283711358_54249 on 10.1.104.2:50010 is added as a target
> > for block blk_1213779416283711358_54194 with size 162696
> > 2011-04-10 10:12:58,295 INFO
> > org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
> > commitBlockSynchronization(lastblock=blk_1213779416283711358_54194,
> > newgenerationstamp=54249, newlength=162696,
> > newtargets=[10.1.104.1:50010, 10.1.104.5:50010, 10.1.104.2:50010],
> > closeFile=true, deleteBlock=false)
> > 2011-04-10 10:12:58,340 INFO
> > org.apache.hadoop.hdfs.server.namenode.FSNamesystem: