Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Threaded View
HBase >> mail # user >> Cluster crash


Copy link to this message
-
Re: Cluster crash
Hi St.Ack and J-D,
Thanks for looking into this.

It can definitely be a configuration problem, but I seriously doubt it
is a network or infrastructure problem. It's our own operated
infrastructure (not a cloud)  and we have a lot of other services
running on it without any problem. Note that Hbase is complaining
about multiple data nodes (10.1.104.1, 10.1.104.2, 10.1.104.5), I
attached the logs from just one of them but it's more or less the same
on all. Please see the NN log for the same block below.

We are using Hadoop 0.20.2-CDH3B4
and Hbase Version 0.90.2-SNAPSHOT, rUnknown, Wed Mar 23 06:09:51 EDT
2011 (I built that from the 0.90.2 branch to try to fix a problem with
replication)

This is the HDFS config file:
# cat /etc/hadoop/conf/hdfs-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <property>
     <name>dfs.permissions</name>
     <value>false</value>
  </property>
  <property>
     <!-- specify this so that running 'hadoop namenode -format'
formats the right dir -->
     <name>dfs.name.dir</name>
     <value>/hadoop/dfs/name,/hadoop/dfs/name2</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/hadoop/dfs/data/</value>
  </property>
  <property>
    <name>fs.checkpoint.dir</name>
    <value>/hadoop/dfs/namesecondary/</value>
  </property>
  <property>
    <name>dfs.datanode.max.xcievers</name>
    <value>4096</value>
  </property>
</configuration>
This is the NN log for the same block:
2011-04-10 07:29:23,835 INFO org.apache.hadoop.hdfs.StateChange:
BLOCK* NameSystem.allocateBlock:
/hbase/.logs/hadoop1-s01.farm-ny.gigya.com,60020,1302185988579/hadoop1-s01.farm-ny.gigya.com%3A60020.1302434963279.
blk_1213779416283711358_54194
2011-04-10 10:12:55,749 INFO org.apache.hadoop.hdfs.StateChange:
BLOCK* blk_1213779416283711358_54194 recovery started,
primary=10.1.104.1:50010
2011-04-10 10:12:58,292 INFO org.apache.hadoop.hdfs.StateChange:
BLOCK* NameSystem.addStoredBlock: Targets updated: block
blk_1213779416283711358_54249 on 10.1.104.1:50010 is added as a target
for block blk_1213779416283711358_54194 with size 162696
2011-04-10 10:12:58,293 INFO org.apache.hadoop.hdfs.StateChange:
BLOCK* NameSystem.addStoredBlock: Targets updated: block
blk_1213779416283711358_54249 on 10.1.104.5:50010 is added as a target
for block blk_1213779416283711358_54194 with size 162696
2011-04-10 10:12:58,294 INFO org.apache.hadoop.hdfs.StateChange:
BLOCK* NameSystem.addStoredBlock: Targets updated: block
blk_1213779416283711358_54249 on 10.1.104.2:50010 is added as a target
for block blk_1213779416283711358_54194 with size 162696
2011-04-10 10:12:58,295 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
commitBlockSynchronization(lastblock=blk_1213779416283711358_54194,
newgenerationstamp=54249, newlength=162696,
newtargets=[10.1.104.1:50010, 10.1.104.5:50010, 10.1.104.2:50010],
closeFile=true, deleteBlock=false)
2011-04-10 10:12:58,340 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
commitBlockSynchronization(newblock=blk_1213779416283711358_54249,
file=/hbase/.logs/hadoop1-s01.farm-ny.gigya.com,60020,1302185988579/hadoop1-s01.farm-ny.gigya.com%3A60020.1302434963279,
newgenerationstamp=54249, newlength=162696,
newtargets=[10.1.104.1:50010, 10.1.104.5:50010, 10.1.104.2:50010])
successful
2011-04-10 10:12:58,773 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
commitBlockSynchronization(lastblock=blk_1213779416283711358_54249,
newgenerationstamp=54250, newlength=162696,
newtargets=[10.1.104.1:50010, 10.1.104.5:50010, 10.1.104.2:50010],
closeFile=false, deleteBlock=false)
2011-04-10 10:12:58,798 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
commitBlockSynchronization(blk_1213779416283711358_54250) successful
2011-04-10 10:12:58,833 INFO org.apache.hadoop.hdfs.StateChange:
BLOCK* NameSystem.addStoredBlock: blockMap updated: 10.1.104.2:50010
is added to blk_1213779416283711358_54250 size 162696
2011-04-10 10:12:58,834 INFO org.apache.hadoop.hdfs.StateChange:
BLOCK* NameSystem.addStoredBlock: blockMap updated: 10.1.104.5:50010
is added to blk_1213779416283711358_54250 size 162696
2011-04-10 10:12:58,834 INFO org.apache.hadoop.hdfs.StateChange:
BLOCK* NameSystem.addStoredBlock: blockMap updated: 10.1.104.1:50010
is added to blk_1213779416283711358_54250 size 162696
2011-04-10 10:13:28,277 INFO org.apache.hadoop.hdfs.StateChange:
BLOCK* NameSystem.addToInvalidates: blk_1213779416283711358 is added
to invalidSet of 10.1.104.2:50010
2011-04-10 10:13:28,277 INFO org.apache.hadoop.hdfs.StateChange:
BLOCK* NameSystem.addToInvalidates: blk_1213779416283711358 is added
to invalidSet of 10.1.104.5:50010
2011-04-10 10:13:28,277 INFO org.apache.hadoop.hdfs.StateChange:
BLOCK* NameSystem.addToInvalidates: blk_1213779416283711358 is added
to invalidSet of 10.1.104.1:50010
2011-04-10 10:13:29,811 INFO org.apache.hadoop.hdfs.StateChange:
BLOCK* ask 10.1.104.2:50010 to delete  blk_-826484977486799080_54240
blk_-3859382357539098123_54247 blk_6239319017569949674_54242
blk_2498532690928849052_54243 blk_-4133675075382826161_54234
blk_-6299622736350943739_54246 blk_1213779416283711358_54250
blk_-7407505773873780456_54244 blk_1070089292787245089_54237
2011-04-10 10:13:32,813 INFO org.apache.hadoop.hdfs.StateChange:
BLOCK* ask 10.1.104.1:50010 to delete  blk_-826484977486799080_54240
blk_-9184330013088281910_54231 blk_842002398362359267_54239
blk_6239319017569949674_54242 blk_-4133675075382826161_54234
blk_-2420408100786396293_54245 blk_1213779416283711358_54250
blk_1475680055217098335_54241 blk_-8623426430834046353_54236
blk_-3859382357539098123_54247 blk_2498532690928849052_54243
blk_-6299622736350943739_54246 blk_-7407505773873780456_54244
blk_1070089292787245089_54237
2011-04-10 10:13:32,813 INFO org.apache.hadoop.hdfs.StateChange:
BLOCK* ask 10.1.104.5:50010 to d