HDFS >> mail # user >> Re: fsimage.ckpt are not deleted - Exception in doCheckpoint


Re: fsimage.ckpt are not deleted - Exception in doCheckpoint
Hi Yifan,

thank you for the answer.

But as far as I understand it, the SNN downloads the fsimage and edits files from the NN,
builds the new fsimage, and uploads it back to the NN.

So here the upload didn't work. The next time the checkpoint starts, the old fsimage is still on the NN.
But what about the edits files? Are the old ones still there, or were they deleted
during the failed upload of the fsimage? If they were deleted, they are missing, and
there would be data loss or inconsistency.

Or am I wrong?

When are the edits files deleted? After a successful upload, or before?

Regards Elmar
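
For what it's worth, the purge boundary in the SNN log quoted below can be reproduced from the retention settings. This is only a sketch based on my reading of how NNStorageRetentionManager picks the boundary; the txids come from the log in this thread, and the defaults dfs.namenode.num.checkpoints.retained=2 and dfs.namenode.num.extra.edits.retained=1000000 are assumptions if you have overridden them:

```shell
# Sketch (not authoritative): how the "Purging logs older than ..." boundary
# appears to be derived, using txids from the SNN log in this thread.
# Assumes dfs.namenode.num.extra.edits.retained is at its default of 1000000.
min_retained_image_txid=726172232   # "retain 2 images with txid >= 726172232"
extra_edits_retained=1000000        # dfs.namenode.num.extra.edits.retained
purge_below=$((min_retained_image_txid + 1 - extra_edits_retained))
echo "Purging logs older than $purge_below"   # matches the log: 725172233
```

If that arithmetic holds, edits are only purged once a saved checkpoint image covers them, plus a 1,000,000-txid safety margin on top, so a failed upload should not by itself delete edits that are still needed.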

  _____  

From: Yifan Du [mailto:[EMAIL PROTECTED]]
To: [EMAIL PROTECTED]
Sent: Fri, 08 Mar 2013 11:08:09 +0100
Subject: Re: fsimage.ckpt are not deleted - Exception in doCheckpoint

I have met this exception too.
  The new fsimage produced by the SNN could not be transferred to the NN.
  My HDFS version is 2.0.0.
  Does anyone know how to fix it?
  
  @Regards Elmar
  The new fsimage was created successfully, but it could not be
  transferred to the NN, so the old fsimage.ckpt was not deleted.
  I have tried the new fsimage: starting up the cluster with the new fsimage
  and the new edits in progress worked, and no data was lost.
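
If it helps anyone doing the same check before a restart, here is a quick way to see which local edits segments cover the txid of a stale checkpoint image. This is a sketch only: the `fsimage.ckpt_<txid>` and `edits_<start>-<end>` naming conventions are taken from the log excerpts in this thread, and you would run it inside whatever `current/` directory your name dir or namesecondary layout uses:

```shell
# Hypothetical helper: list the edits_<start>-<end> segments in the current
# directory whose txid range covers the txid of a stale fsimage.ckpt file.
ckpt=fsimage.ckpt_0000000000726216952   # example name from this thread
ckpt_txid=$((10#${ckpt##*_}))           # strip prefix and leading zeros
for f in edits_*-*; do
  [ -e "$f" ] || continue               # skip if the glob matched nothing
  range=${f#edits_}
  start=$((10#${range%-*}))
  end=$((10#${range#*-}))
  if [ "$start" -le "$ckpt_txid" ] && [ "$end" -ge "$ckpt_txid" ]; then
    echo "covering segment: $f"
  fi
done
```

If a segment covering the checkpoint's txid is still present, the edits needed to roll forward from that image have not been purged.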
  
  
  2013/3/6, Elmar Grote <[EMAIL PROTECTED]>:
  > Hi,
  >
  > we are writing our fsimage and edits file on the namenode and secondary
  > namenode and additional on a nfs share.
  >
  > In these folders we found a lot of fsimage.ckpt_000000000........
  > files; the oldest is from 9 Aug 2012.
  > As far as I know these files should be deleted after the secondary namenode
  > creates the new fsimage file.
  > I looked in our log files from the namenode and secondary namenode to see
  > what happened at that time.
  >
  > As an example I searched for this file:
  > 20. Feb 04:02 fsimage.ckpt_0000000000726216952
  >
  > In the namenode log i found this:
  > 2013-02-20 04:02:51,404 ERROR
  > org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException
  > as:hdfs (auth:SIMPLE) cause:java.io.IOException: Input/output error
  > 2013-02-20 04:02:51,409 WARN org.mortbay.log: /getimage:
  > java.io.IOException: GetImage failed. java.io.IOException: Input/output
  > error
  >
  > In the secondary namenode log I think this is the relevant part:
  > 2013-02-20 04:01:16,554 INFO
  > org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Image has not
  > changed. Will not download image.
  > 2013-02-20 04:01:16,554 INFO
  > org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Opening connection
  > to
  > http://s_namenode.domain.local:50070/getimage?getedit=1&startTxId=726172233&endTxId=726216952&storageInfo=-40:1814856193:1341996094997:CID-064c4e47-387d-454d-aa1e-27cec1e816e4
  > 2013-02-20 04:01:16,750 INFO
  > org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Downloaded file
  > edits_0000000000726172233-0000000000726216952 size 6881797 bytes.
  > 2013-02-20 04:01:16,750 INFO
  > org.apache.hadoop.hdfs.server.namenode.Checkpointer: Checkpointer about to
  > load edits from 1 stream(s).
  > 2013-02-20 04:01:16,750 INFO org.apache.hadoop.hdfs.server.namenode.FSImage:
  > Reading
  > /var/lib/hdfs_namenode/meta/dfs/namesecondary/current/edits_0000000000726172233-0000000000726216952
  > expecting start txid #726172233
  > 2013-02-20 04:01:16,987 INFO org.apache.hadoop.hdfs.server.namenode.FSImage:
  > Edits file
  > /var/lib/hdfs_namenode/meta/dfs/namesecondary/current/edits_0000000000726172233-0000000000726216952
  > of size 6881797 edits # 44720 loaded in 0 seconds.
  > 2013-02-20 04:01:18,023 INFO org.apache.hadoop.hdfs.server.namenode.FSImage:
  > Saving image file
  > /var/lib/hdfs_namenode/meta/dfs/namesecondary/current/fsimage.ckpt_0000000000726216952
  > using no compression
  > 2013-02-20 04:01:18,031 INFO org.apache.hadoop.hdfs.server.namenode.FSImage:
  > Saving image file
  > /var/lib/hdfs_nfs_share/dfs/namesecondary/current/fsimage.ckpt_0000000000726216952
  > using no compression
  > 2013-02-20 04:01:40,854 INFO org.apache.hadoop.hdfs.server.namenode.FSImage:
  > Image file of size 1211973003 saved in 22 seconds.
  > 2013-02-20 04:01:50,762 INFO org.apache.hadoop.hdfs.server.namenode.FSImage:
  > Image file of size 1211973003 saved in 32 seconds.
  > 2013-02-20 04:01:50,770 INFO
  > org.apache.hadoop.hdfs.server.namenode.NNStorageRetentionManager: Going to
  > retain 2 images with txid >= 726172232
  > 2013-02-20 04:01:50,770 INFO
  > org.apache.hadoop.hdfs.server.namenode.NNStorageRetentionManager: Purging
  > old image
  > FSImageFile(file=/var/lib/hdfs_namenode/meta/dfs/namesecondary/current/fsimage_0000000000726121750,
  > cpktTxId=0000000000726121750)
  > 2013-02-20 04:01:51,000 INFO
  > org.apache.hadoop.hdfs.server.namenode.NNStorageRetentionManager: Purging
  > old image
  > FSImageFile(file=/var/lib/hdfs_nfs_share/dfs/namesecondary/current/fsimage_0000000000726121750,
  > cpktTxId=0000000000726121750)
  > 2013-02-20 04:01:51,379 INFO
  > org.apache.hadoop.hdfs.server.namenode.FileJournalManager: Purging logs
  > older than 725172233
  > 2013-02-20 04:01:51,381 INFO
  > org.apache.hadoop.hdfs.server.namenode.FileJournalManager: Purging logs
  > older than 725172233
  > 2013-02-20 04:01:51,400 INFO
  > org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Opening connection
  > to
  > http://s_namenode.domain.local:50070/getimage?putimage=1&txid=726216952&port=50090&storageInfo=-40:1814856193:1341996094997:CID-064c4e47-387d-454d-aa1e-27cec1e816e4
  > 2013-02-20 04:02:51,411 ERROR
  > org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Exception in
  > doCheckpoint
  > org.apache.hadoop.hdfs.server.namenode.TransferFsImage$HttpGetFailedException:
  > Image transfer servlet at
  > http://s_namenode.domain.local:50070/getimage?putimage=1&txid=726216952&port=50090&storageInfo=-40:1814856193:1341996094997:CID-064c4e47-387d-454d-aa1e-27cec1e816e4
  > failed with status code 410
  > Response message:
  > GetImage failed. java.io.IOExcepti