Re: fsimage.ckpt are not deleted - Exception in doCheckpoint
Harsh J 2013-03-20, 07:41
I don't think there's any data loss here. However, I think you may have been
affected by https://issues.apache.org/jira/browse/HDFS-4301, perhaps because of a large fsimage size. You can work around it by raising the image transfer timeout via the property dfs.image.transfer.timeout (default 60000 ms, i.e. 1 minute) to more than 10 minutes or so, expressed in ms.

On Fri, Mar 1, 2013 at 2:06 PM, Elmar Grote <[EMAIL PROTECTED]> wrote:
> Hi,
>
> we are writing our fsimage and edits files on the namenode and secondary
> namenode and additionally on an NFS share.
>
> In these folders we found a lot of fsimage.ckpt_000000000........ files,
> the oldest is from 9. Aug 2012.
> As far as I know these files should be deleted after the secondary
> namenode creates the new fsimage file.
> I looked in our log files from the namenode and secondary namenode to see
> what happened at that time.
>
> As an example I searched for this file:
> 20. Feb 04:02 fsimage.ckpt_0000000000726216952
>
> In the namenode log I found this:
> 2013-02-20 04:02:51,404 ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:hdfs (auth:SIMPLE) cause:java.io.IOException: Input/output error
> 2013-02-20 04:02:51,409 WARN org.mortbay.log: /getimage: java.io.IOException: GetImage failed. java.io.IOException: Input/output error
>
> In the secondary namenode log I think this is the relevant part:
> 2013-02-20 04:01:16,554 INFO org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Image has not changed. Will not download image.
> 2013-02-20 04:01:16,554 INFO org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Opening connection to http://s_namenode.domain.local:50070/getimage?getedit=1&startTxId=726172233&endTxId=726216952&storageInfo=-40:1814856193:1341996094997:CID-064c4e47-387d-454d-aa1e-27cec1e816e4
> 2013-02-20 04:01:16,750 INFO org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Downloaded file edits_0000000000726172233-0000000000726216952 size 6881797 bytes.
> 2013-02-20 04:01:16,750 INFO org.apache.hadoop.hdfs.server.namenode.Checkpointer: Checkpointer about to load edits from 1 stream(s).
> 2013-02-20 04:01:16,750 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: Reading /var/lib/hdfs_namenode/meta/dfs/namesecondary/current/edits_0000000000726172233-0000000000726216952 expecting start txid #726172233
> 2013-02-20 04:01:16,987 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: Edits file /var/lib/hdfs_namenode/meta/dfs/namesecondary/current/edits_0000000000726172233-0000000000726216952 of size 6881797 edits # 44720 loaded in 0 seconds.
> 2013-02-20 04:01:18,023 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: Saving image file /var/lib/hdfs_namenode/meta/dfs/namesecondary/current/fsimage.ckpt_0000000000726216952 using no compression
> 2013-02-20 04:01:18,031 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: Saving image file /var/lib/hdfs_nfs_share/dfs/namesecondary/current/fsimage.ckpt_0000000000726216952 using no compression
> 2013-02-20 04:01:40,854 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: Image file of size 1211973003 saved in 22 seconds.
> 2013-02-20 04:01:50,762 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: Image file of size 1211973003 saved in 32 seconds.
> 2013-02-20 04:01:50,770 INFO org.apache.hadoop.hdfs.server.namenode.NNStorageRetentionManager: Going to retain 2 images with txid >= 726172232
> 2013-02-20 04:01:50,770 INFO org.apache.hadoop.hdfs.server.namenode.NNStorageRetentionManager: Purging old image FSImageFile(file=/var/lib/hdfs_namenode/meta/dfs/namesecondary/current/fsimage_0000000000726121750, cpktTxId=0000000000726121750)
> 2013-02-20 04:01:51,000 INFO org.apache.hadoop.hdfs.server.namenode.NNStorageRetentionManager:

Harsh J
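
For reference, the timeout workaround mentioned in the reply at the top of this message could look roughly like the fragment below. This is only a sketch: the property name dfs.image.transfer.timeout comes from the reply, while the value 900000 (15 minutes) is an illustrative assumption chosen to be above the suggested 10 minutes, and placing it in hdfs-site.xml on both the namenode and the secondary namenode is likewise an assumption to verify against your Hadoop version.

```xml
<!-- hdfs-site.xml (sketch): raise the fsimage transfer timeout. -->
<!-- The value is in milliseconds; 900000 ms = 15 minutes is an  -->
<!-- assumed example, not a recommended or tested setting.       -->
<property>
  <name>dfs.image.transfer.timeout</name>
  <value>900000</value>
</property>
```

A larger value gives the roughly 1.2 GB image seen in the logs above more time to transfer before the connection is abandoned; the daemons most likely need a restart to pick up the change.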