HDFS, mail # user - Re: fsimage.ckpt are not deleted - Exception in doCheckpoint


Re: fsimage.ckpt are not deleted - Exception in doCheckpoint
Harsh J 2013-03-20, 07:41
I don't think there's any data loss here. However, you may have been
affected by https://issues.apache.org/jira/browse/HDFS-4301, perhaps due to a
large fsimage size. You can work around it by raising the image transfer
timeout via the property dfs.image.transfer.timeout (default 60000 ms, i.e. 1
minute) to more than 10 minutes or so, expressed in milliseconds.

On Fri, Mar 1, 2013 at 2:06 PM, Elmar Grote <[EMAIL PROTECTED]> wrote:

> Hi,
>
> We are writing our fsimage and edits files on the namenode and secondary
> namenode, and additionally to an NFS share.
>
> In these folders we found a lot of fsimage.ckpt_000000000........
> files; the oldest is from 9 Aug 2012.
> As far as I know, these files should be deleted after the secondary
> namenode creates the new fsimage file.
> I looked in the log files of the namenode and secondary namenode to see
> what happened at that time.
>
> As an example, I searched for this file:
> 20. Feb 04:02 fsimage.ckpt_0000000000726216952
>
> In the namenode log I found this:
> 2013-02-20 04:02:51,404 ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:hdfs (auth:SIMPLE) cause:java.io.IOException: Input/output error
> 2013-02-20 04:02:51,409 WARN org.mortbay.log: /getimage: java.io.IOException: GetImage failed. java.io.IOException: Input/output error
>
> In the secondary namenode log, I think this is the relevant part:
> 2013-02-20 04:01:16,554 INFO org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Image has not changed. Will not download image.
> 2013-02-20 04:01:16,554 INFO org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Opening connection to http://s_namenode.domain.local:50070/getimage?getedit=1&startTxId=726172233&endTxId=726216952&storageInfo=-40:1814856193:1341996094997:CID-064c4e47-387d-454d-aa1e-27cec1e816e4
> 2013-02-20 04:01:16,750 INFO org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Downloaded file edits_0000000000726172233-0000000000726216952 size 6881797 bytes.
> 2013-02-20 04:01:16,750 INFO org.apache.hadoop.hdfs.server.namenode.Checkpointer: Checkpointer about to load edits from 1 stream(s).
> 2013-02-20 04:01:16,750 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: Reading /var/lib/hdfs_namenode/meta/dfs/namesecondary/current/edits_0000000000726172233-0000000000726216952 expecting start txid #726172233
> 2013-02-20 04:01:16,987 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: Edits file /var/lib/hdfs_namenode/meta/dfs/namesecondary/current/edits_0000000000726172233-0000000000726216952 of size 6881797 edits # 44720 loaded in 0 seconds.
> 2013-02-20 04:01:18,023 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: Saving image file /var/lib/hdfs_namenode/meta/dfs/namesecondary/current/fsimage.ckpt_0000000000726216952 using no compression
> 2013-02-20 04:01:18,031 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: Saving image file /var/lib/hdfs_nfs_share/dfs/namesecondary/current/fsimage.ckpt_0000000000726216952 using no compression
> 2013-02-20 04:01:40,854 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: Image file of size 1211973003 saved in 22 seconds.
> 2013-02-20 04:01:50,762 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: Image file of size 1211973003 saved in 32 seconds.
> 2013-02-20 04:01:50,770 INFO org.apache.hadoop.hdfs.server.namenode.NNStorageRetentionManager: Going to retain 2 images with txid >= 726172232
> 2013-02-20 04:01:50,770 INFO org.apache.hadoop.hdfs.server.namenode.NNStorageRetentionManager: Purging old image FSImageFile(file=/var/lib/hdfs_namenode/meta/dfs/namesecondary/current/fsimage_0000000000726121750, cpktTxId=0000000000726121750)
> 2013-02-20 04:01:51,000 INFO org.apache.hadoop.hdfs.server.namenode.NNStorageRetentionManager:
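
To see how many leftover fsimage.ckpt_<txid> files are sitting in a storage directory and when they were written, something like the following sketch may help. It only lists files and never deletes anything. The function name find_leftover_checkpoints is made up for illustration, and the file-name pattern is an assumption based on the names quoted above.

```python
import os
import re
import sys
import time

# Matches names such as fsimage.ckpt_0000000000726216952 (pattern assumed
# from the file names quoted in this thread).
CKPT_PATTERN = re.compile(r"^fsimage\.ckpt_(\d+)$")


def find_leftover_checkpoints(directory):
    """Return (txid, path, mtime) tuples for fsimage.ckpt_* files, oldest first."""
    found = []
    for name in os.listdir(directory):
        match = CKPT_PATTERN.match(name)
        if match:
            path = os.path.join(directory, name)
            found.append((int(match.group(1)), path, os.path.getmtime(path)))
    # Sort by modification time so the oldest leftover comes first.
    return sorted(found, key=lambda item: item[2])


if __name__ == "__main__":
    # Pass one or more storage directories on the command line.
    for directory in sys.argv[1:]:
        for txid, path, mtime in find_leftover_checkpoints(directory):
            stamp = time.strftime("%Y-%m-%d %H:%M", time.localtime(mtime))
            print(f"{stamp}  txid={txid}  {path}")
```

Run it against each storage directory, for example the namesecondary/current directories that appear in the log above, and compare the timestamps with the errors in the namenode log.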

Harsh J