HDFS user mailing list


Re: fsimage.ckpt are not deleted - Exception in doCheckpoint
I don't think there's any data loss here. However, you may have been
affected by https://issues.apache.org/jira/browse/HDFS-4301, perhaps because
of a large fsimage. You can work around it by raising the image transfer
timeout via the property dfs.image.transfer.timeout (default 60000 ms, i.e.
1 minute) to more than 10 minutes or so, given in milliseconds (10 minutes
is 600000 ms).
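Because the property is read in milliseconds, it is easy to set a value that is off by a factor of 1000. The sketch below only illustrates the conversion: it turns a timeout in minutes into milliseconds and prints the `<property>` entry one might add to hdfs-site.xml on the daemons involved in the image transfer (presumably both the NameNode and the Secondary NameNode). The class name `ImageTransferTimeout` and the choice of 15 minutes are assumptions for the example, not values taken from this thread.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper: builds an hdfs-site.xml <property> entry for
// dfs.image.transfer.timeout, whose value is expressed in milliseconds.
public class ImageTransferTimeout {
    static final String KEY = "dfs.image.transfer.timeout";

    // Convert a timeout given in minutes to the millisecond value the property expects.
    static long minutesToMillis(long minutes) {
        return TimeUnit.MINUTES.toMillis(minutes);
    }

    // Render the XML fragment for the chosen timeout.
    static String propertyXml(long minutes) {
        return "<property>\n"
             + "  <name>" + KEY + "</name>\n"
             + "  <value>" + minutesToMillis(minutes) + "</value>\n"
             + "</property>";
    }

    public static void main(String[] args) {
        // The default is 60000 ms (1 minute); 15 minutes is an arbitrary example above 10 minutes.
        System.out.println(propertyXml(15));
    }
}
```

Running it prints a `<property>` element whose value is 900000.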

On Fri, Mar 1, 2013 at 2:06 PM, Elmar Grote <[EMAIL PROTECTED]> wrote:

> Hi,
>
> we write our fsimage and edits files on the namenode and the secondary
> namenode, and additionally to an NFS share.
>
> In these folders we found a lot of fsimage.ckpt_000000000........ files;
> the oldest is from 9 Aug 2012.
> As far as I know, these files should be deleted after the secondary
> namenode creates the new fsimage file.
> I looked in the log files of our namenode and secondary namenode to see
> what happened at that time.
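To see which leftover checkpoint files a later, finalized image has already superseded, one could compare the transaction IDs embedded in the file names. The sketch below assumes the naming visible in these logs (`fsimage_<txid>` for finalized images, `fsimage.ckpt_<txid>` for in-progress checkpoints) and conservatively reports a `.ckpt` file only when a finalized image with a strictly higher txid exists. The class and method names are hypothetical; it only prints candidates and deletes nothing.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: list fsimage.ckpt_<txid> files in a storage "current" directory that are
// older than the newest finalized fsimage_<txid>. Reports only; never deletes.
public class StaleCheckpointFinder {
    // Full-name matches, so companion files such as fsimage_<txid>.md5 are ignored.
    private static final Pattern IMAGE = Pattern.compile("fsimage_(\\d+)");
    private static final Pattern CKPT = Pattern.compile("fsimage\\.ckpt_(\\d+)");

    static List<String> staleCheckpoints(List<String> fileNames) {
        // Find the highest txid among finalized images.
        long newestImage = -1;
        for (String name : fileNames) {
            Matcher m = IMAGE.matcher(name);
            if (m.matches()) {
                newestImage = Math.max(newestImage, Long.parseLong(m.group(1)));
            }
        }
        // A checkpoint file is a candidate only if a strictly newer image exists.
        List<String> stale = new ArrayList<>();
        for (String name : fileNames) {
            Matcher m = CKPT.matcher(name);
            if (m.matches() && Long.parseLong(m.group(1)) < newestImage) {
                stale.add(name);
            }
        }
        return stale;
    }

    public static void main(String[] args) throws IOException {
        // Pass the storage "current" directory as the first argument.
        Path current = Paths.get(args.length > 0 ? args[0] : ".");
        List<String> names = new ArrayList<>();
        try (DirectoryStream<Path> dir = Files.newDirectoryStream(current)) {
            for (Path p : dir) {
                names.add(p.getFileName().toString());
            }
        }
        staleCheckpoints(names).forEach(System.out::println);
    }
}
```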
>
> As an example, I searched for this file:
> 20. Feb 04:02 fsimage.ckpt_0000000000726216952
>
> In the namenode log I found this:
> 2013-02-20 04:02:51,404 ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:hdfs (auth:SIMPLE) cause:java.io.IOException: Input/output error
> 2013-02-20 04:02:51,409 WARN org.mortbay.log: /getimage: java.io.IOException: GetImage failed. java.io.IOException: Input/output error
>
> In the secondary namenode log, I think this is the relevant part:
> 2013-02-20 04:01:16,554 INFO org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Image has not changed. Will not download image.
> 2013-02-20 04:01:16,554 INFO org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Opening connection to http://s_namenode.domain.local:50070/getimage?getedit=1&startTxId=726172233&endTxId=726216952&storageInfo=-40:1814856193:1341996094997:CID-064c4e47-387d-454d-aa1e-27cec1e816e4
> 2013-02-20 04:01:16,750 INFO org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Downloaded file edits_0000000000726172233-0000000000726216952 size 6881797 bytes.
> 2013-02-20 04:01:16,750 INFO org.apache.hadoop.hdfs.server.namenode.Checkpointer: Checkpointer about to load edits from 1 stream(s).
> 2013-02-20 04:01:16,750 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: Reading /var/lib/hdfs_namenode/meta/dfs/namesecondary/current/edits_0000000000726172233-0000000000726216952 expecting start txid #726172233
> 2013-02-20 04:01:16,987 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: Edits file /var/lib/hdfs_namenode/meta/dfs/namesecondary/current/edits_0000000000726172233-0000000000726216952 of size 6881797 edits # 44720 loaded in 0 seconds.
> 2013-02-20 04:01:18,023 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: Saving image file /var/lib/hdfs_namenode/meta/dfs/namesecondary/current/fsimage.ckpt_0000000000726216952 using no compression
> 2013-02-20 04:01:18,031 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: Saving image file /var/lib/hdfs_nfs_share/dfs/namesecondary/current/fsimage.ckpt_0000000000726216952 using no compression
> 2013-02-20 04:01:40,854 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: Image file of size 1211973003 saved in 22 seconds.
> 2013-02-20 04:01:50,762 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: Image file of size 1211973003 saved in 32 seconds.
> 2013-02-20 04:01:50,770 INFO org.apache.hadoop.hdfs.server.namenode.NNStorageRetentionManager: Going to retain 2 images with txid >= 726172232
> 2013-02-20 04:01:50,770 INFO org.apache.hadoop.hdfs.server.namenode.NNStorageRetentionManager: Purging old image FSImageFile(file=/var/lib/hdfs_namenode/meta/dfs/namesecondary/current/fsimage_0000000000726121750, cpktTxId=0000000000726121750)
> 2013-02-20 04:01:51,000 INFO org.apache.hadoop.hdfs.server.namenode.NNStorageRetentionManager:
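Two numbers in the secondary namenode log can be reproduced by hand. The edits segment `edits_0000000000726172233-0000000000726216952` holds 726216952 - 726172233 + 1 = 44720 transactions, matching "edits # 44720", which suggests both ends of the range are inclusive. The retention line is consistent with keeping the newest 2 images (presumably the `dfs.namenode.num.checkpoints.retained` setting, default 2): with images at txids 726121750, 726172232 and 726216952, the threshold is 726172232 and only 726121750 is purged. The sketch below is a simplified model of that arithmetic, not Hadoop's `NNStorageRetentionManager` code; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Simplified model of two values seen in the Secondary NameNode log.
public class CheckpointLogMath {
    private static final Pattern EDITS = Pattern.compile("edits_(\\d+)-(\\d+)");

    // Transactions in a finalized segment edits_<start>-<end>, both ends inclusive.
    static long txnCount(String editsFileName) {
        Matcher m = EDITS.matcher(editsFileName);
        if (!m.matches()) {
            throw new IllegalArgumentException("not a finalized edits file: " + editsFileName);
        }
        return Long.parseLong(m.group(2)) - Long.parseLong(m.group(1)) + 1;
    }

    // Keep the newest `retained` images; return the lowest txid that is kept.
    static long retentionThreshold(List<Long> imageTxIds, int retained) {
        if (imageTxIds.isEmpty() || retained < 1) {
            throw new IllegalArgumentException("need at least one image and retained >= 1");
        }
        List<Long> sorted = new ArrayList<>(imageTxIds);
        sorted.sort(Collections.reverseOrder());
        return sorted.get(Math.min(retained, sorted.size()) - 1);
    }

    // Images strictly below the threshold are the ones that would be purged.
    static List<Long> toPurge(List<Long> imageTxIds, int retained) {
        long threshold = retentionThreshold(imageTxIds, retained);
        List<Long> purge = new ArrayList<>();
        for (long txId : imageTxIds) {
            if (txId < threshold) {
                purge.add(txId);
            }
        }
        return purge;
    }

    public static void main(String[] args) {
        System.out.println(txnCount("edits_0000000000726172233-0000000000726216952")); // 44720
        List<Long> images = Arrays.asList(726121750L, 726172232L, 726216952L);
        System.out.println(retentionThreshold(images, 2)); // 726172232
        System.out.println(toPurge(images, 2));            // [726121750]
    }
}
```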

Harsh J