HDFS >> mail # dev >> Issue of FSImage, need help

mac fang 2011-06-28, 08:44
Re: Issue of FSImage, need help
*Root cause*: the FSImage is left in an invalid format when the user kills the
HDFS process mid-write. On the next load the NameNode may then read an invalid
block count, possibly 1 billion or more, and an OutOfMemoryError is thrown
before any EOFException can be reported.

How can we verify the validity of the FSImage file?

Denny Ye
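One way to answer the validity question is to store a digest alongside the image and verify it before parsing, so a file truncated by a mid-write kill is rejected up front. The sketch below is a hypothetical illustration of that idea (the class and method names are made up for this example, not the HDFS implementation), using MD5 from the standard library:

```java
import java.security.MessageDigest;
import java.util.Arrays;

// Hypothetical sketch: pair image bytes with an MD5 digest and check the
// digest on load, so a truncated file is detected before any parsing.
public class ImageChecksum {
    static byte[] digest(byte[] data) throws Exception {
        return MessageDigest.getInstance("MD5").digest(data);
    }

    static boolean isValid(byte[] data, byte[] storedDigest) throws Exception {
        return Arrays.equals(digest(data), storedDigest);
    }

    public static void main(String[] args) throws Exception {
        byte[] image = "fsimage-contents".getBytes("UTF-8");
        byte[] md5 = digest(image);                                // saved at write time
        System.out.println(isValid(image, md5));                   // intact image
        byte[] truncated = Arrays.copyOf(image, image.length - 4); // simulated kill mid-write
        System.out.println(isValid(truncated, md5));               // detected as invalid
    }
}
```

The digest would have to be written after the image data is fully flushed; otherwise a kill between the two writes still leaves an image that fails verification, which is the safe outcome.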

On Tue, Jun 28, 2011 at 4:44 PM, mac fang <[EMAIL PROTECTED]> wrote:

> Hi, Team,
> What we found when we use Hadoop is that the FSImage often corrupts when we
> start/stop the Hadoop cluster. We think the reason might be around the
> write to the output stream: the NameNode may be killed while it is in
> saveNamespace,
> so the FSImage file is never completely written. Currently I see a
> previous.checkpoint folder; the logic of saveNamespace is:
> 1. mv the current folder to the previous.checkpoint folder.
> 2. start writing the FSImage into the current folder.
> I think there might be a case where, if the FSImage is corrupted, the
> NameNode can NOT be started, but we can NOT get any EOFException, since we
> might encounter an OutOfMemoryError if we read a wrong numBlocks and
> instantiate Block[] blocks = new Block[numBlocks] (actually, we face
> this issue).
> Any suggestion?
> thanks
> macf
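The OutOfMemoryError described above comes from allocating an array sized by an untrusted length field. A defensive loader can sanity-check the count before allocating. The sketch below is a minimal illustration of that pattern, assuming a made-up MAX_BLOCKS bound (a real check might compare the count against the remaining file size divided by the per-record size):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical sketch: validate a length field read from an image-like
// stream before allocating, instead of trusting it blindly.
public class SafeImageRead {
    // Upper bound is an assumption for illustration only.
    static final int MAX_BLOCKS = 10_000_000;

    static int readNumBlocks(DataInputStream in) throws IOException {
        int numBlocks = in.readInt();
        if (numBlocks < 0 || numBlocks > MAX_BLOCKS) {
            // Fail with a diagnosable error instead of a huge allocation.
            throw new IOException("Corrupt image: implausible block count " + numBlocks);
        }
        return numBlocks;
    }

    public static void main(String[] args) throws IOException {
        // Simulate a corrupted image carrying a bogus 1-billion block count.
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        new DataOutputStream(buf).writeInt(1_000_000_000);
        try {
            readNumBlocks(new DataInputStream(new ByteArrayInputStream(buf.toByteArray())));
            System.out.println("accepted");
        } catch (IOException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

This converts a confusing OutOfMemoryError into an explicit "corrupt image" failure, which makes the kill-during-saveNamespace scenario much easier to diagnose.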
Todd Lipcon 2011-06-28, 15:03
mac fang 2011-06-29, 01:11
mac fang 2011-07-04, 05:07