HDFS, mail # dev - Re: data loss after cluster wide power loss


Re: data loss after cluster wide power loss
Suresh Srinivas 2013-07-03, 17:19
On Wed, Jul 3, 2013 at 8:12 AM, Colin McCabe <[EMAIL PROTECTED]> wrote:

> On Mon, Jul 1, 2013 at 8:48 PM, Suresh Srinivas <[EMAIL PROTECTED]> wrote:
> > Dave,
> >
> > Thanks for the detailed email. Sorry I did not read all the details you had
> > sent earlier completely (on my phone). As you said, this is not related to
> > data loss related to HBase log and hsync. I think you are right; the rename
> > operation itself might not have hit the disk. I think we should either
> > ensure the metadata operation is synced on the datanode or handle it being
> > reported as blockBeingWritten. Let me spend some time debugging this issue.
>
> In theory, ext3 is journaled, so all metadata operations should be
> durable in the case of a power outage.  It is only data operations
> that should be possible to lose.  It is the same for ext4.  (Assuming
> you are not using nonstandard mount options.)
>

The ext3 journal itself may not hit the disk right away. From what I have read,
if you do not explicitly call sync, even metadata operations may not reach the
disk until the next periodic journal commit.
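
To make the "explicitly call sync" point concrete, here is a minimal sketch of
a rename that is forced to disk. It assumes a Linux/POSIX system where a
directory can be opened read-only and fsync'ed. The name durable_rename is
hypothetical and is used for illustration only; this is not the datanode's
actual code.

```python
import os


def _fsync_path(path):
    """Open a file or directory read-only and fsync it."""
    fd = os.open(path, os.O_RDONLY)
    try:
        os.fsync(fd)
    finally:
        os.close(fd)


def durable_rename(src, dst):
    """Rename src to dst, then sync so the rename does not sit in memory.

    Hypothetical helper: assumes POSIX semantics where fsync on a
    directory descriptor flushes that directory's entries.
    """
    # Flush the file's own data first, so the renamed file is not empty
    # or truncated after a crash.
    _fsync_path(src)

    os.rename(src, dst)

    # The rename only changes directory entries. Sync every parent
    # directory involved (source and destination may differ).
    parents = {
        os.path.dirname(os.path.abspath(src)),
        os.path.dirname(os.path.abspath(dst)),
    }
    for parent in parents:
        _fsync_path(parent)
```

Without the directory fsync, the rename waits for the next journal commit,
which is the window in which a power loss could leave the file under its old
name.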

See - https://www.kernel.org/doc/Documentation/filesystems/ext3.txt

commit=nrsec (*) Ext3 can be told to sync all its data and metadata
every 'nrsec' seconds. The default value is 5 seconds.
This means that if you lose your power, you will lose
as much as the latest 5 seconds of work (your
filesystem will not be damaged though, thanks to the
journaling).  This default value (or any low value)
will hurt performance, but it's good for data-safety.
Setting it to 0 will have the same effect as leaving
it at the default (5 seconds).
Setting it to very large values will improve
performance.
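
As a rough illustration of the rule quoted above, the following sketch works
out the commit interval implied by a mount-option string. The function name
and the option strings are hypothetical examples. The only facts taken from
the ext3 documentation are the 5-second default and that commit=0 behaves
like the default.

```python
EXT3_DEFAULT_COMMIT_SECS = 5  # default stated in the ext3 documentation


def effective_commit_interval(mount_options):
    """Return the journal commit interval, in seconds, implied by a
    comma-separated mount-option string such as "rw,commit=30".

    Hypothetical helper for illustration: it only interprets the
    commit=nrsec option and ignores everything else.
    """
    interval = EXT3_DEFAULT_COMMIT_SECS
    for opt in mount_options.split(","):
        key, _, value = opt.strip().partition("=")
        if key == "commit" and value.isdigit():
            secs = int(value)
            # commit=0 has the same effect as leaving the default in place.
            interval = secs if secs > 0 else EXT3_DEFAULT_COMMIT_SECS
    return interval
```

The returned value is an upper bound on how much un-synced work, including
metadata operations such as renames, can be lost on power failure.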