HDFS >> mail # dev >> Re: data loss after cluster wide power loss


Re: data loss after cluster wide power loss
Thanks.  Suresh and Kihwal are right-- renames are journalled, but not
necessarily durable (stored to disk).  I was getting mixed up with
HDFS semantics, in which we actually do make the journal durable
before returning success to the client.

It might be a good idea for HDFS to fsync the file descriptor of the
directories involved in the rename operation, before assuming that the
operation is durable.
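
Roughly, the idea is the following. This is only a sketch in Python for
brevity, assuming Linux/POSIX semantics where fsync on a directory file
descriptor flushes its entries; durable_rename is a hypothetical helper
name, not existing HDFS code, and I haven't tested it on a datanode.

```python
import os


def durable_rename(src, dst):
    """Rename src to dst, then fsync the parent directories involved.

    Hypothetical helper for illustration; assumes a POSIX system where
    a directory can be opened read-only and passed to fsync().
    """
    os.rename(src, dst)

    # The source and destination may live in different directories;
    # both directory entries changed, so both parents need a sync.
    parents = {
        os.path.dirname(os.path.abspath(src)),
        os.path.dirname(os.path.abspath(dst)),
    }
    for parent in parents:
        fd = os.open(parent, os.O_RDONLY)
        try:
            os.fsync(fd)
        finally:
            os.close(fd)
```

The rename is only treated as durable once every fsync call has returned
without raising.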

If you're using ext{2,3,4}, a quick fix would be to use mount -o
dirsync.  I haven't tested it out, but it's supposed to make these
operations synchronous.

From the man page:
       dirsync
              All directory updates within the filesystem should be done
              synchronously.  This affects the following system calls: creat,
              link, unlink, symlink, mkdir, rmdir, mknod and rename.
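
To check whether a data directory's filesystem is actually mounted with
dirsync, something like this sketch could help. It assumes the
/proc/mounts line format (device, mount point, type, comma-separated
options, ...); mount_has_option is a hypothetical helper name.

```python
def mount_has_option(mounts_text, mount_point, option):
    """Return True if mount_point is mounted with the given option.

    mounts_text is the text of /proc/mounts (or anything in the same
    whitespace-separated format). If the mount point appears more than
    once, the last entry wins, since later mounts shadow earlier ones.
    """
    found = False
    for line in mounts_text.splitlines():
        fields = line.split()
        # fields: device, mount point, fs type, options, dump, pass
        if len(fields) >= 4 and fields[1] == mount_point:
            found = option in fields[3].split(",")
    return found
```

On a Linux box you would pass it the contents of /proc/mounts together
with the mount point that holds the datanode's data directory and the
string "dirsync".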

Colin
On Wed, Jul 3, 2013 at 10:19 AM, Suresh Srinivas <[EMAIL PROTECTED]> wrote:
> On Wed, Jul 3, 2013 at 8:12 AM, Colin McCabe <[EMAIL PROTECTED]> wrote:
>
>> On Mon, Jul 1, 2013 at 8:48 PM, Suresh Srinivas <[EMAIL PROTECTED]>
>> wrote:
>> > Dave,
>> >
>> > Thanks for the detailed email. Sorry I did not read all the details you had
>> > sent earlier completely (on my phone). As you said, this is not related to
>> > data loss related to HBase log and hsync. I think you are right; the rename
>> > operation itself might not have hit the disk. I think we should either
>> > ensure metadata operation is synced on the datanode or handle it being
>> > reported as blockBeingWritten. Let me spend some time to debug this issue.
>>
>> In theory, ext3 is journaled, so all metadata operations should be
>> durable in the case of a power outage.  It is only data operations
>> that should be possible to lose.  It is the same for ext4.  (Assuming
>> you are not using nonstandard mount options.)
>>
>
> ext3 journal may not hit the disk right. From what I read, if you do not
> specifically call sync, even the metadata operations do not hit disk.
>
> See - https://www.kernel.org/doc/Documentation/filesystems/ext3.txt
>
> commit=nrsec    (*)     Ext3 can be told to sync all its data and metadata
>                         every 'nrsec' seconds. The default value is 5 seconds.
>                         This means that if you lose your power, you will lose
>                         as much as the latest 5 seconds of work (your
>                         filesystem will not be damaged though, thanks to the
>                         journaling).  This default value (or any low value)
>                         will hurt performance, but it's good for data-safety.
>                         Setting it to 0 will have the same effect as leaving
>                         it at the default (5 seconds).
>                         Setting it to very large values will improve
>                         performance.