HDFS >> mail # dev >> corrupted edits log after power failure


Re: corrupted edits log after power failure
On 22/09/11 20:15, Brian Bockelman wrote:
> Hi Gabi,
>
> I'd be a bit scared of that backup strategy; what happens if the TCP connection gets cut suddenly during curl?  What happens if there's a TCP corruption?  Such things have happened before.

Curl might work for long-haul backups, but I'd use HTTPS for its better
checksums, and have alternate in-cluster strategies as well, such as shared HA
filesystems.
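
Whatever the transport, a cut or corrupted transfer is easier to catch if the copy is checked against a digest computed on the source side. A minimal Python sketch, assuming a SHA-256 hex digest of the image is published alongside it (the digest and the helper names here are hypothetical, not something Hadoop provides):

```python
import hashlib
import hmac


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_is_intact(path, expected_hex):
    """True only if the local copy matches the digest from the source side.

    expected_hex is assumed to be computed where the image was written and
    delivered separately from the image itself.
    """
    return hmac.compare_digest(sha256_of(path), expected_hex.strip().lower())
```

A truncated download fails this check just as a bit flip does, so a backup would only be rotated into place once backup_is_intact() returns True.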

>
> Personally, we have the SNN merge the edits every 15 minutes.  If it hasn't happened in 30 minutes, people get emailed.  If it doesn't happen in 45 minutes, people get paged.

That's a good technique for verifying the SNN is actually working.
Thinking it is working when it isn't is dangerous.
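
The escalation Brian describes could be sketched roughly as below. This is only an illustration in Python: the 30 and 45 minute thresholds come from his mail, while the function names and the idea of using the checkpoint file's modification time as "time of last merge" are my assumptions.

```python
import os
import time
from datetime import timedelta

# Thresholds from the mail above: merges are expected every 15 minutes.
EMAIL_AFTER = timedelta(minutes=30)
PAGE_AFTER = timedelta(minutes=45)


def checkpoint_age(path, now=None):
    """Age of the checkpoint file, judged by its modification time.

    Assumption: the file at `path` is rewritten on every successful merge.
    """
    now = time.time() if now is None else now
    return timedelta(seconds=max(0.0, now - os.path.getmtime(path)))


def alert_level(age):
    """Map the time since the last merge to 'ok', 'email', or 'page'."""
    if age >= PAGE_AFTER:
        return "page"
    if age >= EMAIL_AFTER:
        return "email"
    return "ok"
```

Run from cron or a monitoring agent, alert_level(checkpoint_age(path)) tells you whether to stay quiet, send mail, or page someone. The check has to run somewhere other than the SNN itself, or it dies along with the thing it is watching.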

> In addition to writing out copies to a few disks and to NFS, we also have a versioned backup of the checkpoint.prev.
>
> The worst case scenario would be if the SNN corrupts the image and uploads the corrupt image (it's a theoretical situation so far...); this would be caught at the next merge, meaning we trash up to 30 minutes of work.  This would ruin someone's day, but not someone's week.
>
> The NN is a SPOF, and should be treated with an appropriate level of paranoia (and, because it is a SPOF, assume that it will fail anyway and make sure you can accept the consequences).

That is: test your handling of the outage on a regular basis.