Re: Backup Strategies
Keith Turner 2013-05-31, 19:29
On Fri, May 31, 2013 at 2:39 PM, Billie Rinaldi <[EMAIL PROTECTED]> wrote:
> I'm not sure copying data out of HDFS is what you would want to do, though
> I suppose it depends on how much data you're storing there. If you want a
> backup on a different system, but you have too much data to store outside
> of a distributed file system, you could consider using distcp to copy from
> one HDFS instance to another.
> You can't clone the !METADATA table. In 1.5.0, you can export and import
> tables, which is designed to help you copy a table to a different cluster
> (see docs/examples/README.export). Cloning your tables could help, but in
> the case of !METADATA corruption you're still in the position of manually
> creating a new table with the same configuration (and split points if you
> know them) and bulk importing the old data files. I don't know if table
> export could be used to back up the metadata and configuration of a cloned
> table to help you recover its state later on the same system if the
> original table has gotten corrupted. It's possible.
Export table will save the table's state (what's in !METADATA and ZooKeeper)
to a zip file. So even if you do not actually copy the exported table, it
can be used to save table metadata. I commented on ACCUMULO-942 about
using export table to obtain a consistent snapshot of HDFS and Accumulo
metadata. That system metadata could be backed up.
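
A rough sketch of that sequence, following the steps in
docs/examples/README.export. The table name, export directory, and backup
destination are made-up placeholders, and the script only prints the
commands rather than running them. The first three lines are Accumulo shell
commands; the last is run from the OS shell.

```shell
#!/bin/sh
# Dry run: prints the commands instead of executing them.
# TABLE, EXPORT_DIR and BACKUP_DIR are hypothetical placeholders.
TABLE="mytable"
EXPORT_DIR="/backup/export/${TABLE}"
BACKUP_DIR="hdfs://backup-nn:8020/backup/${TABLE}"

backup_plan() {
    # Accumulo shell: clone so the live table can stay online, then take
    # the clone offline (exporttable requires an offline table).
    echo "clonetable ${TABLE} ${TABLE}_exp"
    echo "offline ${TABLE}_exp"
    # Accumulo shell: writes exportMetadata.zip and distcp.txt to EXPORT_DIR.
    echo "exporttable -t ${TABLE}_exp ${EXPORT_DIR}"
    # OS shell: copy the data files listed in distcp.txt. This step can be
    # skipped if only the metadata zip is wanted.
    echo "hadoop distcp -f ${EXPORT_DIR}/distcp.txt ${BACKUP_DIR}"
}

backup_plan
```

If only the table metadata is of interest, keeping a copy of
exportMetadata.zip from the export directory should be enough; the distcp
step is what copies the actual data files.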
> On Fri, May 31, 2013 at 11:05 AM, Mike Hugo <[EMAIL PROTECTED]> wrote:
>> I'm curious to know how people are backing up data in Accumulo.
>> We are planning on copying data out of HDFS on a regular basis to be
>> able to do a full restore.
>> We've also ended up getting into a state of having a corrupt !METADATA
>> table a few times. I'm wondering if doing a clone on a few tables on a
>> periodic basis (like every hour, for a few hours) might be one way to help
>> us recover from that situation.
>> E.g., if we did a clone on all tables, including the !METADATA table
>> hourly, and we didn't necessarily care about losing data in the last hour
>> time frame, could we simply restore from one of those clones if we get into
>> a corrupted state?
>> Is there another mechanism for snapshotting / backing up data in Accumulo?
>> Thanks for your thoughts!