Re: Hbase bkup options
There are a couple of nits...

1) Compression. This will help a bit when moving the files around.

2) Data size. You may have bandwidth issues. Moving TBs of data over a 1 GbE network can impact your cluster's performance, even with compression. (A compressed export run is sketched below.)

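A minimal sketch of a compressed export run, assuming a table named mytable and an HDFS output path /backups/mytable (both placeholders); the compression properties shown are the Hadoop 1.x-era names, and newer releases use the mapreduce.output.fileoutputformat.* equivalents:

    # run the Export M/R job with compressed output to cut down the volume moved later
    hbase org.apache.hadoop.hbase.mapreduce.Export \
        -D mapred.output.compress=true \
        -D mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec \
        mytable /backups/mytable
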
Depending on your cluster(s) and infrastructure,  there is going to be a point where the cost of trying to back up to tape is going to exceed the cost of replicating to a second cluster. At the same time, you have to remember that restoring TBs of data will take time.

How large a data set it takes to reach that point will vary by organization. Again, only you can determine the value of your data.

If you are backing up to a secondary cluster ... you can use the replication feature in HBase. This would be a better fit if you are looking at backing up a large set of HBase tables.
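
A minimal sketch of what enabling replication could look like from the hbase shell, assuming a table mytable with a column family cf and a peer cluster whose ZooKeeper quorum is zk1/zk2/zk3.example.com (all placeholders; the add_peer syntax and the need to disable the table before altering it vary by HBase version, and releases of this era also require hbase.replication=true in hbase-site.xml on both clusters):

    # on the source cluster's hbase shell
    add_peer '1', 'zk1.example.com,zk2.example.com,zk3.example.com:2181:/hbase'
    disable 'mytable'
    alter 'mytable', {NAME => 'cf', REPLICATION_SCOPE => 1}   # mark the family for replication
    enable 'mytable'
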
On Jul 23, 2012, at 10:33 AM, Amlan Roy wrote:

> Hi Michael,
>
> Thanks a lot for the reply. What I want to achieve is, if my cluster goes
> down for some reason, I should be able to create a new cluster and should be
> able to import all the backed up data. As I want to store all the tables, I
> expect the data size to be huge (on the order of terabytes) and it will keep
> growing.
>
> If I have understood correctly, you have suggested to run "export" to get
> the data into hdfs and then run "hadoop fs -copyToLocal" to get it into
> local file. If I take a backup of the files, is it possible to import that
> data to a new HBase cluster?
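
For what it's worth, the restore side could look roughly like the following, assuming the exported files were saved to local storage under /mnt/backup/mytable and that the table (with the same column families) has already been re-created on the new cluster; names and paths are placeholders:

    # push the saved export back into the new cluster's HDFS
    hadoop fs -copyFromLocal /mnt/backup/mytable /restore/mytable
    # Import is the M/R counterpart of Export; it loads the exported files into the existing table
    hbase org.apache.hadoop.hbase.mapreduce.Import mytable /restore/mytable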
>
> Thanks and regards,
> Amlan
>
> -----Original Message-----
> From: Michael Segel [mailto:[EMAIL PROTECTED]]
> Sent: Monday, July 23, 2012 8:19 PM
> To: [EMAIL PROTECTED]
> Subject: Re: Hbase bkup options
>
> Amlan,
>
> As always, the answer to your question is... it depends.
>
> First, how much data are we talking about?
>
> What's the value of the underlying data?
>
> One possible scenario...
> You run an M/R job to copy data from the table to an HDFS file, which is then
> copied to attached storage on an edge node and then to tape.
> Depending on how much data you have and how much disk is in the attached storage,
> you may want to keep a warm copy there, a 'warmer/hot' copy on HDFS, and a cold
> copy on tape at some offsite storage facility.
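
One rough way to realize that flow, assuming the table has already been exported to /backups/mytable on HDFS, that /mnt/edge is the attached storage on the edge node, and that /dev/st0 is the tape device (all placeholders):

    # warm copy: pull the exported files onto the edge node's attached storage
    hadoop fs -copyToLocal /backups/mytable /mnt/edge/mytable
    # cold copy: stream that directory to tape for offsite storage
    tar -cvf /dev/st0 -C /mnt/edge mytable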
>
> There are other options, but it all depends on what you want to achieve.
>
> With respect to the other tools...
>
> You can export (which is an M/R job) to a directory on the cluster, then use distcp
> to copy it to a different cluster. hadoop fs -copyToLocal will let you copy it off the
> cluster.
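
A minimal distcp invocation between two clusters might look like this (NameNode hosts, ports, and paths are placeholders; crossing major Hadoop versions usually means reading the source over hftp:// instead):

    # copy the exported directory from cluster A's HDFS to cluster B's HDFS
    hadoop distcp hdfs://nn-a.example.com:8020/backups/mytable \
                  hdfs://nn-b.example.com:8020/backups/mytable
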
> You could write your own code, but you don't get much gain over existing
> UNIX/Linux tools.
>
>
> On Jul 23, 2012, at 7:52 AM, Amlan Roy wrote:
>
>> Hi,
>>
>> Is it feasible to do disk or tape backup for HBase tables?
>>
>> I have read about tools like Export, CopyTable, and DistCp. It seems like
>> they will require a separate HDFS cluster to do that.
>>
>> Regards,
>> Amlan
>>
>
>