Re: Hbase bkup options
Michael Segel, 2012-07-23

There are a couple of nits...

1) Compression. This will help a bit when moving the files around.

2) Data size. You may have bandwidth issues. Moving TBs of data over a 1GbE network can impact your cluster's performance (even with compression).
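
For example, the Export tool accepts the standard MapReduce compression properties on the command line; something along these lines should produce gzip-compressed output (the table name and output path are placeholders):

    hbase org.apache.hadoop.hbase.mapreduce.Export \
      -D mapred.output.compress=true \
      -D mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec \
      my_table /backups/my_table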

Depending on your cluster(s) and infrastructure,  there is going to be a point where the cost of trying to back up to tape is going to exceed the cost of replicating to a second cluster. At the same time, you have to remember that restoring TBs of data will take time.

How large a data set it takes to reach that point will vary by organization. Again, only you can determine the value of your data.

If you are backing up to a secondary cluster ... you can use the replication feature in HBase. This would be a better fit if you are looking at backing up a large set of HBase tables.
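
As a rough sketch of the replication route (the peer ZooKeeper quorum, table and column family names below are placeholders): set hbase.replication to true in hbase-site.xml on both clusters, restart, then from the hbase shell on the source cluster:

    # point the source cluster at the backup cluster's ZooKeeper ensemble
    add_peer '1', "backup-zk1,backup-zk2,backup-zk3:2181:/hbase"

    # ship edits for each column family you want replicated
    disable 'my_table'
    alter 'my_table', {NAME => 'cf', REPLICATION_SCOPE => 1}
    enable 'my_table'

Bear in mind that replication is asynchronous, so it gives you a live second copy rather than a point-in-time backup.
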
On Jul 23, 2012, at 10:33 AM, Amlan Roy wrote:

> Hi Michael,
>
> Thanks a lot for the reply. What I want to achieve is, if my cluster goes
> down for some reason, I should be able to create a new cluster and should be
> able to import all the backed up data. As I want to store all the tables, I
> expect the data size to be huge (on the order of terabytes) and it will keep
> growing.
>
> If I have understood correctly, you have suggested to run "export" to get
> the data into HDFS and then run "hadoop fs -copyToLocal" to get it into a
> local file. If I take a backup of the files, is it possible to import that
> data to a new HBase cluster?
>
> Thanks and regards,
> Amlan
>
> -----Original Message-----
> From: Michael Segel [mailto:[EMAIL PROTECTED]]
> Sent: Monday, July 23, 2012 8:19 PM
> To: [EMAIL PROTECTED]
> Subject: Re: Hbase bkup options
>
> Amlan,
>
> Like always the answer to your question is... it depends.
>
> First, how much data are we talking about?
>
> What's the value of the underlying data?
>
> One possible scenario...
> You run an M/R job to copy data from the table to an HDFS file, which is
> then copied to attached storage on an edge node and then to tape.
> Depending on how much data you have and how much disk is in the attached
> storage, you may want to keep a warm copy there, a 'warmer/hot' copy on HDFS,
> and then a cold copy on tape at some offsite storage facility.
>
> There are other options, but it all depends on what you want to achieve.
>
> With respect to the other tools...
>
> You can export (which is an M/R job) to an HDFS directory, then use distcp
> to copy it to a different cluster. hadoop fs -copyToLocal will let you copy
> the files off the cluster.
> You could write your own code, but you don't get much gain over existing
> UNIX/Linux tools.
>
>
> On Jul 23, 2012, at 7:52 AM, Amlan Roy wrote:
>
>> Hi,
>>
>> Is it feasible to do disk or tape backup for HBase tables?
>>
>> I have read about tools like Export, CopyTable and DistCp. It seems like
>> they will require a separate HDFS cluster to do that.
>>
>> Regards,
>> Amlan
>>
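
For completeness, the export / copy / import round trip discussed above looks roughly like this (table name, paths and cluster addresses are placeholders; the Import step assumes the table and its column families have already been created on the new cluster):

    # dump the table to a SequenceFile directory in HDFS (runs as an M/R job)
    hbase org.apache.hadoop.hbase.mapreduce.Export my_table /backups/my_table

    # pull it off the cluster (edge-node disk, then tape)...
    hadoop fs -copyToLocal /backups/my_table /mnt/backup/my_table

    # ...or copy it straight to a second cluster instead
    hadoop distcp hdfs://cluster-a/backups/my_table hdfs://cluster-b/backups/my_table

    # to restore: copy the files back into the new cluster's HDFS and load them
    hadoop fs -copyFromLocal /mnt/backup/my_table /backups/my_table
    hbase org.apache.hadoop.hbase.mapreduce.Import my_table /backups/my_table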