Re: Backing up HDFS
On Tue, Aug 3, 2010 at 11:46 AM, Michael Segel
<[EMAIL PROTECTED]> wrote:
>
>
>
>> Date: Tue, 3 Aug 2010 11:02:48 -0400
>> Subject: Re: Backing up HDFS
>> From: [EMAIL PROTECTED]
>> To: [EMAIL PROTECTED]
>>
>
>> Assuming you are taking the distcp approach you can mirror your
>> cluster with some scripting/coding. However your destination systems
>> can be more modest, assuming you wish to use it ONLY for data no job
>> processing:
>>
>
> And that would be a waste. (Why build a cloud just to store data and not do any processing?)
>
> You're not building your cloud in a vacuum. There are going to be SAN(s), other servers, tape??? available. The trick is getting the important data off the cloud to a place where it can be backed up via the corporation's standard IT practices.
>
> Because of the size of the data, you may see people pulling data off the cloud into a SAN, then to either a tape drive or a SATA hot-swap drive for off-site storage.
> It all depends on the value of the data.
>
> Again, YMMV
>
> HTH
>
> -Mike
>
>

> You're not building your cloud in a vacuum. There are going to be SAN(s), other servers, tape??? available. The trick is getting the important data off the cloud to a place where it can be backed up via the corporation's standard IT practices.

Right. It all depends on what you want and what your needs are. In my
example I wanted near-line backups for a lot of data that I can recover
quickly, hence the solution: distcp to a second cluster.
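
For anyone scripting the mirror, here is a rough sketch of how the distcp call could be wrapped. The namenode addresses and the path are hypothetical placeholders, and the function names are mine, not part of Hadoop. It relies only on the `-update` and `-delete` options of `hadoop distcp`; adjust for your version and setup.

```python
import subprocess
from typing import List


def build_distcp_command(src_namenode: str, dst_namenode: str, path: str,
                         delete_extraneous: bool = False) -> List[str]:
    """Build a `hadoop distcp` command line that mirrors `path` to a second cluster.

    src_namenode / dst_namenode are host:port strings (hypothetical values here).
    """
    if not path.startswith("/"):
        raise ValueError("path must be absolute, e.g. /data/logs")
    # -update: copy only files that are missing or differ at the destination.
    cmd = ["hadoop", "distcp", "-update"]
    if delete_extraneous:
        # -delete: remove destination files that no longer exist at the source.
        cmd.append("-delete")
    cmd.append(f"hdfs://{src_namenode}{path}")
    cmd.append(f"hdfs://{dst_namenode}{path}")
    return cmd


def mirror(src_namenode: str, dst_namenode: str, path: str,
           delete_extraneous: bool = False) -> int:
    """Run the distcp job and return its exit code (requires `hadoop` on PATH)."""
    cmd = build_distcp_command(src_namenode, dst_namenode, path, delete_extraneous)
    return subprocess.call(cmd)
```

Something like `mirror("nn-prod:8020", "nn-backup:8020", "/data")` run from cron would then keep the second cluster in step. Leave `delete_extraneous` off if the backup should keep files that were deleted on the source.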

If you want to integrate with other backup software you can copy the
data to local disk or experiment with FUSE for Hadoop. Mount the drive
and back up via traditional methods (I just hope you have a lot of tapes :)
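
As a stand-in for "traditional methods", here is a minimal sketch that archives a directory tree into a gzip tarball, which a tape or SAN backup job could then pick up. It assumes HDFS is already mounted through FUSE at some local path (for example a hypothetical `/mnt/hdfs`); the function name is made up for illustration, and real backup software would replace it.

```python
import os
import tarfile


def archive_tree(mounted_dir: str, archive_path: str) -> int:
    """Write every regular file under mounted_dir into a .tar.gz archive.

    Returns the number of files archived. archive_path should live outside
    mounted_dir so the archive does not try to include itself.
    Empty directories are not preserved in this sketch.
    """
    count = 0
    with tarfile.open(archive_path, "w:gz") as tar:
        for root, _dirs, files in os.walk(mounted_dir):
            for name in sorted(files):
                full_path = os.path.join(root, name)
                # Store paths relative to the mount point, not absolute paths.
                arcname = os.path.relpath(full_path, mounted_dir)
                tar.add(full_path, arcname=arcname)
                count += 1
    return count
```

For example, `archive_tree("/mnt/hdfs/important", "/backup/important.tar.gz")` would produce a single file to hand to the tape job. For large data sets you would likely want to split this per directory rather than build one huge archive.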