Hadoop, mail # user - Re: HDFS Backup for Hadoop Update


Re: HDFS Backup for Hadoop Update
Pablo Musa 2013-02-26, 23:53
Following the idea of making a copy of the data structure, I thought about
rsync.

I could run rsync while the server is ON and later just apply the diff,
which would be much faster and would reduce the system's off-line time.
But I do not know whether Hadoop makes a lot of changes to the data
structure (blocks).
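
Roughly, the two-pass idea could look like the sketch below. It is only a
sketch under assumptions: the paths /data/dfs and /backup/dfs and the
stop_service callback are hypothetical placeholders, and whether the second
pass is actually small depends on how much the block files change between
the passes, which is exactly what I do not know. The rsync options used
(-a, --delete, --dry-run) are standard ones.

```python
import subprocess


def rsync_cmd(src, dst, dry_run=False):
    """Build an rsync command line that mirrors src into dst."""
    # -a preserves permissions, owners and timestamps;
    # --delete removes files from dst that no longer exist in src.
    cmd = ["rsync", "-a", "--delete"]
    if dry_run:
        # Only report what would be transferred, copy nothing.
        cmd.append("--dry-run")
    # A trailing slash on src copies the directory contents rather than
    # the directory itself.
    cmd += [src.rstrip("/") + "/", dst.rstrip("/") + "/"]
    return cmd


def two_pass_backup(src, dst, stop_service, run=subprocess.check_call):
    """Copy src to dst in two passes to keep the off-line window short.

    stop_service is a hypothetical callback that shuts the service down;
    run executes a command (subprocess.check_call by default).
    """
    # Pass 1: bulk copy while the service is still ON.
    run(rsync_cmd(src, dst))
    # Stop the service so the files can no longer change.
    stop_service()
    # Pass 2: transfer only what changed since pass 1.
    run(rsync_cmd(src, dst))
```

Running rsync_cmd("/data/dfs", "/backup/dfs", dry_run=True) by hand after the
first pass would show how large the diff really is before committing to any
downtime.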

Thanks again,
Pablo

On 02/26/2013 07:39 PM, Pablo Musa wrote:
> Hello guys,
> I am starting the upgrade from Hadoop 0.20 to a newer version (2.0), which
> changes the HDFS format. I read a lot of tutorials and they say that data
> loss is possible (as expected). To avoid HDFS data loss, I will probably
> back up the whole HDFS structure (7TB per node). However, this is a huge
> amount of data and it will take a long time, during which my service would
> be unavailable.
>
> I was thinking about a simple approach: copying all files to a different
> place.
> I tried to find a parallel file compressor to speed up the process, but
> could not find one.
>
> How did you guys do it?
> Is there some trick?
>
> Thank you in advance,
> Pablo Musa