Re: HDFS Backup for Hadoop Update

Following the idea of making a copy of the data structure, I thought about
rsync.

I could run rsync while the server is ON and later just apply the diff,
which would be much faster and would reduce the system's off-line time.
But I do not know whether Hadoop makes a lot of changes to the data
structure (the blocks).
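
A minimal sketch of that two-pass rsync, in Python, assuming the blocks
live under /data/dfs and the backup target is /backup/dfs (hypothetical
paths; substitute the real dfs.data.dir value and backup mount):

#!/usr/bin/env python
# Untested sketch: bulk rsync while HDFS is live, then a short diff pass
# while it is stopped. SRC and DST below are placeholders.
import subprocess

SRC = "/data/dfs/"    # hypothetical dfs.data.dir (trailing slash: copy contents)
DST = "/backup/dfs/"  # hypothetical backup location

def sync():
    # -a preserves owners/permissions/timestamps, -H keeps hard links,
    # --delete mirrors removals so the copy matches the source exactly.
    subprocess.check_call(["rsync", "-aH", "--delete", SRC, DST])

sync()  # pass 1: long bulk copy, cluster still serving
# stop HDFS here (e.g. bin/stop-dfs.sh), then:
sync()  # pass 2: only the blocks that changed since pass 1

The off-line window then shrinks to roughly the duration of the second
pass, which depends on how much the block data changed in the meantime.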

Thanks again,
Pablo

On 02/26/2013 07:39 PM, Pablo Musa wrote:
> Hello guys,
> I am starting the upgrade from Hadoop 0.20 to a newer version (2.0),
> which changes the HDFS format. I read a lot of tutorials, and they say
> that data loss is possible (as expected). To avoid HDFS data loss I will
> probably back up the whole HDFS structure (7TB per node). However, this
> is a huge amount of data, and copying it will take a long time, during
> which my service would be unavailable.
>
> I was thinking about a simple approach: copying all the files to a
> different place. I tried to find some parallel file compressor to speed
> up the process, but could not find one (see the sketch after this quoted
> message).
>
> How did you guys do it?
> Is there some trick?
>
> Thank you in advance,
> Pablo Musa
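
For the "parallel file compressor" idea in the quoted message, a minimal
sketch, assuming the same hypothetical /data/dfs source: walk the block
tree and gzip every file through a multiprocessing pool, one worker per
core:

#!/usr/bin/env python
# Untested sketch: compress a directory tree in parallel with gzip.
# SRC and DST are hypothetical placeholders.
import gzip
import os
import shutil
from multiprocessing import Pool

SRC = "/data/dfs"        # hypothetical dfs.data.dir
DST = "/backup/dfs-gz"   # hypothetical backup target

def compress(paths):
    # Gzip one file; receives a (source, destination) tuple.
    src_path, out_path = paths
    with open(src_path, "rb") as fin:
        with gzip.open(out_path, "wb") as fout:
            shutil.copyfileobj(fin, fout)

def main():
    # Build the job list (and the output directories) up front,
    # then hand the per-file work to a process pool.
    jobs = []
    for root, dirs, files in os.walk(SRC):
        out_dir = os.path.join(DST, os.path.relpath(root, SRC))
        if not os.path.isdir(out_dir):
            os.makedirs(out_dir)
        for name in files:
            jobs.append((os.path.join(root, name),
                         os.path.join(out_dir, name + ".gz")))
    Pool().map(compress, jobs)  # Pool() defaults to one worker per CPU

if __name__ == "__main__":
    main()

Compression trades CPU for I/O, so whether this beats a plain parallel
copy depends on how compressible the block data actually is; trying it on
a small subset first would be prudent.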