Following the idea of copying the data structure, I thought I could run
rsync while the server is ON and later, after stopping it, run rsync again
to apply only the diff. That would be much faster and would reduce the
system's off-line time. But I do not know whether Hadoop makes a lot of
changes to the data while it is running, which could make the final diff large.
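
Something like the sketch below is what I have in mind (just an illustration;
the SRC/DST paths are made up, so point SRC at the actual dfs.data.dir and DST
at wherever the backup should live). The same script runs twice: once with the
daemons up, and once after they are stopped, when only the changed blocks are
transferred.

#!/usr/bin/env python
# Two-pass rsync sketch: the bulk copy runs while HDFS is up, the
# second pass only transfers what changed after the daemons stop.
# SRC/DST are example paths -- adjust them for your cluster.
import subprocess
import sys

SRC = "/data/dfs/dn/"      # assumed DataNode data directory
DST = "/backup/dfs/dn/"    # assumed backup target (local disk or host:/path)

def sync():
    # -a preserves ownership, permissions and timestamps; --delete
    # removes files from the backup that no longer exist at the source.
    return subprocess.call(["rsync", "-a", "--delete", SRC, DST])

if __name__ == "__main__":
    # Run once while the cluster is on-line (slow, big copy), then stop
    # the daemons and run it again (fast, only the diff).
    sys.exit(sync())
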
On 02/26/2013 07:39 PM, Pablo Musa wrote:
> Hello guys,
> I am starting the upgrade from Hadoop 0.20 to a newer version (2.0), which
> changes the HDFS format. I read a lot of tutorials, and they say that data
> loss is possible (as expected). To avoid losing HDFS data, I will probably
> back up the whole HDFS structure (7 TB per node). However, this is a huge
> amount of data and it will take a lot of time, during which my service
> would be off-line.
> I was thinking about a simple approach: copying all files to a different
> location.
> I tried to find a parallel file compactor to speed up the process, but
> could not find one.
> How did you guys do it?
> Is there some trick?
> Thank you in advance,
> Pablo Musa