Sorry, but at 1PB of disk, compression isn't really going to help a whole heck of a lot. Your network bandwidth will be your bottleneck.
So let's look at the problem.
How much downtime can you afford?
What does your hardware look like?
How much space do you have in your current data center?
You have 1PB of data. OK, what does the access pattern look like?
There are a couple of ways to slice and dice this. How many trucks do you have?
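To put rough numbers on the bandwidth point, here is a back-of-envelope sketch in Python. The link speeds, the 70% usable fraction, and the 2:1 compression ratio are all made-up assumptions for illustration, not figures from your cluster:

```python
# Back-of-envelope only: every number below is an assumption, not a measurement.
PETABYTE = 10**15  # bytes (decimal petabyte)


def transfer_days(data_bytes, link_gbps, efficiency=1.0):
    """Days needed to push data_bytes over a link of link_gbps gigabits/s.

    efficiency is the (hypothetical) fraction of the link you can actually use.
    """
    usable_bits_per_sec = link_gbps * 1e9 * efficiency
    seconds = data_bytes * 8 / usable_bits_per_sec
    return seconds / 86400


if __name__ == "__main__":
    for gbps in (1, 10):
        for ratio in (1.0, 2.0):  # hypothetical compression ratios
            days = transfer_days(PETABYTE / ratio, gbps, efficiency=0.7)
            print(f"{gbps:>2} Gbps, {ratio}:1 compression: {days:6.1f} days")
```

Under those assumptions, even a dedicated 10 Gbps link with 2:1 compression means about a week of continuous copying, and new files keep landing the whole time.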
On Aug 3, 2012, at 4:24 PM, Harit Himanshu <[EMAIL PROTECTED]> wrote:
> Moving 1 PB of data would take loads of time,
> - Check if this new data center provides something similar to http://aws.amazon.com/importexport/
> - Consider multipart uploading of the data
> - Consider compressing the data
> On Aug 3, 2012, at 2:19 PM, Patai Sangbutsarakum wrote:
>> Thanks for the response.
>> Physical move is not a choice in this case. Purely looking for copying
>> data and how to catch up with the update of a file while it is being
>> On Fri, Aug 3, 2012 at 12:40 PM, Chen He <[EMAIL PROTECTED]> wrote:
>>> sometimes, physically moving hard drives helps. :)
>>> On Aug 3, 2012 1:50 PM, "Patai Sangbutsarakum" <[EMAIL PROTECTED]>
>>>> Hi Hadoopers,
>>>> We have a plan to migrate our Hadoop cluster to a different datacenter
>>>> where we can triple the size of the cluster.
>>>> Currently, our 0.20.2 cluster has around 1PB of data. We use only
>>>> I would like to get some input on how we should handle transferring
>>>> 1PB of data to a new site, and also keep up with
>>>> new files that are thrown into the cluster all the time.
>>>> Happy Friday!!