HDFS, mail # user - Regarding: Merging two hadoop clusters


Re: Regarding: Merging two hadoop clusters
Jean-Marc Spaggiari 2013-03-14, 11:52
As Vivek and Vinod are saying, using distcp might be the solution.

But you need to make sure you have enough free space in one cluster
to receive the data from the other one. If you don't, you might
need to reassign some of the nodes from one cluster to the other
to "concentrate" the free space on a single cluster, and then do the
migration. That way you will not have any downtime. But you need to
make sure nothing is written to the cluster you are migrating from
during the copy, or you might miss data.
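It helps to do the free-space arithmetic before moving any nodes: HDFS stores each block `replication` times, so the raw space needed on the receiving cluster is roughly the logical size of the data times its replication factor. A minimal sketch, assuming you read the logical size with something like `hadoop fs -dus` on the source cluster and the remaining capacity from `hadoop dfsadmin -report` on the target; the function names and the 20% headroom default are hypothetical choices, not Hadoop settings.

```python
# Hypothetical helpers: a back-of-the-envelope capacity check before a
# distcp-based merge. Not part of Hadoop; all sizes are in bytes.


def raw_shortfall(source_logical_bytes: int,
                  target_remaining_bytes: int,
                  replication: int = 3,
                  headroom_pct: int = 20) -> int:
    """Raw bytes the target cluster is still missing (0 if the data fits).

    source_logical_bytes:   size of the data to copy, before replication.
    target_remaining_bytes: raw free DFS space on the target cluster.
    replication:            replication factor the copies will get on the target.
    headroom_pct:           share of free space to leave unused (assumed margin).
    """
    if not 0 <= headroom_pct < 100:
        raise ValueError("headroom_pct must be in [0, 100)")
    # Every block is stored `replication` times on the target.
    raw_needed = source_logical_bytes * replication
    # Smallest free space F with F * (100 - headroom_pct) / 100 >= raw_needed,
    # using integer ceiling division to avoid float rounding.
    required_free = -(-raw_needed * 100 // (100 - headroom_pct))
    return max(0, required_free - target_remaining_bytes)


def fits_on_target(source_logical_bytes: int,
                   target_remaining_bytes: int,
                   replication: int = 3,
                   headroom_pct: int = 20) -> bool:
    """True if the replicated copy fits within the target's usable free space."""
    return raw_shortfall(source_logical_bytes, target_remaining_bytes,
                         replication, headroom_pct) == 0


if __name__ == "__main__":
    TB = 10**12
    # 100 TB at replication 3 needs 300 TB raw, i.e. 375 TB free with 20% headroom.
    print(fits_on_target(100 * TB, 400 * TB))       # True
    print(raw_shortfall(100 * TB, 300 * TB) // TB)  # 75
```

A non-zero shortfall is roughly the raw capacity you would have to shift over by reassigning nodes before starting the copy.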

JM

2013/3/14 vivek <[EMAIL PROTECTED]>:
> Distcp is the simplest approach you can use (it copies data in parallel
> using map tasks).
>
> On Thu, Mar 14, 2013 at 12:16 PM, Vinod Kumar Vavilapalli
> <[EMAIL PROTECTED]> wrote:
>>
>> Copy data into one of the clusters using distcp *without* downtime
>> (assuming you have enough capacity) and then merge the clusters?
>>
>> Thanks,
>> +Vinod Kumar Vavilapalli
>> Hortonworks Inc.
>> http://hortonworks.com/
>>
>> On Mar 13, 2013, at 9:38 PM, Shashank Agarwal wrote:
>>
>> Hey Guys,
>>
>> I have two different Hadoop clusters in production. One cluster is used as
>> backing for HBase and the other for other things. Both Hadoop clusters are
>> running the same version, 1.0, and I want to merge them into one. I
>> know one possible solution is to copy the data across, but the data on
>> these clusters is really huge and it will be hard for me to accept
>> a long downtime.
>> Is there an optimal way to merge two Hadoop clusters?
>>
>> ~Shashank
>>
>
> --
> Thanks and Regards,
>
> VIVEK KOUL
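One way to read the "copy without downtime" suggestion is as a staged copy: run distcp once while the source cluster stays live, then repeat it with `-update` so each pass only copies what changed, and stop writes only for the final, short pass. A minimal sketch, assuming both clusters run the same Hadoop 1.0 and can reach each other over `hdfs://`; the helper names, the host names `nn-a`/`nn-b`, the port, the `/data` path and the number of passes are placeholders. It only builds the command lines and does not run them.

```python
# Hypothetical helper that only *builds* `hadoop distcp` command lines for a
# staged copy; it does not run anything. Host names and paths are placeholders.
from typing import List


def distcp_cmd(src: str, dst: str, maps: int = 20) -> List[str]:
    """One distcp invocation as an argument list.

    -m caps the number of map tasks (distcp runs as a map-only job).
    -update skips files that already match the target copy (compared at
    least by size), so repeated runs only copy what changed. With -update
    the *contents* of src are copied into dst, so every pass uses the same
    layout, including the first one.
    """
    return ["hadoop", "distcp", "-update", "-m", str(maps), src, dst]


def migration_plan(src: str, dst: str, catch_up_passes: int = 2,
                   maps: int = 20) -> List[List[str]]:
    """Bulk pass while the source stays live, then catch-up passes.

    Only the last pass needs writes to `src` stopped; each earlier pass
    shrinks the amount left to copy, which keeps that freeze window short.
    """
    if catch_up_passes < 1:
        raise ValueError("need at least one catch-up pass")
    return [distcp_cmd(src, dst, maps) for _ in range(1 + catch_up_passes)]


if __name__ == "__main__":
    plan = migration_plan("hdfs://nn-a:8020/data", "hdfs://nn-b:8020/data",
                          catch_up_passes=1, maps=10)
    for cmd in plan:
        print(" ".join(cmd))
    # To actually run: subprocess.run(cmd, check=True) for each cmd, in order.
```

Using `-update` on every pass, including the first, avoids the path mismatch that can occur when a plain distcp into an existing directory nests the source directory under it. Note that `-update` does not remove files that were deleted on the source after an earlier pass.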