

Shengjie Min 2012-12-07, 13:27
Re: migration from hadoop cluster cdh3 to cdh4
Hi Shengjie,

This question is specific to CDH, so it does not belong on the Apache
HDFS development list, which is for developers of the HDFS project itself.
I've moved your question to CDH's own user list, [EMAIL PROTECTED] (
https://groups.google.com/a/cloudera.org/forum/?fromgroups=#!forum/cdh-user
).

My answers inline.
On Fri, Dec 7, 2012 at 6:57 PM, Shengjie Min <[EMAIL PROTECTED]> wrote:

> Hi,
>
> Are there any instructions or documents covering migration from Hadoop HDFS
> CDH3 to CDH4? All the docs I found talk about in-place
> upgrading ONLY.
>

You are correct that at present there is no migration guide. I'll reach out
to the docs team behind the site to add one in as it may be helpful to
others too.
> I have two Hadoop clusters. My goal is to use hadoop fs -cp to copy all the
> HDFS files from *cluster1* to *cluster2*.
>
> *Cluster1:* Hadoop 0.20.2-cdh3u4
>
> *Cluster2:* Hadoop 2.0.0-cdh4.1.1
>
> Now, even just running the dfs -ls command against *cluster1* remotely from
> *cluster2*, as below:
>
> hadoop fs -ls hdfs://cluster1-namenode:8020/hbase
>

Regular FS commands that use the hdfs:// scheme will not work between CDH3
and CDH4: the two releases use different protocol versions and are
incompatible with one another over regular RPC calls. The exception you got
is expected when you attempt this.
> I think it's due to the Hadoop version difference. In my case, the CDH3
> cluster doesn't have MapReduce deployed, which rules out the distcp and
> HBase CopyTable options. HBase replication is not available on the CDH3
> cluster either. I am struggling to think of a way to migrate the HDFS
> data from *cluster1* to *cluster2*.
>
>
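Hadoop provides a DistCp tool that lets you do this. It runs as a MapReduce
job, so it copies the given paths in parallel and in full. DistCp can also
read from the HFTP file system (hftp://), a simple, read-only, HTTP-based
view of HDFS that is exposed over the HDFS web server. Because HFTP is
read-only, the DistCp job has to run on the destination (CDH4) cluster, so
MapReduce is needed only there and not on the CDH3 cluster.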

Run the following on your CDH4 cluster to see the available options:

$ hadoop distcp

What you probably need is:

$ hadoop distcp hftp://cdh3-namenode:50070/<path to copy> <destination on CDH4>
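
To make the placeholders above concrete, here is a minimal dry-run sketch
that only prints the command so you can review it before running it on the
CDH4 cluster. The host names, the /path/to/data path, and the helper name
build_distcp_cmd are hypothetical. Port 50070 is the one used in the command
above, and port 8020 on the CDH4 side is an assumption borrowed from the
hdfs:// URI in your original question. Adjust all of them to your setup.

```shell
#!/bin/sh
# Sketch only: prints a DistCp command that reads from CDH3 over HFTP
# and writes to CDH4 over HDFS. Nothing is copied by this script.

build_distcp_cmd() {
    src_nn="$1"   # CDH3 NameNode host (hypothetical value below)
    dst_nn="$2"   # CDH4 NameNode host (hypothetical value below)
    path="$3"     # absolute HDFS path to copy, e.g. /path/to/data

    # Source: hftp:// on the NameNode web port (50070, as in the reply above).
    # Destination: hdfs:// on port 8020 (assumed; check your CDH4 config).
    printf 'hadoop distcp hftp://%s:50070%s hdfs://%s:8020%s\n' \
        "$src_nn" "$path" "$dst_nn" "$path"
}

# Dry run with placeholder names: review the output, then run it on CDH4.
build_distcp_cmd cdh3-namenode cdh4-namenode /path/to/data
```

The printed command keeps the same path on both clusters, which is only one
possible layout; pass a different destination if you want the data elsewhere.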
> --
> All the best,
> Shengjie Min
>

--
Harsh J