Hadoop, mail # user - Best practice to migrate HDFS from 0.20.205 to CDH3u3


Re: Best practice to migrate HDFS from 0.20.205 to CDH3u3
Nitin Pawar 2012-05-07, 11:29
Things to check:

1) When you launch distcp jobs, all the datanodes of the older HDFS are live
and connected.
2) When you launch distcp, no data is being written/moved/deleted in HDFS.
3) You can use the -log option to log errors into a directory, and -i to
ignore errors.

Also, you can try using distcp with the hdfs protocol instead of hftp. For
more, you can refer to:
https://groups.google.com/a/cloudera.org/group/cdh-user/browse_thread/thread/d0d99ad9f1554edd

If it failed, there should be some error.
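A sketch of the options mentioned above (hostnames, ports, and paths are
placeholders, not taken from this thread):

```shell
# -log writes per-file copy errors into a directory; -i ignores failures
# so one bad file does not abort the whole job.
hadoop distcp -i -log hdfs://new-nn:8020/tmp/distcp-logs \
  hftp://old-nn:50070/data hdfs://new-nn:8020/data_copy
```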
On Mon, May 7, 2012 at 4:44 PM, Austin Chungath <[EMAIL PROTECTED]> wrote:

> Ok, that was a lame mistake.
> $ hadoop distcp hftp://localhost:50070/tmp hftp://localhost:60070/tmp_copy
> I had typed "hdfs" instead of "hftp"
>
> $ hadoop distcp hftp://localhost:50070/docs/index.html
> hftp://localhost:60070/user/hadoop
> 12/05/07 16:38:09 INFO tools.DistCp:
> srcPaths=[hftp://localhost:50070/docs/index.html]
> 12/05/07 16:38:09 INFO tools.DistCp:
> destPath=hftp://localhost:60070/user/hadoop
> With failures, global counters are inaccurate; consider running with -i
> Copy failed: java.io.IOException: Not supported
> at org.apache.hadoop.hdfs.HftpFileSystem.delete(HftpFileSystem.java:457)
> at org.apache.hadoop.tools.DistCp.fullyDelete(DistCp.java:963)
> at org.apache.hadoop.tools.DistCp.copy(DistCp.java:672)
> at org.apache.hadoop.tools.DistCp.run(DistCp.java:881)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
> at org.apache.hadoop.tools.DistCp.main(DistCp.java:908)
>
> Any idea why this error occurs?
> I am copying one file from 0.20.205 (/docs/index.html) to cdh3u3
> (/user/hadoop)
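The likely cause of the "Not supported" error above: HftpFileSystem is
read-only, so it cannot be used as a distcp *destination* (the stack trace
shows the failure in HftpFileSystem.delete). The usual cross-version pattern
is hftp for the source and hdfs for the destination, run from the newer
cluster. A sketch, with placeholder hostnames and an assumed RPC port:

```shell
# hftp (read-only) for the source, hdfs for the destination.
# Run on the destination (CDH3u3) cluster; hostnames/ports are placeholders —
# the hdfs:// port must be the destination namenode's RPC port.
hadoop distcp hftp://old-nn:50070/docs/index.html \
  hdfs://new-nn:8020/user/hadoop
```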
>
> Thanks & Regards,
> Austin
>
> On Mon, May 7, 2012 at 3:57 PM, Austin Chungath <[EMAIL PROTECTED]>
> wrote:
>
> > Thanks,
> >
> > So I decided to try and move using distcp.
> >
> > $ hadoop distcp hdfs://localhost:54310/tmp hdfs://localhost:8021/tmp_copy
> > 12/05/07 14:57:38 INFO tools.DistCp:
> srcPaths=[hdfs://localhost:54310/tmp]
> > 12/05/07 14:57:38 INFO tools.DistCp:
> > destPath=hdfs://localhost:8021/tmp_copy
> > With failures, global counters are inaccurate; consider running with -i
> > Copy failed: org.apache.hadoop.ipc.RPC$VersionMismatch: Protocol
> > org.apache.hadoop.hdfs.protocol.ClientProtocol version mismatch. (client
> > > 63, server = 61)
> >
> > I found that distcp like the above works only if both clusters run the
> > same Hadoop version, so I tried:
> >
> > $ hadoop distcp hftp://localhost:50070/tmp
> hdfs://localhost:60070/tmp_copy
> > 12/05/07 15:02:44 INFO tools.DistCp:
> srcPaths=[hftp://localhost:50070/tmp]
> > 12/05/07 15:02:44 INFO tools.DistCp:
> > destPath=hdfs://localhost:60070/tmp_copy
> >
> > But this process seems to hang at this stage. What might I be doing
> > wrong?
> >
> > hftp://<dfs.http.address>/<path>
> > hftp://localhost:50070 is dfs.http.address of 0.20.205
> > hdfs://localhost:60070 is dfs.http.address of cdh3u3
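One thing worth checking in the hanging attempt above: an hdfs:// URI takes
the namenode's RPC port (from fs.default.name, e.g. 8020 or 54310), not the
HTTP port from dfs.http.address. Pointing hdfs:// at the HTTP port (60070
here) can make distcp appear to hang. A sketch under that assumption; the
RPC port shown is a placeholder to check against the CDH3u3 configuration:

```shell
# hftp source uses the source's dfs.http.address port (50070 here);
# the hdfs destination uses the destination namenode's RPC port, not 60070.
hadoop distcp hftp://localhost:50070/tmp hdfs://localhost:8020/tmp_copy
```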
> >
> > Thanks and regards,
> > Austin
> >
> >
> > On Fri, May 4, 2012 at 4:30 AM, Michel Segel <[EMAIL PROTECTED]
> >wrote:
> >
> >> Ok... So riddle me this...
> >> I currently have a replication factor of 3.
> >> I reset it to two.
> >>
> >> What do you have to do to get the replication factor of 3 down to 2?
> >> Do I just try to rebalance the nodes?
> >>
> >> The point is that you are looking at a very small cluster.
> >> You may want to start the new cluster with a replication factor of 2 and
> >> then, when the data is moved over, increase it to a factor of 3. Or maybe
> >> not.
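To answer the replication question above: lowering the factor is done with
setrep rather than a rebalance; the namenode then schedules removal of the
excess replicas on its own. A sketch (the path is a placeholder):

```shell
# Recursively reduce the replication factor to 2 for everything under /;
# the namenode deletes excess replicas in the background.
hadoop fs -setrep -R 2 /
```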
> >>
> >> I do a distcp to copy the data, and after each distcp I do an fsck as a
> >> sanity check, then remove the files I copied. As I gain more room, I can
> >> then slowly drop nodes, do an fsck, rebalance, and then repeat.
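The copy/verify/reclaim loop described above can be sketched roughly as
follows; hostnames and paths are placeholders, and the exact batches would
depend on available space:

```shell
# Copy one batch from the old cluster to the new one.
hadoop distcp hftp://old-nn:50070/data/batch1 hdfs://new-nn:8020/data/batch1

# Sanity-check the copied files on the destination cluster.
hadoop fsck /data/batch1 -files -blocks

# Then, on the *old* cluster, remove the copied batch to reclaim space
# (hftp is read-only, so the delete must run against the old cluster itself):
hadoop fs -rmr /data/batch1

# After decommissioning a node, rebalance the remaining ones.
hadoop balancer
```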
> >>
> >> Even though this is a dev cluster, the OP wants to retain the data.
> >>
> >> There are other options depending on the amount and size of the new
> >> hardware. I mean, make one machine a RAID 5 machine, copy data to it, clearing off

Nitin Pawar