Re: Best practice to migrate HDFS from 0.20.205 to CDH3u3
Things to check:

1) When you launch the distcp job, all the datanodes of the older HDFS are live
and connected.
2) When you launch distcp, no data is being written, moved, or deleted in HDFS.
3) You can use the -log option to log errors to a directory, and -i to
ignore errors.

You can also try using distcp with the hdfs protocol instead of hftp. For more,
you can refer to
https://groups.google.com/a/cloudera.org/group/cdh-user/browse_thread/thread/d0d99ad9f1554edd

If it failed, there should be some error reported.
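The checklist above can be sketched as a single invocation. The host names, ports, and paths below are hypothetical placeholders, not values from this thread; the command is echoed as a dry run rather than executed:

```shell
# Hypothetical NameNode addresses -- substitute your own.
SRC="hftp://old-nn:50070/tmp"                 # read-only hftp source (old cluster)
DST="hdfs://new-nn:8020/tmp_copy"             # native hdfs destination (new cluster)
LOGDIR="hdfs://new-nn:8020/tmp/distcp_logs"   # -log writes per-file error records here

# -i continues past individual copy failures; -log records them for later inspection.
CMD="hadoop distcp -i -log $LOGDIR $SRC $DST"
echo "$CMD"   # dry run: print the command rather than executing it
```

Running the job from the destination (newer) cluster lets the writing side use its own native protocol while reading from the old cluster over hftp.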
On Mon, May 7, 2012 at 4:44 PM, Austin Chungath <[EMAIL PROTECTED]> wrote:

> Ok, that was a lame mistake.
> $ hadoop distcp hftp://localhost:50070/tmp hftp://localhost:60070/tmp_copy
> I had typed "hdfs" instead of "hftp".
>
> $ hadoop distcp hftp://localhost:50070/docs/index.html hftp://localhost:60070/user/hadoop
> 12/05/07 16:38:09 INFO tools.DistCp: srcPaths=[hftp://localhost:50070/docs/index.html]
> 12/05/07 16:38:09 INFO tools.DistCp: destPath=hftp://localhost:60070/user/hadoop
> With failures, global counters are inaccurate; consider running with -i
> Copy failed: java.io.IOException: Not supported
> at org.apache.hadoop.hdfs.HftpFileSystem.delete(HftpFileSystem.java:457)
> at org.apache.hadoop.tools.DistCp.fullyDelete(DistCp.java:963)
> at org.apache.hadoop.tools.DistCp.copy(DistCp.java:672)
> at org.apache.hadoop.tools.DistCp.run(DistCp.java:881)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
> at org.apache.hadoop.tools.DistCp.main(DistCp.java:908)
>
> Any idea why this error occurs?
> I am copying one file from 0.20.205 (/docs/index.html) to cdh3u3
> (/user/hadoop).
>
> Thanks & Regards,
> Austin
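For what it's worth, the stack trace above points at HftpFileSystem.delete: hftp is a read-only filesystem, so it can serve as the distcp source but plausibly not as the destination. A hedged sketch of a corrected command, assuming the CDH3u3 NameNode's RPC port is 8020 (an assumption — check fs.default.name on that cluster):

```shell
# hftp source (old 0.20.205 cluster, HTTP port) -- reads over hftp are fine.
SRC="hftp://localhost:50070/docs/index.html"
# hdfs destination (CDH3u3 cluster, NameNode RPC port -- 8020 is assumed here).
# Writing to an hftp:// destination is what fails with "Not supported".
DST="hdfs://localhost:8020/user/hadoop"
echo "hadoop distcp $SRC $DST"   # dry run: print instead of executing
```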
>
> On Mon, May 7, 2012 at 3:57 PM, Austin Chungath <[EMAIL PROTECTED]>
> wrote:
>
> > Thanks,
> >
> > So I decided to try and move using distcp.
> >
> > $ hadoop distcp hdfs://localhost:54310/tmp hdfs://localhost:8021/tmp_copy
> > 12/05/07 14:57:38 INFO tools.DistCp: srcPaths=[hdfs://localhost:54310/tmp]
> > 12/05/07 14:57:38 INFO tools.DistCp: destPath=hdfs://localhost:8021/tmp_copy
> > With failures, global counters are inaccurate; consider running with -i
> > Copy failed: org.apache.hadoop.ipc.RPC$VersionMismatch: Protocol
> > org.apache.hadoop.hdfs.protocol.ClientProtocol version mismatch.
> > (client = 63, server = 61)
> >
> > I found that distcp works like this only if both clusters run the same
> > Hadoop version, so I tried:
> >
> > $ hadoop distcp hftp://localhost:50070/tmp hdfs://localhost:60070/tmp_copy
> > 12/05/07 15:02:44 INFO tools.DistCp: srcPaths=[hftp://localhost:50070/tmp]
> > 12/05/07 15:02:44 INFO tools.DistCp: destPath=hdfs://localhost:60070/tmp_copy
> >
> > But this process seems to hang at this stage. What might I be doing
> > wrong?
> >
> > hftp://<dfs.http.address>/<path>
> > hftp://localhost:50070 is dfs.http.address of 0.20.205
> > hdfs://localhost:60070 is dfs.http.address of cdh3u3
> >
> > Thanks and regards,
> > Austin
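One hedged guess about the hang: 60070 is described above as a dfs.http.address, but an hdfs:// URI needs the NameNode's RPC port (the one in fs.default.name), not the HTTP port. Pointing the hdfs client at an HTTP port can leave it waiting indefinitely. A sketch, with 8020 as an assumed RPC port for the CDH3u3 NameNode:

```shell
# dfs.http.address ports serve HTTP (hftp); an hdfs:// URI needs the RPC port.
WRONG="hdfs://localhost:60070/tmp_copy"   # hdfs scheme + HTTP port: likely hangs
RIGHT="hdfs://localhost:8020/tmp_copy"    # hdfs scheme + RPC port (8020 assumed)
echo "hadoop distcp hftp://localhost:50070/tmp $RIGHT"   # dry run
```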
> >
> >
> > On Fri, May 4, 2012 at 4:30 AM, Michel Segel <[EMAIL PROTECTED]
> >wrote:
> >
> >> Ok... So riddle me this...
> >> I currently have a replication factor of 3.
> >> I reset it to two.
> >>
> >> What do you have to do to get the replication factor of 3 down to 2?
> >> Do I just try to rebalance the nodes?
> >>
> >> The point is that you are looking at a very small cluster.
> >> You may want to start the new cluster with a replication factor of 2 and
> >> then when the data is moved over, increase it to a factor of 3. Or maybe
> >> not.
> >>
> >> I do a distcp to copy the data, and after each distcp I do an fsck for a
> >> sanity check and then remove the files I copied. As I gain more room, I
> >> can then slowly drop nodes, do an fsck, rebalance, and then repeat.
> >>
> >> Even though this is a dev cluster, the OP wants to retain the data.
> >>
> >> There are other options depending on the amount and size of the new
> >> hardware. I mean, make one machine a RAID 5 machine, copy data to it, clearing off
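The copy / fsck / delete loop described above might be sketched like this. Directories and NameNode addresses are hypothetical, and the commands are echoed as a dry run rather than executed:

```shell
SRC_NN="hftp://old-nn:50070"   # old cluster, read over hftp
DST_NN="hdfs://new-nn:8020"    # new cluster, NameNode RPC port (assumed)

for DIR in /data/part1 /data/part2; do
  echo "hadoop distcp $SRC_NN$DIR $DST_NN$DIR"   # copy one batch
  echo "hadoop fsck $DIR"                        # sanity check, run on the new cluster
  echo "hadoop fs -rmr $DIR"                     # reclaim space, run on the old cluster
done
echo "hadoop balancer"   # rebalance after decommissioning freed-up nodes
```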

Nitin Pawar