Hadoop >> mail # user >> Best practice to migrate HDFS from 0.20.205 to CDH3u3


Re: Best practice to migrate HDFS from 0.20.205 to CDH3u3
Hi Austin,

I don't know about using CDH3, but we use distcp for moving data between different versions of apache grids, and several things come to mind.

1) You should use the -i flag to ignore checksum differences on the blocks. I'm not 100% sure, but I believe hftp doesn't support checksums on the blocks as they go across the wire.

2) You should read from hftp but write to hdfs. Also make sure to check your port numbers. For example, I read from hftp on port 50070 and write to hdfs on port 9000. You'll find the hftp port in hdfs-site.xml and the hdfs port in core-site.xml on apache releases.

3) Do you have security (kerberos) enabled on 0.20.205? Does CDH3 support security? If security is enabled on 0.20.205 and CDH3 does not support it, you will need to disable security on 0.20.205, because you cannot write from a secure grid to an unsecured one.

4) Use the -m flag to limit your mappers so you don't DDoS your network backbone.

5) Why isn't your vendor helping you with the data migration? :)

Otherwise something like this should get you going.

hadoop distcp -i -ppgu -log /tmp/mylog -m 20 hftp://mynamenode.grid.one:50070/path/to/my/src/data hdfs://mynamenode.grid.two:9000/path/to/my/dst
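To double-check the ports mentioned in point 2 before assembling the command, a quick sketch (the sample config below is a stand-in for your real $HADOOP_CONF_DIR/hdfs-site.xml, and dfs.http.address is the namenode HTTP/hftp address property on apache 0.20-era releases):

```shell
# Sketch: extract the hftp (namenode HTTP) port from hdfs-site.xml.
# The heredoc below is a stand-in for a real config; point conf_dir at
# your actual $HADOOP_CONF_DIR instead.
conf_dir=$(mktemp -d)
cat > "$conf_dir/hdfs-site.xml" <<'EOF'
<configuration>
  <property>
    <name>dfs.http.address</name>
    <value>mynamenode.grid.one:50070</value>
  </property>
</configuration>
EOF

# Find the dfs.http.address property, read the following <value> line,
# and keep only the port number after the last colon.
hftp_port=$(sed -n '/dfs.http.address/{n;s/.*:\([0-9][0-9]*\)<\/value>.*/\1/p;}' \
  "$conf_dir/hdfs-site.xml")
echo "hftp port: $hftp_port"
```

The same approach works for the hdfs port, reading fs.default.name out of core-site.xml instead.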

-- Adam

On May 7, 2012, at 4:29 AM, Nitin Pawar wrote:

> things to check
>
> 1) when you launch distcp jobs, all the datanodes of the older hdfs are live and
> connected
> 2) when you launch distcp, no data is being written/moved/deleted in hdfs
> 3) you can use the option -log to log errors into a directory and use -i to
> ignore errors
>
> also you can try using distcp with the hdfs protocol instead of hftp ... for
> more you can refer to
> https://groups.google.com/a/cloudera.org/group/cdh-user/browse_thread/thread/d0d99ad9f1554edd
>
>
>
> if it failed, there should be some error
> On Mon, May 7, 2012 at 4:44 PM, Austin Chungath <[EMAIL PROTECTED]> wrote:
>
>> ok that was a lame mistake.
>> $ hadoop distcp hftp://localhost:50070/tmp hftp://localhost:60070/tmp_copy
>> I had spelled hdfs instead of "hftp"
>>
>> $ hadoop distcp hftp://localhost:50070/docs/index.html
>> hftp://localhost:60070/user/hadoop
>> 12/05/07 16:38:09 INFO tools.DistCp:
>> srcPaths=[hftp://localhost:50070/docs/index.html]
>> 12/05/07 16:38:09 INFO tools.DistCp:
>> destPath=hftp://localhost:60070/user/hadoop
>> With failures, global counters are inaccurate; consider running with -i
>> Copy failed: java.io.IOException: Not supported
>> at org.apache.hadoop.hdfs.HftpFileSystem.delete(HftpFileSystem.java:457)
>> at org.apache.hadoop.tools.DistCp.fullyDelete(DistCp.java:963)
>> at org.apache.hadoop.tools.DistCp.copy(DistCp.java:672)
>> at org.apache.hadoop.tools.DistCp.run(DistCp.java:881)
>> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
>> at org.apache.hadoop.tools.DistCp.main(DistCp.java:908)
>>
>> Any idea why this error is occurring?
>> I am copying one file from 0.20.205 (/docs/index.html) to cdh3u3
>> (/user/hadoop)
>>
>> Thanks & Regards,
>> Austin
>>
>> On Mon, May 7, 2012 at 3:57 PM, Austin Chungath <[EMAIL PROTECTED]>
>> wrote:
>>
>>> Thanks,
>>>
>>> So I decided to try and move using distcp.
>>>
>>> $ hadoop distcp hdfs://localhost:54310/tmp hdfs://localhost:8021/tmp_copy
>>> 12/05/07 14:57:38 INFO tools.DistCp:
>>> srcPaths=[hdfs://localhost:54310/tmp]
>>> 12/05/07 14:57:38 INFO tools.DistCp:
>>> destPath=hdfs://localhost:8021/tmp_copy
>>> With failures, global counters are inaccurate; consider running with -i
>>> Copy failed: org.apache.hadoop.ipc.RPC$VersionMismatch: Protocol
>>> org.apache.hadoop.hdfs.protocol.ClientProtocol version mismatch. (client
>>> = 63, server = 61)
>>>
>>> I found that we can do distcp like the above only if both are the same
>>> hadoop version.
>>> So I tried:
>>>
>>> $ hadoop distcp hftp://localhost:50070/tmp
>>> hdfs://localhost:60070/tmp_copy
>>> 12/05/07 15:02:44 INFO tools.DistCp:
>>> srcPaths=[hftp://localhost:50070/tmp]
>>> 12/05/07 15:02:44 INFO tools.DistCp: