Re: Best practice to migrate HDFS from 0.20.205 to CDH3u3
Hi Austin,

I'm glad that helped out.  Regarding the -p flag for distcp, here's the online documentation

http://hadoop.apache.org/common/docs/current/distcp.html#Option+Index

You can also get this info from running 'hadoop distcp' without any flags.
--------
-p[rbugp]       Preserve
                       r: replication number
                       b: block size
                       u: user
                       g: group
                       p: permission
--------
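As an illustration, a copy that preserves permission, group, and user bits (the -ppgu combination used elsewhere in this thread) might look like this; the hostnames and paths are placeholders:

```shell
# Preserve permission (p), group (g), and user (u) on the copied files;
# replication and block size are left to the destination cluster's defaults.
# Hostnames and paths below are placeholders.
hadoop distcp -ppgu \
  hftp://src-namenode:50070/path/to/data \
  hdfs://dst-namenode:9000/path/to/data
```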

-- Adam

On May 7, 2012, at 10:55 PM, Austin Chungath wrote:

> Thanks Adam,
>
> That was very helpful. Your second point solved my problems :-)
> The hdfs port number was wrong.
> I didn't use the option -ppgu; what does it do?
>
>
>
> On Mon, May 7, 2012 at 8:07 PM, Adam Faris <[EMAIL PROTECTED]> wrote:
>
>> Hi Austin,
>>
>> I don't know about using CDH3, but we use distcp for moving data between
>> different versions of apache grids and several things come to mind.
>>
>> 1) you should use the -i flag to ignore checksum differences on the
>> blocks.  I'm not 100% sure, but I believe hftp doesn't support checksums
>> on the blocks as they go across the wire.
>>
>> 2) you should read from hftp but write to hdfs.  Also make sure to check
>> your port numbers.   For example I can read from hftp on port 50070 and
>> write to hdfs on port 9000.  You'll find the hftp port in hdfs-site.xml and
>> hdfs in core-site.xml on apache releases.
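One rough way to double-check those port numbers is to grep the relevant properties out of the config files (property names as on Apache 0.20-era releases; the config path is an assumption):

```shell
# dfs.http.address in hdfs-site.xml gives the NameNode HTTP (hftp) port;
# fs.default.name in core-site.xml gives the hdfs:// URI and port.
# $HADOOP_CONF_DIR is assumed to point at the cluster's config directory.
grep -A1 "dfs.http.address" $HADOOP_CONF_DIR/hdfs-site.xml
grep -A1 "fs.default.name" $HADOOP_CONF_DIR/core-site.xml
```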
>>
>> 3) Do you have security (kerberos) enabled on 0.20.205? Does CDH3 support
>> security?  If security is enabled on 0.20.205 and CDH3 does not support
>> security, you will need to disable security on 0.20.205.  This is because
>> you are unable to write from a secure to unsecured grid.
>>
>> 4) use the -m flag to limit your mappers so you don't DDOS your network
>> backbone.
>>
>> 5) why isn't your vendor helping you with the data migration? :)
>>
>> Otherwise something like this should get you going.
>>
>> hadoop distcp -i -ppgu -log /tmp/mylog -m 20
>> hftp://mynamenode.grid.one:50070/path/to/my/src/data
>> hdfs://mynamenode.grid.two:9000/path/to/my/dst
>>
>> -- Adam
>>
>> On May 7, 2012, at 4:29 AM, Nitin Pawar wrote:
>>
>>> things to check
>>>
>>> 1) when you launch distcp jobs all the datanodes of older hdfs are live
>> and
>>> connected
>>> 2) when you launch distcp no data is being written/moved/deleted in hdfs
>>> 3) you can use the option -log to log errors into a directory and use -i
>>> to ignore errors
>>>
>>> also you can try using distcp with the hdfs protocol instead of hftp; for
>>> more you can refer to
>>>
>> https://groups.google.com/a/cloudera.org/group/cdh-user/browse_thread/thread/d0d99ad9f1554edd
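A sketch of that hdfs-to-hdfs variant follows; note that the native hdfs:// protocol generally requires both clusters to be wire-compatible, which is why hftp is the usual choice across versions (hostnames and paths are placeholders):

```shell
# hdfs:// on both sides uses the native RPC protocol; unlike hftp, this
# generally requires the two clusters to run compatible Hadoop versions.
# -log writes per-file errors to a directory, -i ignores failures.
hadoop distcp -i -log /tmp/distcp_log \
  hdfs://old-namenode:9000/path/to/src \
  hdfs://new-namenode:9000/path/to/dst
```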
>>>
>>>
>>>
>>> if it failed there should be some error
>>> On Mon, May 7, 2012 at 4:44 PM, Austin Chungath <[EMAIL PROTECTED]>
>> wrote:
>>>
>>>> ok that was a lame mistake.
>>>> $ hadoop distcp hftp://localhost:50070/tmp
>> hftp://localhost:60070/tmp_copy
>>>> I had spelled hdfs instead of "hftp"
>>>>
>>>> $ hadoop distcp hftp://localhost:50070/docs/index.html
>>>> hftp://localhost:60070/user/hadoop
>>>> 12/05/07 16:38:09 INFO tools.DistCp:
>>>> srcPaths=[hftp://localhost:50070/docs/index.html]
>>>> 12/05/07 16:38:09 INFO tools.DistCp:
>>>> destPath=hftp://localhost:60070/user/hadoop
>>>> With failures, global counters are inaccurate; consider running with -i
>>>> Copy failed: java.io.IOException: Not supported
>>>> at org.apache.hadoop.hdfs.HftpFileSystem.delete(HftpFileSystem.java:457)
>>>> at org.apache.hadoop.tools.DistCp.fullyDelete(DistCp.java:963)
>>>> at org.apache.hadoop.tools.DistCp.copy(DistCp.java:672)
>>>> at org.apache.hadoop.tools.DistCp.run(DistCp.java:881)
>>>> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>>>> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
>>>> at org.apache.hadoop.tools.DistCp.main(DistCp.java:908)
>>>>
>>>> Any idea why this error occurs?
>>>> I am copying one file from 0.20.205 (/docs/index.html ) to cdh3u3
>>>> (/user/hadoop)
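For the record, the stack trace points at HftpFileSystem.delete: hftp is a read-only filesystem, so it cannot be used as the distcp destination. Per Adam's point 2 above, the source should be hftp and the destination hdfs. A sketch of the corrected command (8020 is assumed as the CDH3 fs port here; check fs.default.name on the destination cluster):

```shell
# Read from the 0.20.205 cluster over hftp (read-only),
# write to the CDH3 cluster over hdfs (writable).
# Port 8020 is an assumed CDH default; adjust to your fs.default.name.
hadoop distcp -i \
  hftp://localhost:50070/docs/index.html \
  hdfs://localhost:8020/user/hadoop
```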