Hadoop >> mail # user >> Best practice to migrate HDFS from 0.20.205 to CDH3u3


Re: Best practice to migrate HDFS from 0.20.205 to CDH3u3
Hi Austin,

I'm glad that helped out.  Regarding the -p flag for distcp, here's the online documentation

http://hadoop.apache.org/common/docs/current/distcp.html#Option+Index

You can also get this info from running 'hadoop distcp' without any flags.
--------
-p[rbugp]       Preserve
                       r: replication number
                       b: block size
                       u: user
                       g: group
                       p: permission
--------
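Putting those preserve letters together with the other flags discussed in this thread, a hypothetical invocation might look like the following. This is a sketch only: the hostnames, paths, and mapper count are placeholders, not values from this thread.

```shell
# Sketch only: hosts and paths are hypothetical placeholders.
# -pugp preserves user, group, and permission bits on the copied files;
# -m caps the number of map tasks so the copy doesn't flood the network;
# -log writes per-file copy results to a log directory.
DISTCP_ARGS="-pugp -i -m 20 -log /tmp/distcp-logs \
hftp://source-nn.example.com:50070/data/in \
hdfs://dest-nn.example.com:9000/data/in"

# On a host with a Hadoop client installed, this would run:
echo "hadoop distcp $DISTCP_ARGS"
```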

-- Adam

On May 7, 2012, at 10:55 PM, Austin Chungath wrote:

> Thanks Adam,
>
> That was very helpful. Your second point solved my problems :-)
> The hdfs port number was wrong.
> I didn't use the option -ppgu. What does it do?
>
>
>
> On Mon, May 7, 2012 at 8:07 PM, Adam Faris <[EMAIL PROTECTED]> wrote:
>
>> Hi Austin,
>>
>> I don't know about using CDH3, but we use distcp for moving data between
>> different versions of Apache Hadoop grids, and several things come to mind.
>>
>> 1) you should use the -i flag to ignore checksum differences on the
>> blocks.  I'm not 100% sure, but I believe hftp doesn't support checksums
>> on the blocks as they go across the wire.
>>
>> 2) you should read from hftp but write to hdfs.  Also make sure to check
>> your port numbers.   For example I can read from hftp on port 50070 and
>> write to hdfs on port 9000.  You'll find the hftp port in hdfs-site.xml and
>> hdfs in core-site.xml on apache releases.
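>>
>> The port numbers in point 2 map to cluster config entries; on an Apache
>> 0.20-era release they would look roughly like this (property names per
>> that era; hosts and values here are examples, not taken from this thread):

```xml
<!-- hdfs-site.xml: the NameNode HTTP port that hftp:// reads from -->
<property>
  <name>dfs.http.address</name>
  <value>mynamenode.grid.one:50070</value>
</property>

<!-- core-site.xml: the filesystem URI whose port hdfs:// writes to -->
<property>
  <name>fs.default.name</name>
  <value>hdfs://mynamenode.grid.two:9000</value>
</property>
```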
>>
>> 3) Do you have security (kerberos) enabled on 0.20.205? Does CDH3 support
>> security?  If security is enabled on 0.20.205 and CDH3 does not support
>> security, you will need to disable security on 0.20.205.  This is because
>> you are unable to write from a secure to unsecured grid.
>>
>> 4) use the -m flag to limit your mappers so you don't DDoS your network
>> backbone.
>>
>> 5) why isn't your vendor helping you with the data migration? :)
>>
>> Otherwise something like this should get you going.
>>
>> hadoop distcp -i -ppgu -log /tmp/mylog -m 20
>> hftp://mynamenode.grid.one:50070/path/to/my/src/data
>> hdfs://mynamenode.grid.two:9000/path/to/my/dst
>>
>> -- Adam
>>
>> On May 7, 2012, at 4:29 AM, Nitin Pawar wrote:
>>
>>> things to check
>>>
>>> 1) when you launch distcp jobs all the datanodes of older hdfs are live
>> and
>>> connected
>>> 2) when you launch distcp no data is being written/moved/deleted in hdfs
>>> 3) you can use the option -log to log errors into a directory and use -i to
>>> ignore errors
>>>
>>> also you can try using distcp with the hdfs protocol instead of hftp ... for
>>> more you can refer to
>>>
>> https://groups.google.com/a/cloudera.org/group/cdh-user/browse_thread/thread/d0d99ad9f1554edd
>>>
>>>
>>>
>>> if it failed there should be some error
>>> On Mon, May 7, 2012 at 4:44 PM, Austin Chungath <[EMAIL PROTECTED]>
>> wrote:
>>>
>>>> ok that was a lame mistake.
>>>> $ hadoop distcp hftp://localhost:50070/tmp
>> hftp://localhost:60070/tmp_copy
>>>> I had spelled hdfs instead of "hftp"
>>>>
>>>> $ hadoop distcp hftp://localhost:50070/docs/index.html
>>>> hftp://localhost:60070/user/hadoop
>>>> 12/05/07 16:38:09 INFO tools.DistCp:
>>>> srcPaths=[hftp://localhost:50070/docs/index.html]
>>>> 12/05/07 16:38:09 INFO tools.DistCp:
>>>> destPath=hftp://localhost:60070/user/hadoop
>>>> With failures, global counters are inaccurate; consider running with -i
>>>> Copy failed: java.io.IOException: Not supported
>>>> at org.apache.hadoop.hdfs.HftpFileSystem.delete(HftpFileSystem.java:457)
>>>> at org.apache.hadoop.tools.DistCp.fullyDelete(DistCp.java:963)
>>>> at org.apache.hadoop.tools.DistCp.copy(DistCp.java:672)
>>>> at org.apache.hadoop.tools.DistCp.run(DistCp.java:881)
>>>> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>>>> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
>>>> at org.apache.hadoop.tools.DistCp.main(DistCp.java:908)
>>>>
>>>> Any idea why this error is coming?
>>>> I am copying one file from 0.20.205 (/docs/index.html ) to cdh3u3
>>>> (/user/hadoop)
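The stack trace above is consistent with hftp being a read-only filesystem: DistCp attempts a delete at the destination and HftpFileSystem.delete() throws "Not supported". The remedy Adam describes later in the thread is to keep hftp on the source side only and write over hdfs. A hypothetical corrected form is sketched below; the hdfs port is a guess (8020 is a common default), not a value confirmed in this thread.

```shell
# Sketch: read from the old grid over hftp, write to the new grid over hdfs.
# hftp is read-only, so it can appear only on the source side of distcp.
SRC="hftp://localhost:50070/docs/index.html"   # old 0.20.205 grid (NameNode HTTP port)
DST="hdfs://localhost:8020/user/hadoop"        # new CDH3 grid (RPC port; 8020 is a guess)
echo "hadoop distcp -i $SRC $DST"
```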