Hadoop, mail # dev - issues with distcp copying from 1.0 to 2.0

issues with distcp copying from 1.0 to 2.0
Sangjin Lee 2013-08-31, 02:24
This may have been discussed in the past, but I haven't been able to find
the discussion.

A lot of work appears to have gone into making distcp from 1.0 to 2.0
work with checksum checking enabled
(https://issues.apache.org/jira/browse/HADOOP-8060), and I can see that
all of it has been merged into the 2.0 releases. However, distcp from 1.0
to 2.0 still seems to fail when the CRC check is enabled. Is that
understanding correct?
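For context, here is a minimal Java sketch of why the check can fail, assuming (as I understand the defaults) that 1.x clusters write CRC32 checksums while 2.x defaults new files to CRC32C via dfs.checksum.type. It computes both checksums over the standard check string "123456789" with java.util.zip.CRC32 and java.util.zip.CRC32C (Java 9+). This is a simplification: the file checksum distcp compares is an MD5-of-MD5-of-CRC composite built from per-chunk CRCs, so the class name and the single-buffer setup are illustrative only.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;
import java.util.zip.CRC32C;
import java.util.zip.Checksum;

public class ChecksumMismatchDemo {
    // Feeds the whole buffer to the given checksum algorithm and returns its value.
    static long checksum(Checksum algorithm, byte[] data) {
        algorithm.update(data, 0, data.length);
        return algorithm.getValue();
    }

    public static void main(String[] args) {
        // Standard CRC check input; stands in for one chunk of file data.
        byte[] data = "123456789".getBytes(StandardCharsets.US_ASCII);

        long crc32 = checksum(new CRC32(), data);   // assumed 1.x checksum type
        long crc32c = checksum(new CRC32C(), data); // assumed 2.x default type

        System.out.printf("CRC32  = %08x%n", crc32);
        System.out.printf("CRC32C = %08x%n", crc32c);
        System.out.println("match: " + (crc32 == crc32c));
    }
}
```

Running it should print cbf43926 and e3069283 (the published check values for CRC-32 and CRC-32C) followed by "match: false": identical bytes, different algorithms, so any composite checksum built on top of them will not match either.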

I took a quick look at the distcp code (mostly CopyMapper and
RetriableFileCopyCommand), and I don't see where the source checksum type
is passed in when the target file is created through DFSClient. Nor does
dfs.checksum.type appear to be set once the source checksum type is
discovered, which would have been another way to do it. This is
consistent with my testing, and I can also confirm that the copy works if
I pass the command-line option "-Ddfs.checksum.type=CRC32".
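The second mechanism mentioned above (setting dfs.checksum.type once the source checksum type is known) might look roughly like the hypothetical helper below. java.util.Properties stands in for Hadoop's Configuration so the sketch runs on its own; the class name and withSourceChecksumType are made up for illustration and are not part of distcp.

```java
import java.util.Properties;

public class ChecksumTypePropagation {
    static final String CHECKSUM_TYPE_KEY = "dfs.checksum.type";

    // Hypothetical helper: returns a copy of the target-side configuration with
    // the checksum type discovered on the source applied, unless a value was
    // already set explicitly (e.g. via -Ddfs.checksum.type=CRC32).
    static Properties withSourceChecksumType(Properties targetConf, String sourceChecksumType) {
        Properties conf = new Properties();
        conf.putAll(targetConf);
        if (sourceChecksumType != null && conf.getProperty(CHECKSUM_TYPE_KEY) == null) {
            conf.setProperty(CHECKSUM_TYPE_KEY, sourceChecksumType);
        }
        return conf;
    }

    public static void main(String[] args) {
        // Nothing set explicitly, so the cluster default would otherwise apply.
        Properties target = new Properties();
        Properties adjusted = withSourceChecksumType(target, "CRC32");
        System.out.println(adjusted.getProperty(CHECKSUM_TYPE_KEY)); // prints CRC32
    }
}
```

The helper leaves an explicitly supplied value alone, so a user-passed -Ddfs.checksum.type still wins. Setting the type job-wide is coarser than passing it per file when the target file is created, which is the other approach described above.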

Is this understanding accurate? If so, is there a reason this was not done
in distcp? Curious...