Re: S3N copy creating recursive folders
Hi Mike,

I have tried distcp as well and it ended up with an exception:
13/03/06 05:41:13 INFO tools.DistCp: srcPaths=[s3n://acessKey:[EMAIL PROTECTED]et/srcData]
13/03/06 05:41:13 INFO tools.DistCp: destPath=/test/srcData
13/03/06 05:41:18 INFO tools.DistCp: /test/srcData does not exist.
org.apache.hadoop.tools.DistCp$DuplicationException: Invalid input, there are duplicated files in the sources: s3n://acessKey:[EMAIL PROTECTED]et/srcData/compressed, s3n://acessKey:[EMAIL PROTECTED]et/srcData/compressed
at org.apache.hadoop.tools.DistCp.checkDuplication(DistCp.java:1368)
at org.apache.hadoop.tools.DistCp.setup(DistCp.java:1176)
at org.apache.hadoop.tools.DistCp.copy(DistCp.java:666)
at org.apache.hadoop.tools.DistCp.run(DistCp.java:881)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
at org.apache.hadoop.tools.DistCp.main(DistCp.java:908)
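
My guess (untested) is that s3n reports both the zero-byte srcData marker object and the real keys under it, so the same path shows up twice in the source listing. If so, deleting the marker object and retrying might help. A rough sketch, with ACCESS_KEY, SECRET_KEY and bucket as placeholders:

# Inspect the listing s3n produces; a zero-byte entry named like the
# folder itself (srcData/srcData) would explain the duplicated paths.
hadoop fs -ls s3n://ACCESS_KEY:SECRET_KEY@bucket/srcData/
# Untested cleanup: remove the zero-byte marker, then retry the copy.
hadoop fs -rmr s3n://ACCESS_KEY:SECRET_KEY@bucket/srcData/srcData
hadoop distcp s3n://ACCESS_KEY:SECRET_KEY@bucket/srcData /test/srcData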

One more interesting thing to notice: the same operation works fine with Hadoop 2.0.

Cheers,
Subroto Sanyal
On Mar 6, 2013, at 11:12 AM, Michel Segel wrote:

> Have you tried using distcp?
>
> Sent from a remote device. Please excuse any typos...
>
> Mike Segel
>
> On Mar 5, 2013, at 8:37 AM, Subroto <[EMAIL PROTECTED]> wrote:
>
>> Hi,
>>
>> It's not because there are too many recursive folders in the S3 bucket; in fact there is no recursive folder in the source.
>> If I list the S3 bucket with native S3 tools, I can find a file named srcData with size 0 inside the folder srcData.
>> The copy command keeps creating the folder /test/srcData/srcData/srcData (it keeps appending srcData).
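>>
>> An untested thought: since the recursion seems tied to that zero-byte srcData object, deleting it with a native S3 tool before the copy might stop the looping (bucket name is a placeholder):
>>
>> s3cmd del s3://bucket/srcData/srcData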
>>
>> Cheers,
>> Subroto Sanyal
>>
>> On Mar 5, 2013, at 3:32 PM, 卖报的小行家 wrote:
>>
>>> Hi Subroto,
>>>
>>> I haven't used the s3n filesystem, but from the output "cp: java.io.IOException: mkdirs: Pathname too long.  Limit 8000 characters, 1000 levels." I think the problem is with the path. Is the path longer than 8000 characters, or is it nested more than 1000 levels deep?
>>> You only have 998 folders, so maybe the last path is already longer than 8000 characters. Why not count the last one's length?
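>>>
>>> Something like this (untested) should print the longest path and its length:
>>>
>>> hadoop fs -lsr /test/srcData | awk '{print length($NF), $NF}' | sort -n | tail -1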
>>>
>>> BRs//Julian
>>>
>>> ------------------ Original ------------------
>>> From:  "Subroto"<[EMAIL PROTECTED]>;
>>> Date:  Tue, Mar 5, 2013 10:22 PM
>>> To:  "user"<[EMAIL PROTECTED]>;
>>> Subject:  S3N copy creating recursive folders
>>>
>>> Hi,
>>>
>>> I am using Hadoop 1.0.3 and trying to execute:
>>> hadoop fs -cp s3n://acessKey:[EMAIL PROTECTED]/srcData /test/srcData
>>>
>>> This ends up with:
>>> cp: java.io.IOException: mkdirs: Pathname too long.  Limit 8000 characters, 1000 levels.
>>>
>>> When I try to list the folder recursively /test/srcData: it lists 998 folders like:
>>> drwxr-xr-x   - root supergroup          0 2013-03-05 08:49 /test/srcData/srcData
>>> drwxr-xr-x   - root supergroup          0 2013-03-05 08:49 /test/srcData/srcData/srcData
>>> drwxr-xr-x   - root supergroup          0 2013-03-05 08:49 /test/srcData/srcData/srcData/srcData
>>> drwxr-xr-x   - root supergroup          0 2013-03-05 08:49 /test/srcData/srcData/srcData/srcData/srcData
>>> drwxr-xr-x   - root supergroup          0 2013-03-05 08:49 /test/srcData/srcData/srcData/srcData/srcData/srcData
>>>
>>> Is there a problem with the s3n filesystem?
>>>
>>> Cheers,
>>> Subroto Sanyal
>>
