HDFS >> mail # user >> Re: copy files from ftp to hdfs in parallel, distcp failed


Re: copy files from ftp to hdfs in parallel, distcp failed
Hi,

I am wondering whether I can move data from FTP to HDFS with Hadoop
distcp.

Can someone give me an example?

In my case, I always get the "can not access ftp" error.

I am quite sure that the link, login and password are correct: I
copied and pasted the FTP address into Firefox, and it works there.
However, it doesn't work with:
bin/hadoop fs -ls ftp://<my ftp location>

Any workaround here?

Thank you.

Hao

On 16/07/2013 17:47, Hao Ren wrote:
> Hi,
>
> Actually, I tested with my own FTP host at first, but it didn't work.
>
> Then I changed it to 0.0.0.0.
>
> But I always get the "can not access ftp" message.
>
> Thank you.
>
> Hao.
>
> On 16/07/2013 17:03, Ram wrote:
>> Hi,
>>     Please replace 0.0.0.0 with your FTP host's IP address and try again.
>>
>> Hi,
>>
>>
>>
>> From,
>> Ramesh.
>>
>>
>>
>>
>> On Mon, Jul 15, 2013 at 3:22 PM, Hao Ren <[EMAIL PROTECTED]> wrote:
>>
>>     Thank you, Ram
>>
>>     I have configured core-site.xml as follows:
>>
>>     <?xml version="1.0"?>
>>     <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
>>
>>     <!-- Put site-specific property overrides in this file. -->
>>
>>     <configuration>
>>
>>         <property>
>>             <name>hadoop.tmp.dir</name>
>>             <value>/vol/persistent-hdfs</value>
>>         </property>
>>
>>         <property>
>>             <name>fs.default.name</name>
>>             <value>hdfs://ec2-23-23-33-234.compute-1.amazonaws.com:9010</value>
>>         </property>
>>
>>         <property>
>>             <name>io.file.buffer.size</name>
>>             <value>65536</value>
>>         </property>
>>
>>         <property>
>>             <name>fs.ftp.host</name>
>>             <value>0.0.0.0</value>
>>         </property>
>>
>>         <property>
>>             <name>fs.ftp.host.port</name>
>>             <value>21</value>
>>         </property>
>>
>>     </configuration>
>>
>>     Then I tried hadoop fs -ls file:///, and it works.
>>     But hadoop fs -ls ftp://<login>:<password>@<ftp server ip>/<directory>/
>>     still fails:
>>         ls: Cannot access ftp://<user>:<password>@<ftp server ip>/<directory>/: No such file or directory.
>>
>>     When I omit <directory>:
>>
>>     hadoop fs -ls ftp://<login>:<password>@<ftp server ip>/
>>
>>     there is no error message, but it lists nothing.
>>
>>
>>     I have also checked the permissions on my /home/<user> directory:
>>
>>     drwxr-xr-x 11 <user> <user>  4096 jui 11 16:30 <user>
>>
>>     and all the files under /home/<user> have permissions 755.
>>
>>     If I paste the link ftp://<user>:<password>@<ftp server ip>/<directory>/
>>     into Firefox, it lists all the files as expected.
>>
>>     Any workaround here?
>>
>>     Thank you.
>>
>>     On 12/07/2013 14:01, Ram wrote:
>>>     Please configure the following in core-site.xml and try:
>>>        Use hadoop fs -ls file:///  -- to display local file system files
>>>        Use hadoop fs -ls ftp://<your ftp location>   -- to display
>>>     FTP files. If it lists the files, go ahead with distcp.
>>>
>>>     reference from
>>>     http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/core-default.xml
>>>
>>>     name              value    description
>>>     fs.ftp.host       0.0.0.0  FTP filesystem connects to this server
>>>     fs.ftp.host.port  21       FTP filesystem connects to fs.ftp.host on this port
>>>
>>
>>
>>     --
>>     Hao Ren
>>     ClaraVista
>>     www.claravista.fr
>>
>>
>
>
> --
> Hao Ren
> ClaraVista
> www.claravista.fr
--
Hao Ren
ClaraVista
www.claravista.fr
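
For readers looking for the example requested above, here is a minimal sketch of the two-step approach Ram describes: list the FTP directory first, then run distcp once the listing works. The user, password, host, directory and HDFS destination below are hypothetical placeholders, not values from this thread, and the script only prints the commands unless DRY_RUN=0 is set. It illustrates the command shape and is not a confirmed fix for the "Cannot access" error.

```shell
#!/bin/sh
# Hypothetical placeholders: replace every value with your own.
FTP_USER="user"
FTP_PASS="secret"
FTP_HOST="ftp.example.com"           # the real host name or IP, not 0.0.0.0
FTP_DIR="/data"                      # directory path as seen after FTP login
HDFS_DEST="/user/hadoop/ftp-import"  # assumed HDFS target directory

# Credentials are embedded in the URI, as in the commands quoted in the thread.
FTP_URI="ftp://${FTP_USER}:${FTP_PASS}@${FTP_HOST}${FTP_DIR}/"

# Print each command; execute it only when DRY_RUN=0 is set by the caller.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    fi
}

# Step 1: check that Hadoop's FTP client can list the directory.
run hadoop fs -ls "$FTP_URI"

# Step 2: if the listing works, copy the files in parallel with distcp.
run hadoop distcp "$FTP_URI" "$HDFS_DEST"
```

Saved under a name of your choice (ftp-to-hdfs.sh is just an example), run it as `DRY_RUN=0 sh ftp-to-hdfs.sh` on a node where the hadoop command is on the PATH. If step 1 still prints nothing or fails, distcp is unlikely to succeed either, so it is worth resolving the listing problem first.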
