Hao Ren 2013-07-16, 15:47
Re: copy files from ftp to hdfs in parallel, distcp failed
Hi,

I am just wondering whether I can move data from FTP to HDFS via Hadoop
distcp.

Can someone give me an example?

In my case, I always encounter the "can not access ftp" error.

I am quite sure that the link, login, and password are correct; in fact, I
have just copied and pasted the FTP address into Firefox, and it works there.
However, it doesn't work with:
bin/hadoop fs -ls ftp://<my ftp location>
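
For concreteness, here is a minimal sketch of the two steps I have in mind
(the host ftp.example.com, the user "login", and the paths are placeholders;
the namenode URI is the one from my core-site.xml quoted below):

    # first confirm Hadoop itself can list the FTP directory
    bin/hadoop fs -ls ftp://login:password@ftp.example.com/data/

    # if the listing works, copy the directory into HDFS; distcp runs as
    # a MapReduce job, so the copy itself is parallel across files
    bin/hadoop distcp ftp://login:password@ftp.example.com/data/ \
        hdfs://ec2-23-23-33-234.compute-1.amazonaws.com:9010/data/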

Any workaround here?

Thank you.

Hao

On 16/07/2013 at 17:47, Hao Ren wrote:
> Hi,
>
> Actually, I tested with my own FTP host at first, but it didn't work.
>
> Then I changed it to 0.0.0.0.
>
> But I always get the "can not access ftp" message.
>
> Thank you.
>
> Hao.
>
> On 16/07/2013 at 17:03, Ram wrote:
>> Hi,
>>     Please replace 0.0.0.0 with your FTP host's IP address and try it.
>>
>> From,
>> Ramesh.
>>
>> On Mon, Jul 15, 2013 at 3:22 PM, Hao Ren <[EMAIL PROTECTED]> wrote:
>>
>>     Thank you, Ram
>>
>>     I have configured core-site.xml as follows:
>>
>>     <?xml version="1.0"?>
>>     <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
>>
>>     <!-- Put site-specific property overrides in this file. -->
>>
>>     <configuration>
>>
>>         <property>
>>             <name>hadoop.tmp.dir</name>
>>     <value>/vol/persistent-hdfs</value>
>>         </property>
>>
>>         <property>
>>             <name>fs.default.name</name>
>>             <value>hdfs://ec2-23-23-33-234.compute-1.amazonaws.com:9010</value>
>>         </property>
>>
>>         <property>
>>             <name>io.file.buffer.size</name>
>>             <value>65536</value>
>>         </property>
>>
>>         <property>
>>             <name>fs.ftp.host</name>
>>             <value>0.0.0.0</value>
>>         </property>
>>
>>         <property>
>>             <name>fs.ftp.host.port</name>
>>             <value>21</value>
>>         </property>
>>
>>     </configuration>
>>
>>     Then I tried hadoop fs -ls file:/// , and it works.
>>     But hadoop fs -ls ftp://<login>:<password>@<ftp server
>>     ip>/<directory>/ fails as before:
>>         ls: Cannot access ftp://<user>:<password>@<ftp server
>>     ip>/<directory>/: No such file or directory.
>>
>>     When omitting <directory>, as in:
>>
>>     hadoop fs -ls ftp://<login>:<password>@<ftp server ip>/
>>
>>     there are no error messages, but it lists nothing.
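>>
>>     A variant that may be worth trying: instead of embedding the
>>     credentials in the URI, FTPFileSystem can also read them from the
>>     per-host properties fs.ftp.user.<host> and fs.ftp.password.<host>.
>>     A sketch, assuming a hypothetical host ftp.example.com (worth
>>     double-checking these property names against your Hadoop version):
>>
>>     hadoop fs -D fs.ftp.host=ftp.example.com \
>>               -D fs.ftp.user.ftp.example.com=login \
>>               -D fs.ftp.password.ftp.example.com=secret \
>>               -ls ftp://ftp.example.com/<directory>/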
>>
>>
>>     I have also checked the permissions on my /home/<user> directory:
>>
>>     drwxr-xr-x 11 <user> <user>  4096 Jul 11 16:30 <user>
>>
>>     and all the files under /home/<user> have permissions 755.
>>
>>     I can easily copy the link ftp://<user>:<password>@<ftp server
>>     ip>/<directory>/ into Firefox, and it lists all the files as expected.
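>>
>>     One thing the Firefox test does not cover: the browser runs on my
>>     desktop, while hadoop fs runs on the EC2 nodes, so the cluster may
>>     simply not be able to reach the FTP server. A quick check from a
>>     shell on the namenode, using curl's standard FTP directory listing
>>     (same placeholders as above):
>>
>>     curl ftp://<user>:<password>@<ftp server ip>/<directory>/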
>>
>>     Any workaround here?
>>
>>     Thank you.
>>
>>     On 12/07/2013 at 14:01, Ram wrote:
>>>     Please configure the following in core-site.xml and try:
>>>        Use hadoop fs -ls file:///   -- to display local file system files.
>>>        Use hadoop fs -ls ftp://<your ftp location>   -- to display
>>>     FTP files. If it lists the files, go for distcp.
>>>
>>>     reference from
>>>     http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/core-default.xml
>>>
>>>     fs.ftp.host       0.0.0.0   (FTP filesystem connects to this server)
>>>     fs.ftp.host.port  21        (FTP filesystem connects to fs.ftp.host
>>>                                 on this port)
>>>
>>
>>
>>     --
>>     Hao Ren
>>     ClaraVista
>>     www.claravista.fr
>>
>>
>
>
> --
> Hao Ren
> ClaraVista
> www.claravista.fr
--
Hao Ren
ClaraVista
www.claravista.fr