distcp in Hadoop 2.0.4 over http?
I want to copy HDFS files over HTTP using distcp, but I can't. It seems to
be a configuration problem that I can't track down. How can I run distcp in
Hadoop 2.0.4 over HTTP?

First I set up Hadoop 2.0.4 over HTTP (HttpFS) on port 3888, and it is
running. Here is the proof:

$ curl -i http://zk1.host.com:3888?user.name=babu&op=homedir
[1] 32129
[myuser@zk1 hadoop]$ HTTP/1.1 200 OK
Server: Apache-Coyote/1.1
Accept-Ranges: bytes
ETag: W/"674-1365802990000"
Last-Modified: Fri, 12 Apr 2013 21:43:10 GMT
Content-Type: text/html
Content-Length: 674
Date: Sat, 01 Jun 2013 15:48:04 GMT

<?xml version="1.0" encoding="UTF-8"?>
<html>
<body>
<b>HttpFs service</b>, service base URL at /webhdfs/v1.
</body>
</html>
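
Looking at that response again, the 200 is just the static landing page: the
body is text/html, and op=homedir is not a WebHDFS operation; also, the
unquoted & sent curl to the background, which is where the [1] 32129 comes
from. A call that actually exercises the REST API under /webhdfs/v1 would, I
believe, look like this (same host, port, and user as above):

$ curl -i "http://zk1.host.com:3888/webhdfs/v1?op=GETHOMEDIRECTORY&user.name=babu"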
But when I run distcp, the copy fails:
$ hadoop distcp  http://zk1.host:3888/gutenberg/a.txt http://zk1.host:3888/
Warning: $HADOOP_HOME is deprecated.
Copy failed: java.io.IOException: No FileSystem for scheme: http
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1434)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1455)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:254)
    at org.apache.hadoop.fs.Path.getFileSystem(Path.java:187)
    at org.apache.hadoop.tools.DistCp.checkSrcPath(DistCp.java:635)
    at org.apache.hadoop.tools.DistCp.copy(DistCp.java:656)
    at org.apache.hadoop.tools.DistCp.run(DistCp.java:881)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
    at org.apache.hadoop.tools.DistCp.main(DistCp.java:908)

$ hadoop distcp  httpfs://zk1.host:3888/gutenberg/a.txt httpfs://zk1.host:3888/
Copy failed: java.io.IOException: No FileSystem for scheme: httpfs
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1434)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1455)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:254)
    at org.apache.hadoop.fs.Path.getFileSystem(Path.java:187)
    at org.apache.hadoop.tools.DistCp.checkSrcPath(DistCp.java:635)
    at org.apache.hadoop.tools.DistCp.copy(DistCp.java:656)
    at org.apache.hadoop.tools.DistCp.run(DistCp.java:881)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
    at org.apache.hadoop.tools.DistCp.main(DistCp.java:908)

$ hadoop distcp  hdfs://zk1.host:3888/gutenberg/a.txt hdfs://zk1.host:3888/
Copy failed: java.io.IOException: Call to zk1.host/127.0.0.1:3888 failed
on local exception: java.io.EOFException
    at org.apache.hadoop.ipc.Client.wrapException(Client.java:1144)
    at org.apache.hadoop.ipc.Client.call(Client.java:1112)
    at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:229)
    at com.sun.proxy.$Proxy1.getProtocolVersion(Unknown Source)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
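
My guess for this last one: hdfs:// speaks the NameNode RPC protocol, and
3888 is the HttpFS HTTP port, so the client is talking RPC to an HTTP server
and hits EOF. An hdfs URI should presumably use the RPC port from
fs.default.name instead, something like:

$ hadoop distcp hdfs://zk1.host:9000/gutenberg/a.txt hdfs://zk1.host:9000/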

Here are my core-site.xml and httpfs-env.sh files, where I configured HDFS
and HttpFS:
$ cat etc/hadoop/core-site.xml
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://zk1.host:9000</value>
  </property>
  <property>
    <name>hadoop.proxyuser.myuser.hosts</name>
    <value>zk1.host</value>
  </property>
  <property>
    <name>hadoop.proxyuser.myuser.groups</name>
    <value>*</value>
  </property>
</configuration>

$ cat etc/hadoop/httpfs-env.sh
#!/bin/bash
export HTTPFS_HTTP_PORT=3888
export HTTPFS_HTTP_HOSTNAME=`hostname -f`
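
Before retrying distcp, I would also check that HttpFS can actually see the
source file; GETFILESTATUS should do that, if I have the WebHDFS syntax
right:

$ curl -i "http://zk1.host.com:3888/webhdfs/v1/gutenberg/a.txt?op=GETFILESTATUS&user.name=babu"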
--
Best regards,