Hadoop with S3 instead of local storage  (HDFS user mailing list)

Thread:
  Alok Kumar  2012-08-02, 07:14
  Harsh J     2012-08-02, 11:52
  Alok Kumar  2012-08-02, 13:01

Re: Hadoop with S3 instead of local storage
Alok,

HDFS is a FileSystem, and S3 is another FileSystem. Hence, when you
choose to use S3 on a node, do not attempt to start HDFS services such
as the NameNode and DataNode; they have nothing to do with S3. S3
stands alone, and its configuration points to where it is running, how
it is to be accessed, and so on. For S3 to be available, the S3
filesystem's jars must be made available to every service you wish to
use it from.
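For example, to get rid of the ClassNotFoundException you're seeing on
the regionserver, the JetS3t jar (linked below) has to be on HBase's
classpath. A rough sketch; the paths and jar version here are
placeholders for whatever your install actually uses:

    # copy the JetS3t jar into HBase's lib directory on every node...
    cp /path/to/jets3t-x.y.z.jar /usr/lib/hbase/lib/
    # ...or point HBASE_CLASSPATH at it in conf/hbase-env.sh
    export HBASE_CLASSPATH=/path/to/jets3t-x.y.z.jar:$HBASE_CLASSPATH

Then restart the regionservers so they pick it up.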

Yes, you can make Hive/HBase work with S3 if S3 is configured as the
fs.default.name (or fs.defaultFS in 2.x+). You can configure your
core-site.xml with the right FS and run regular commands such as
"hadoop fs -ls /" against that FS. The library is JetS3t
(http://jets3t.s3.amazonaws.com/downloads.html), and you'll need its
jar on the HBase/Hive/etc. classpaths.
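As a sketch, a 1.x core-site.xml pointing the default FS at S3 could
look like this (the bucket name and credentials are placeholders):

    <configuration>
      <property>
        <name>fs.default.name</name>
        <value>s3://your-bucket</value>
      </property>
      <!-- credentials for the s3:// block FileSystem;
           the s3n:// native FileSystem uses fs.s3n.* keys instead -->
      <property>
        <name>fs.s3.awsAccessKeyId</name>
        <value>YOUR_AWS_ACCESS_KEY</value>
      </property>
      <property>
        <name>fs.s3.awsSecretAccessKey</name>
        <value>YOUR_AWS_SECRET_KEY</value>
      </property>
    </configuration>

With that in place, "hadoop fs -ls /" should run against S3 rather
than HDFS.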

Let us know if this clears it up!

On Thu, Aug 2, 2012 at 6:31 PM, Alok Kumar <[EMAIL PROTECTED]> wrote:
> Hi,
>
> Thank you for your reply.
>
> The requirement is that I need to set up a Hadoop cluster using S3 as
> the backing store (performance won't be an issue).
>
> My architecture is as follows:
> Hive has an external table mapped to HBase. HBase is storing its data
> to HDFS. Hive is using Hadoop to access the HBase table data.
> Can I make this work using S3?
>
> The HBase regionserver is failing with "Caused by:
> java.lang.ClassNotFoundException: org.jets3t.service.S3ServiceException".
>
> The HBase master log has lots of "Unexpected response code 404,
> expected 200" entries.
>
> Do I need to start the DataNode with S3?
> The DataNode log says:
>
> 2012-08-02 17:50:20,021 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: STARTUP_MSG:
> /************************************************************
> STARTUP_MSG: Starting DataNode
> STARTUP_MSG:   host = datarpm-desktop/192.168.2.4
> STARTUP_MSG:   args = []
> STARTUP_MSG:   version = 1.0.1
> STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.0 -r 1243785; compiled by 'hortonfo' on Tue Feb 14 08:15:38 UTC 2012
> ************************************************************/
> 2012-08-02 17:50:20,145 INFO org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
> 2012-08-02 17:50:20,156 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source MetricsSystem,sub=Stats registered.
> 2012-08-02 17:50:20,157 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
> 2012-08-02 17:50:20,157 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: DataNode metrics system started
> 2012-08-02 17:50:20,277 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source ugi registered.
> 2012-08-02 17:50:20,281 WARN org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Source name ugi already exists!
> 2012-08-02 17:50:20,317 INFO org.apache.hadoop.util.NativeCodeLoader: Loaded the native-hadoop library
> 2012-08-02 17:50:22,006 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: java.io.IOException: Call to <bucket-name>/67.215.65.132:8020 failed on local exception: java.io.EOFException
>     at org.apache.hadoop.ipc.Client.wrapException(Client.java:1103)
>     at org.apache.hadoop.ipc.Client.call(Client.java:1071)
>     at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225)
>     at $Proxy5.getProtocolVersion(Unknown Source)
>     at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:396)
>     at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:370)
>     at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:429)
>     at org.apache.hadoop.ipc.RPC.waitForProxy(RPC.java:331)
>     at org.apache.hadoop.ipc.RPC.waitForProxy(RPC.java:296)
>     at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:356)
>     at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:299)
>     at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:1582)
>     at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1521)
>     at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:1539)

Harsh J

Later replies:
  Alok Kumar  2012-08-03, 07:26
  Harsh J     2012-08-03, 07:30