HDFS, mail # user - Hadoop with S3 instead of local storage

Re: Hadoop with S3 instead of local storage
Harsh J 2012-08-02, 17:48
Alok,

HDFS is a FileSystem. S3 is also a FileSystem. Hence, when you choose
to use S3 on a node, do not attempt to start HDFS services such as the
NameNode and DataNode; they have nothing to do with S3. S3 stands
alone, and its configuration points to where it is running, how it is
to be accessed, and so on. For S3 to be available, the S3 filesystem
jars must be on the classpath of every service you wish to use it from.
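
For example (a rough sketch; the jar path and version below are just
placeholders for whatever jets3t jar you actually download):

    # hadoop-env.sh -- make the jets3t jar visible to the Hadoop daemons and CLI
    export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:/path/to/jets3t-0.8.1.jar

    # hbase-env.sh -- and to the HBase master/regionservers
    export HBASE_CLASSPATH=$HBASE_CLASSPATH:/path/to/jets3t-0.8.1.jar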

Yes, you can make Hive/HBase work with S3, provided S3 is configured as
the fs.default.name (or fs.defaultFS in 2.x+). You can configure your
core-site.xml with the right FS and run regular "hadoop fs -ls /",
etc. commands against that FS. The library is jets3t:
http://jets3t.s3.amazonaws.com/downloads.html and you'll need its jar
on the HBase/Hive/etc. classpaths.
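
As a concrete illustration (a minimal core-site.xml sketch; the bucket
name and keys are placeholders, and whether you use the s3:// block
store or the s3n:// native scheme is your choice):

    <!-- core-site.xml: point the default FS at an S3 bucket instead of HDFS -->
    <property>
      <name>fs.default.name</name>  <!-- fs.defaultFS on 2.x+ -->
      <value>s3://your-bucket-name</value>
    </property>
    <property>
      <name>fs.s3.awsAccessKeyId</name>
      <value>YOUR_ACCESS_KEY</value>
    </property>
    <property>
      <name>fs.s3.awsSecretAccessKey</name>
      <value>YOUR_SECRET_KEY</value>
    </property>

With that in place (and the jets3t jar on the classpath), a plain
"hadoop fs -ls /" should list the bucket rather than HDFS.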

Let us know if this clears it up!

On Thu, Aug 2, 2012 at 6:31 PM, Alok Kumar <[EMAIL PROTECTED]> wrote:
> Hi,
>
> Thank you for reply.
>
> The requirement is that I need to set up a Hadoop cluster using S3 as the
> storage backend (performance won't be an issue).
>
> My architecture is as follows:
> Hive has an external table mapped to HBase, HBase stores its data in HDFS,
> and Hive uses Hadoop to access the HBase table data.
> Can I make this work using S3?
>
> The HBase regionserver is failing with the error "Caused by:
> java.lang.ClassNotFoundException: org.jets3t.service.S3ServiceException".
>
> The HBase master log has many occurrences of "Unexpected response code 404, expected 200".
>
> Do I need to start the DataNode with S3?
> The DataNode log says:
>
> 2012-08-02 17:50:20,021 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode: STARTUP_MSG:
> /************************************************************
> STARTUP_MSG: Starting DataNode
> STARTUP_MSG:   host = datarpm-desktop/192.168.2.4
> STARTUP_MSG:   args = []
> STARTUP_MSG:   version = 1.0.1
> STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.0 -r
> 1243785; compiled by 'hortonfo' on Tue Feb 14 08:15:38 UTC 2012
> ************************************************************/
> 2012-08-02 17:50:20,145 INFO org.apache.hadoop.metrics2.impl.MetricsConfig:
> loaded properties from hadoop-metrics2.properties
> 2012-08-02 17:50:20,156 INFO
> org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source
> MetricsSystem,sub=Stats registered.
> 2012-08-02 17:50:20,157 INFO
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period
> at 10 second(s).
> 2012-08-02 17:50:20,157 INFO
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl: DataNode metrics system
> started
> 2012-08-02 17:50:20,277 INFO
> org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source ugi
> registered.
> 2012-08-02 17:50:20,281 WARN
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Source name ugi already
> exists!
> 2012-08-02 17:50:20,317 INFO org.apache.hadoop.util.NativeCodeLoader: Loaded
> the native-hadoop library
> 2012-08-02 17:50:22,006 ERROR
> org.apache.hadoop.hdfs.server.datanode.DataNode: java.io.IOException: Call
> to <bucket-name>/67.215.65.132:8020 failed on local exception:
> java.io.EOFException
>     at org.apache.hadoop.ipc.Client.wrapException(Client.java:1103)
>     at org.apache.hadoop.ipc.Client.call(Client.java:1071)
>     at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225)
>     at $Proxy5.getProtocolVersion(Unknown Source)
>     at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:396)
>     at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:370)
>     at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:429)
>     at org.apache.hadoop.ipc.RPC.waitForProxy(RPC.java:331)
>     at org.apache.hadoop.ipc.RPC.waitForProxy(RPC.java:296)
>     at
> org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:356)
>     at
> org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:299)
>     at
> org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:1582)
>     at
> org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1521)
>     at
> org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:1539)

Harsh J