

Re: problem configuring hadoop with s3 bucket
I think there is some confusion about how Hadoop integrates with S3.
1) If you set "dfs.data.dir=s3://******", you are telling the DataNode to
use S3 as its local block storage while still running HDFS as the
underlying storage layer.
As far as I know, that is not supported at present.
2) The right way to use S3 with Hadoop is to replace HDFS with S3, which is
the other thing you tried.
But I think you are missing some configuration parameters.
"fs.default.name=s3://<mybucket>" is the most important setting when you
use S3 in place of HDFS, but it is not enough on its own. The detailed
configuration is described here:
http://wiki.apache.org/hadoop/AmazonS3
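
For example, core-site.xml could look roughly like this (just a sketch;
BUCKET, ID and SECRET are placeholders for your bucket name and AWS keys,
and the fs.s3.* property names are the ones listed on that wiki page for
the s3:// block filesystem):

  <!-- core-site.xml: S3 block filesystem replaces HDFS as the default FS -->
  <configuration>
    <property>
      <name>fs.default.name</name>
      <value>s3://BUCKET</value>
    </property>
    <property>
      <name>fs.s3.awsAccessKeyId</name>
      <value>ID</value>
    </property>
    <property>
      <name>fs.s3.awsSecretAccessKey</name>
      <value>SECRET</value>
    </property>
  </configuration>

And leave dfs.data.dir pointing at local directories; it only applies to
HDFS DataNodes, which you are not using once S3 replaces HDFS.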

Yanbo

2012/7/23 Alok Kumar <[EMAIL PROTECTED]>

> Hello Group,
>
> I have a Hadoop setup running locally.
>
> Now I want to use Amazon s3://<mybucket> as my data store,
> so I set "dfs.data.dir=s3://<mybucket>/hadoop/" in my
> hdfs-site.xml. Is that the correct way?
> I'm getting this error:
>
> WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Invalid directory in
> dfs.data.dir: can not create directory: s3://<mybucket>/hadoop
> 2012-07-23 13:15:06,260 ERROR
> org.apache.hadoop.hdfs.server.datanode.DataNode: All directories in
> dfs.data.dir are invalid.
>
> And when I changed it to "dfs.data.dir=s3://<mybucket>/"
> I got this error:
>  ERROR org.apache.hadoop.hdfs.server.datanode.DataNode:
> java.lang.IllegalArgumentException: Wrong FS: s3://<mybucket>/, expected:
> file:///
>     at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:381)
>     at
> org.apache.hadoop.fs.RawLocalFileSystem.pathToFile(RawLocalFileSystem.java:55)
>     at
> org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:393)
>     at
> org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:251)
>     at
> org.apache.hadoop.util.DiskChecker.mkdirsWithExistsAndPermissionCheck(DiskChecker.java:146)
>     at org.apache.hadoop.util.DiskChecker.checkDir(DiskChecker.java:162)
>     at
> org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:1574)
>     at
> org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1521)
>     at
> org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:1539)
>     at
> org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:1665)
>     at
> org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:1682)
>
> Also,
> when I change fs.default.name=s3://<mybucket>, the NameNode does not
> come up; it fails with: ERROR
> org.apache.hadoop.hdfs.server.namenode.NameNode: java.net.BindException:
> (In any case I want to run the NameNode locally, so I reverted it back to
> hdfs://localhost:9000.)
>
> Your help is highly appreciated!
> Thanks
> --
> Alok Kumar
>