Re: File Permissions on s3 FileSystem
Hey Parth,

I don't think it's possible to run MR by basing the FS on S3
completely. You can use S3 for I/O on your files, but your
fs.default.name (or fs.defaultFS) must be a file:/// or hdfs://
filesystem. This way, your MR framework can run/distribute its files
properly, and still be able to process S3 URLs passed as input or
output locations.
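
For illustration, here's a minimal sketch of that setup. It assumes
fs.defaultFS is left pointing at HDFS; the class name, bucket, paths
and credential placeholders below are made up:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class S3InputJob {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Leave fs.defaultFS at its hdfs:// value so the framework can
    // stage its own files (job jar, splits, conf) on HDFS.
    // Credentials for the s3n:// scheme:
    conf.set("fs.s3n.awsAccessKeyId", "ACCESS_KEY");
    conf.set("fs.s3n.awsSecretAccessKey", "SECRET_KEY");

    Job job = new Job(conf, "s3-io-example");
    job.setJarByClass(S3InputJob.class);
    // S3 URLs work fine as plain input/output locations; only the
    // default filesystem must not be S3. (Mapper/reducer setup is
    // omitted here; the defaults pass records through unchanged.)
    FileInputFormat.addInputPath(job, new Path("s3n://my-bucket/input"));
    FileOutputFormat.setOutputPath(job, new Path("s3n://my-bucket/output"));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}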

On Tue, Oct 23, 2012 at 11:02 PM, Parth Savani <[EMAIL PROTECTED]> wrote:
> Hello Everyone,
>         I am trying to run a Hadoop job with s3n as my filesystem.
> I changed the following properties in my hdfs-site.xml
>
> fs.default.name=s3n://KEY:VALUE@bucket/
> mapreduce.jobtracker.staging.root.dir=s3n://KEY:VALUE@bucket/tmp
>
> When I run the job from EC2, I get the following error:
>
> The ownership on the staging directory
> s3n://KEY:VALUE@bucket/tmp/ec2-user/.staging is not as expected. It is owned
> by   The directory must be owned by the submitter ec2-user or by ec2-user
> at
> org.apache.hadoop.mapreduce.JobSubmissionFiles.getStagingDir(JobSubmissionFiles.java:113)
> at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:850)
> at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:844)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)
> at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:844)
> at org.apache.hadoop.mapreduce.Job.submit(Job.java:481)
>
> I am using the Cloudera CDH4 Hadoop distribution. The error is thrown from
> the JobSubmissionFiles class:
>  public static Path getStagingDir(JobClient client, Configuration conf)
>   throws IOException, InterruptedException {
>     Path stagingArea = client.getStagingAreaDir();
>     FileSystem fs = stagingArea.getFileSystem(conf);
>     String realUser;
>     String currentUser;
>     UserGroupInformation ugi = UserGroupInformation.getLoginUser();
>     realUser = ugi.getShortUserName();
>     currentUser = UserGroupInformation.getCurrentUser().getShortUserName();
>     if (fs.exists(stagingArea)) {
>       FileStatus fsStatus = fs.getFileStatus(stagingArea);
>       String owner = fsStatus.getOwner();
>       if (!(owner.equals(currentUser) || owner.equals(realUser))) {
>          throw new IOException("The ownership on the staging directory " +
>                       stagingArea + " is not as expected. " +
>                       "It is owned by " + owner + ". The directory must " +
>                       "be owned by the submitter " + currentUser + " or " +
>                       "by " + realUser);
>       }
>       if (!fsStatus.getPermission().equals(JOB_DIR_PERMISSION)) {
>         LOG.info("Permissions on staging directory " + stagingArea + " are " +
>           "incorrect: " + fsStatus.getPermission() + ". Fixing permissions " +
>           "to correct value " + JOB_DIR_PERMISSION);
>         fs.setPermission(stagingArea, JOB_DIR_PERMISSION);
>       }
>     } else {
>       fs.mkdirs(stagingArea,
>           new FsPermission(JOB_DIR_PERMISSION));
>     }
>     return stagingArea;
>   }
>
>
>
> I think my job calls getOwner(), which returns null since S3 does not have
> file permissions, and that results in the IOException I am getting.
>
> Any workaround for this? Any idea how I could use S3 as the filesystem with
> Hadoop in distributed mode?
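
As for the getOwner() theory: note the blank spot in your pasted error
("It is owned by   The directory must ..."), which suggests the s3n
FileStatus reports an empty owner string rather than null, so the
equals() check above can never pass. A quick sketch to confirm (the
staging path below just mirrors your placeholders and is illustrative):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class StagingOwnerCheck {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Illustrative path; substitute your real bucket and credentials.
    Path staging = new Path("s3n://KEY:VALUE@bucket/tmp/ec2-user/.staging");
    FileSystem fs = staging.getFileSystem(conf);
    if (fs.exists(staging)) {
      FileStatus status = fs.getFileStatus(staging);
      // s3n has no ownership model, so this likely prints owner='',
      // which cannot equal the submitting user in getStagingDir().
      System.out.println("owner='" + status.getOwner() + "'");
    }
  }
}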

--
Harsh J