Re: File Permissions on s3 FileSystem
On 23/10/12 13:32, Parth Savani wrote:
> Hello Everyone,
>         I am trying to run a Hadoop job with s3n as my filesystem.
> I changed the following properties in my hdfs-site.xml
>
> fs.default.name=s3n://KEY:VALUE@bucket/
A good practice, if you will use S3 often, is to set these two properties
in core-site.xml:
<property>
     <name>fs.s3.awsAccessKeyId</name>
     <value>AWS_ACCESS_KEY_ID</value>
</property>

<property>
     <name>fs.s3.awsSecretAccessKey</name>
     <value>AWS_SECRET_ACCESS_KEY</value>
</property>
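Note that the fs.s3.* keys above configure the block-based s3:// filesystem; for the native s3n:// filesystem you are using, my understanding from the Hadoop AmazonS3 wiki is that the property names carry an s3n prefix:

<property>
     <name>fs.s3n.awsAccessKeyId</name>
     <value>AWS_ACCESS_KEY_ID</value>
</property>

<property>
     <name>fs.s3n.awsSecretAccessKey</name>
     <value>AWS_SECRET_ACCESS_KEY</value>
</property>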

After that, you can access your URIs in a friendlier way:
S3:
  s3://<s3-bucket>/<s3-filepath>

S3n:
  s3n://<s3-bucket>/<s3-filepath>
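
For example, once the credentials live in core-site.xml, code can open paths
without embedding KEY:SECRET in the URI. A minimal sketch (the bucket name
my-bucket is hypothetical):

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ListBucket {
    public static void main(String[] args) throws Exception {
        // Credentials are picked up from core-site.xml, so the URI
        // no longer needs the KEY:SECRET pair embedded in it.
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create("s3n://my-bucket/"), conf);
        for (FileStatus status : fs.listStatus(new Path("s3n://my-bucket/"))) {
            System.out.println(status.getPath());
        }
    }
}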

> mapreduce.jobtracker.staging.root.dir=s3n://KEY:VALUE@bucket/tmp
>
> When I run the job from EC2, I get the following error:
>
> The ownership on the staging directory
> s3n://KEY:VALUE@bucket/tmp/ec2-user/.staging is not as expected. It is
> owned by . The directory must be owned by the submitter ec2-user or by
> ec2-user
> at
> org.apache.hadoop.mapreduce.JobSubmissionFiles.getStagingDir(JobSubmissionFiles.java:113)
> at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:850)
> at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:844)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)
> at
> org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:844)
> at org.apache.hadoop.mapreduce.Job.submit(Job.java:481)
>
> I am using the Cloudera CDH4 Hadoop distribution. The error is thrown from
> the JobSubmissionFiles.java class:
>  public static Path getStagingDir(JobClient client, Configuration conf)
>      throws IOException, InterruptedException {
>    Path stagingArea = client.getStagingAreaDir();
>    FileSystem fs = stagingArea.getFileSystem(conf);
>    String realUser;
>    String currentUser;
>    UserGroupInformation ugi = UserGroupInformation.getLoginUser();
>    realUser = ugi.getShortUserName();
>    currentUser = UserGroupInformation.getCurrentUser().getShortUserName();
>    if (fs.exists(stagingArea)) {
>      FileStatus fsStatus = fs.getFileStatus(stagingArea);
>      String owner = fsStatus.getOwner();
>      if (!(owner.equals(currentUser) || owner.equals(realUser))) {
>        throw new IOException("The ownership on the staging directory " +
>            stagingArea + " is not as expected. " +
>            "It is owned by " + owner + ". The directory must " +
>            "be owned by the submitter " + currentUser + " or " +
>            "by " + realUser);
>      }
>      if (!fsStatus.getPermission().equals(JOB_DIR_PERMISSION)) {
>        LOG.info("Permissions on staging directory " + stagingArea + " are " +
>            "incorrect: " + fsStatus.getPermission() + ". Fixing permissions " +
>            "to correct value " + JOB_DIR_PERMISSION);
>        fs.setPermission(stagingArea, JOB_DIR_PERMISSION);
>      }
>    } else {
>      fs.mkdirs(stagingArea,
>          new FsPermission(JOB_DIR_PERMISSION));
>    }
>    return stagingArea;
>  }
>
>
> I think my job calls getOwner(), which returns NULL since S3 does not
> have file permissions, and that results in the IOException I am
> getting.
Which user are you launching the job as in EC2?
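
If the diagnosis above is right, one detail is worth checking: a truly null
owner would fail with a NullPointerException at owner.equals(), not with this
IOException, so s3n is more likely returning an empty owner string (which
matches the blank owner in the error message). A quick probe along these
lines could confirm it; the bucket and path are hypothetical:

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class OwnerProbe {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create("s3n://my-bucket/"), conf);
        // Hypothetical staging path; substitute your own bucket.
        FileStatus status =
            fs.getFileStatus(new Path("s3n://my-bucket/tmp/ec2-user/.staging"));
        // The quotes make an empty owner visible in the output.
        System.out.println("owner = '" + status.getOwner() + "'");
    }
}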
>
> Any workaround for this? Any idea how I could use S3 as the filesystem
> with Hadoop in distributed mode?

Look here:
http://wiki.apache.org/hadoop/AmazonS3
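
One common workaround (a sketch, not something that page prescribes verbatim)
is to leave the default filesystem on HDFS, so framework-managed directories
such as the job staging area keep real ownership semantics, and to use
s3n:// URIs only for job input and output. In core-site.xml (the host and
port are placeholders):

<property>
     <name>fs.default.name</name>
     <value>hdfs://namenode-host:8020</value>
</property>

The job is then pointed at S3 explicitly, e.g. with input
s3n://my-bucket/input and output s3n://my-bucket/output (bucket name
hypothetical).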

10th ANNIVERSARY OF THE FOUNDING OF THE UNIVERSITY OF INFORMATICS SCIENCES...
CONNECTED TO THE FUTURE, CONNECTED TO THE REVOLUTION

http://www.uci.cu
http://www.facebook.com/universidad.uci
http://www.flickr.com/photos/universidad_uci