Parth Savani 2012-10-25, 19:52
Harsh J 2012-10-26, 03:09
Parth Savani 2012-10-23, 17:32
Re: File Permissions on s3 FileSystem
Marcos Ortiz 2012-10-23, 18:09
On 23/10/12 13:32, Parth Savani wrote:
> Hello Everyone,
> I am trying to run a hadoop job with s3n as my filesystem.
> I changed the following properties in my hdfs-site.xml
>
> fs.default.name=s3n://KEY:VALUE@bucket/

A good practice for this is to set these two properties in core-site.xml if you use S3 often:

<property>
  <name>fs.s3.awsAccessKeyId</name>
  <value>AWS_ACCESS_KEY_ID</value>
</property>
<property>
  <name>fs.s3.awsSecretAccessKey</name>
  <value>AWS_SECRET_ACCESS_KEY</value>
</property>

(For s3n:// URIs the corresponding property names are fs.s3n.awsAccessKeyId and fs.s3n.awsSecretAccessKey.)

After that, you can refer to your data with friendlier URIs:

S3:  s3://<s3-bucket>/<s3-filepath>
S3n: s3n://<s3-bucket>/<s3-filepath>

> mapreduce.jobtracker.staging.root.dir=s3n://KEY:VALUE@bucket/tmp
>
> When I run the job from ec2, I get the following error
>
> The ownership on the staging directory
> s3n://KEY:VALUE@bucket/tmp/ec2-user/.staging is not as expected. It is
> owned by The directory must be owned by the submitter ec2-user or by
> ec2-user
> at org.apache.hadoop.mapreduce.JobSubmissionFiles.getStagingDir(JobSubmissionFiles.java:113)
> at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:850)
> at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:844)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)
> at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:844)
> at org.apache.hadoop.mapreduce.Job.submit(Job.java:481)
>
> I am using cloudera CDH4 hadoop distribution.
> The error is thrown from the JobSubmissionFiles.java class
>
> public static Path getStagingDir(JobClient client, Configuration conf)
>     throws IOException, InterruptedException {
>   Path stagingArea = client.getStagingAreaDir();
>   FileSystem fs = stagingArea.getFileSystem(conf);
>   String realUser;
>   String currentUser;
>   UserGroupInformation ugi = UserGroupInformation.getLoginUser();
>   realUser = ugi.getShortUserName();
>   currentUser = UserGroupInformation.getCurrentUser().getShortUserName();
>   if (fs.exists(stagingArea)) {
>     FileStatus fsStatus = fs.getFileStatus(stagingArea);
>     String owner = fsStatus.getOwner();
>     if (!(owner.equals(currentUser) || owner.equals(realUser))) {
>       throw new IOException("The ownership on the staging directory " +
>           stagingArea + " is not as expected. " +
>           "It is owned by " + owner + ". The directory must " +
>           "be owned by the submitter " + currentUser + " or " +
>           "by " + realUser);
>     }
>     if (!fsStatus.getPermission().equals(JOB_DIR_PERMISSION)) {
>       LOG.info("Permissions on staging directory " + stagingArea + " are " +
>           "incorrect: " + fsStatus.getPermission() + ". Fixing permissions " +
>           "to correct value " + JOB_DIR_PERMISSION);
>       fs.setPermission(stagingArea, JOB_DIR_PERMISSION);
>     }
>   } else {
>     fs.mkdirs(stagingArea, new FsPermission(JOB_DIR_PERMISSION));
>   }
>   return stagingArea;
> }
>
> I think my job calls getOwner() which returns NULL since s3 does not
> have file permissions which results in the IO exception that I am
> getting.

As which user are you launching the job on EC2?

> Any workaround for this? Any idea how I could use s3 as the filesystem
> with hadoop in distributed mode?

Look here: http://wiki.apache.org/hadoop/AmazonS3
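
To make the failing check concrete, here is a minimal standalone sketch of the owner comparison from the quoted getStagingDir(). It is not Hadoop code: the class name StagingOwnerCheck and the methods ownerAccepted and checkOwner are hypothetical, and the credential-free staging path is only illustrative. It assumes the S3 file status reports an empty owner string, which is what the error text suggests (nothing is printed after "It is owned by"); a null owner would instead make owner.equals(...) throw a NullPointerException before the IOException could be built.

```java
import java.io.IOException;

// Standalone illustration only; class and method names are hypothetical.
final class StagingOwnerCheck {

    // Same comparison as in the quoted getStagingDir(): the owner must match
    // either the current user or the login (real) user.
    static boolean ownerAccepted(String owner, String currentUser, String realUser) {
        return owner.equals(currentUser) || owner.equals(realUser);
    }

    // Throws an IOException with the same message layout as the quoted code
    // when the owner matches neither user.
    static void checkOwner(String stagingArea, String owner,
                           String currentUser, String realUser) throws IOException {
        if (!ownerAccepted(owner, currentUser, realUser)) {
            throw new IOException("The ownership on the staging directory "
                + stagingArea + " is not as expected. "
                + "It is owned by " + owner + ". The directory must "
                + "be owned by the submitter " + currentUser + " or "
                + "by " + realUser);
        }
    }

    public static void main(String[] args) {
        // Illustrative path, modeled on the one in the error message.
        String dir = "s3n://bucket/tmp/ec2-user/.staging";
        try {
            // Assumption: the file status for the S3 path carries an empty owner.
            checkOwner(dir, "", "ec2-user", "ec2-user");
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Under that assumption, the check can only pass if the reported owner equals the submitting user's short name, so any filesystem that reports no owner will fail it regardless of which user submits the job.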