MapReduce, mail # user - File Permissions on s3 FileSystem


Re: File Permissions on s3 FileSystem
Harsh J 2012-10-26, 03:09
Parth,

I think your problems are easier to solve if you run a 1-node HDFS as
the staging area for MR (i.e. the JobTracker's FS is HDFS), and just do
the I/O of actual data over S3 (i.e. input and output paths for jobs
are s3 or s3n prefixed).
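
For illustration, a minimal sketch of that split in config terms (the
host, port and key values are placeholders, and exact property names
can vary a bit across versions):

  fs.default.name=hdfs://localhost:8020
  mapreduce.jobtracker.staging.root.dir=hdfs://localhost:8020/tmp
  fs.s3n.awsAccessKeyId=KEY
  fs.s3n.awsSecretAccessKey=SECRET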

The JIRA you mention does have interesting workarounds, but the mere
fact that S3 doesn't support a permissions model may break other places
in MR where we do permission logic for security reasons. You could get
away with one of the source hacks mentioned there, but that isn't
guaranteed to solve all your problems, because we don't test MR running
atop S3, though I think we do test S3 as a general FS for I/O.
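
To make the failure mode concrete, here is a sketch based on the
getStagingDir() code you pasted below (an illustration, not a tested
claim): s3n appears to report an empty owner for the staging
directory, so the ownership check can never pass:

  FileStatus fsStatus = fs.getFileStatus(stagingArea); // owner comes back empty on s3n
  String owner = fsStatus.getOwner();                  // "" rather than "ec2-user"
  owner.equals(currentUser) || owner.equals(realUser)  // always false -> IOException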

On Fri, Oct 26, 2012 at 1:22 AM, Parth Savani <[EMAIL PROTECTED]> wrote:
> Hello Harsh,
>          I am following steps based on this link:
> http://wiki.apache.org/hadoop/AmazonS3
>
> When I run the job, I see that Hadoop places all the jars required for
> the job on S3. However, when it tries to run the job, it complains:
>
> The ownership on the staging directory
> s3n://KEY:VALUE@bucket/tmp/ec2-user/.staging is not as expected. It is
> owned by   The directory must be owned by the submitter ec2-user or by
> ec2-user
>
> Some people seem to have solved this permissions problem here ->
> https://issues.apache.org/jira/browse/HDFS-1333
> But they made changes to some Hadoop java classes, and I wonder if
> there's an easier workaround.
>
>
> On Wed, Oct 24, 2012 at 12:21 AM, Harsh J <[EMAIL PROTECTED]> wrote:
>>
>> Hey Parth,
>>
>> I don't think it's possible to run MR with the FS based entirely on
>> S3. You can use S3 for I/O for your files, but your fs.default.name
>> (or fs.defaultFS) must be a file:/// or hdfs:// filesystem. This way,
>> your MR framework can run/distribute its files properly, and still be
>> able to process S3 URLs passed as input or output locations.
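>>
>> For example (the jar, class, bucket and key names here are
>> placeholders), a job submitted like this stages via the default FS
>> but reads and writes S3 directly:
>>
>>   hadoop jar my-job.jar MyJob \
>>     s3n://KEY:SECRET@bucket/input \
>>     s3n://KEY:SECRET@bucket/output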
>>
>> On Tue, Oct 23, 2012 at 11:02 PM, Parth Savani <[EMAIL PROTECTED]>
>> wrote:
>> > Hello Everyone,
>> >         I am trying to run a Hadoop job with s3n as my filesystem.
>> > I changed the following properties in my hdfs-site.xml
>> >
>> > fs.default.name=s3n://KEY:VALUE@bucket/
>> > mapreduce.jobtracker.staging.root.dir=s3n://KEY:VALUE@bucket/tmp
>> >
>> > When I run the job from EC2, I get the following error:
>> >
>> > The ownership on the staging directory
>> > s3n://KEY:VALUE@bucket/tmp/ec2-user/.staging is not as expected. It is
>> > owned by   The directory must be owned by the submitter ec2-user or by
>> > ec2-user
>> >   at org.apache.hadoop.mapreduce.JobSubmissionFiles.getStagingDir(JobSubmissionFiles.java:113)
>> >   at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:850)
>> >   at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:844)
>> >   at java.security.AccessController.doPrivileged(Native Method)
>> >   at javax.security.auth.Subject.doAs(Subject.java:415)
>> >   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)
>> >   at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:844)
>> >   at org.apache.hadoop.mapreduce.Job.submit(Job.java:481)
>> >
>> > I am using the Cloudera CDH4 Hadoop distribution. The error is thrown
>> > from the JobSubmissionFiles class:
>> >  public static Path getStagingDir(JobClient client, Configuration conf)
>> >   throws IOException, InterruptedException {
>> >     Path stagingArea = client.getStagingAreaDir();
>> >     FileSystem fs = stagingArea.getFileSystem(conf);
>> >     String realUser;
>> >     String currentUser;
>> >     UserGroupInformation ugi = UserGroupInformation.getLoginUser();
>> >     realUser = ugi.getShortUserName();
>> >     currentUser = UserGroupInformation.getCurrentUser().getShortUserName();
>> >     if (fs.exists(stagingArea)) {
>> >       FileStatus fsStatus = fs.getFileStatus(stagingArea);
>> >       String owner = fsStatus.getOwner();
>> >       if (!(owner.equals(currentUser) || owner.equals(realUser))) {
>> >          throw new IOException("The ownership on the staging directory "
>> >              + stagingArea + " is not as expected. It is owned by " + owner
>> >              + ". The directory must be owned by the submitter "
>> >              + currentUser + " or by " + realUser);
>> >       }
>> >       ...
>> >     }

Harsh J