Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Plain View
Hive, mail # dev - Re: Review Request 12050: HIVE-3756 (LOAD DATA does not honor permission inheritance)


+
Sushanth Sowmyan 2013-07-01, 18:26
+
Sushanth Sowmyan 2013-07-01, 22:13
+
Chaoyu Tang 2013-07-02, 16:37
+
Chaoyu Tang 2013-07-02, 16:39
+
Ashutosh Chauhan 2013-07-03, 01:46
Copy link to this message
-
Re: Review Request 12050: HIVE-3756 (LOAD DATA does not honor permission inheritance)
Chaoyu Tang 2013-07-17, 03:26


> On July 3, 2013, 1:46 a.m., Ashutosh Chauhan wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java, line 2128
> > <https://reviews.apache.org/r/12050/diff/2/?file=314926#file314926line2128>
> >
> >     I had quite a discussion on this with Rohini on HIVE-2936 on how to do these fs ops in a performant and robust way. Feel free to follow that.

This piece of code is rewritten to:
{code}
      try {
        //if destf is an existing directory, rename just move the src file/dir under destf and
        //destf itself will be the parent dir after the rename
        if (fs.getFileStatus(destf).isDir()) {
          destfp = destf;
          LOG.debug("Renaming:" + destf.toString()  + " is an existing directory; " +
          srcf.toString() + " will move under it.");
        }
      } catch (FileNotFoundException ignore) {
      }
{code}
So it will be optimized to save 1 call if the destf exists
> On July 3, 2013, 1:46 a.m., Ashutosh Chauhan wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java, line 2151
> > <https://reviews.apache.org/r/12050/diff/2/?file=314926#file314926line2151>
> >
> >     Why not use api instead of FsShell? Does filesystem doesn't offer an api for recursively doing chmod and chgrp?

FileSystem offers the APIs for changing permission and owner (group), but not recursively. An alternate implementation I thought of initially is like following:
{code}
      try {
        FileStatus destParentfstatus = fs.getFileStatus(destfp);
        String destfpGroup = destParentfstatus.getGroup();
        FsPermission destfpPermission = destParentfstatus.getPermission();
        FileStatus[] fstatusList = fs.listStatus(destf);
        for (FileStatus fstatus : fstatusList) {
          fs.setOwner(fstatus.getPath(), null, destfpGroup);
          fs.setPermission(fstatus.getPath(), destfpPermission);
        }
      } catch (FileNotFoundException fnfe) {
        //it should not be thrown
      } catch (IOException e) {
        throw new HiveException("Unable to set permissions of " + destf
            + " to those of its parent " + destf.getParent(), e);      
      }
{code}
I chose to use the FsShell since the code with commands (-chgrp, -chmod) looked to me more intuitive. As for which way is more performant, I am not sure but I was also not able to tell any gain in the alternate approach. Any suggestion?
> On July 3, 2013, 1:46 a.m., Ashutosh Chauhan wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java, line 2265
> > <https://reviews.apache.org/r/12050/diff/2/?file=314926#file314926line2265>
> >
> >     I think following way to write this is more robust:
> >     if(inheritPerms) {
> >     fs.mkdirs(destfp, destfp.getParent().getPerms);
> >     } else {
> >     fs.mkdirs(destfp);
> >     }

It was following the Rohini's comments on HIVE-2936:
{code}
It is not safe to use fs.mkdirs(path, permission) because the dfs umask is applied on that permission in DFSClient which is not desired. We have been bitten by wrong permission issues because of using that API. It is always safer to do mkdirs() and then do a setPermission() if you are dealing with HDFS.
{code}
So I did:
{code}
         boolean success = fs.mkdirs(destfp);
         if (inheritPerms && success) {
           fs.setPermission(destfp, fs.getFileStatus(destfp.getParent()).getPermission());
         }
{code}

Not sure, since these APIs documentation is quite limited.
- Chaoyu
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/12050/#review22698
-----------------------------------------------------------
On July 2, 2013, 4:39 p.m., Chaoyu Tang wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/12050/
> -----------------------------------------------------------
>
> (Updated July 2, 2013, 4:39 p.m.)
>
>
> Review request for hive.
+
Chaoyu Tang 2013-07-19, 18:54
+
Chaoyu Tang 2013-07-19, 18:55
+
Ashutosh Chauhan 2013-07-25, 18:22