Re: AccessControlException in estimateNumberOfReducers
Thejas Nair 2011-11-23, 01:33
You should be able to work around this issue by explicitly setting the
number of reducers (use the PARALLEL keyword in the statements, or set
default_parallel).

This is an unusual use case, but I don't see any harm in doing what you
suggest. Please feel free to open a JIRA and submit a patch.

Thanks,
Thejas

On 11/21/11 7:45 PM, Adam Portley wrote:
> I'm running into an issue with pig 0.9.1. My top-level data directory
> contains several files and directories with restricted permissions, and
> my LoadFunc and input format ignore these directories if the user does
> not have permission to read them. Unfortunately pig's execution engine
> still throws an exception.
>
> Example:
>
> $ hadoop fs -ls /data
> Found 2 items
> drwxr-xr-x - owner users 0 2011-11-16 06:47 /data/readable
> drwxr-x--- - owner secure 0 2011-11-16 06:48 /data/secure
>
> The /data/secure directory is readable only by users in the 'secure'
> group. Non-secure users encounter the following pig exception even
> though the loader and input format do not touch secure data:
>
> REGISTER my-jar;
> data = LOAD /data USING myLoader();
> (do something..)
>
> Caused by: org.apache.hadoop.security.AccessControlException:
> org.apache.hadoop.security.AccessControlException: Permission denied:
> user=<removed>, access=READ_EXECUTE, inode="secure":owner:secure:rwxr-x---
>     at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>     at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
>     at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
>     at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
>     at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:95)
>     at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:57)
>     at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:669)
>     at org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:280)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getPathLength(JobControlCompiler.java:791)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getPathLength(JobControlCompiler.java:794)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getTotalInputFileSize(JobControlCompiler.java:779)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.estimateNumberOfReducers(JobControlCompiler.java:739)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getJob(JobControlCompiler.java:587)
>     ... 12 more
>
> I think Pig should probably catch this exception and ignore unreadable
> directories when estimating the number of reducers.
>
> Thanks,
> --Adam
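
For reference, the workaround described in the reply (setting the reducer count explicitly so that Pig does not need to estimate it from the input size) might look roughly like the script below. This is only a sketch, not a tested fix for this case: the quoted '/data' path, the grouping on the first field ($0), and the reducer count of 10 are illustrative assumptions, and my-jar and myLoader are the placeholders from the original message.

```pig
-- Sketch only: my-jar and myLoader are placeholders from the original
-- message; the reducer count (10) and the grouping field ($0) are
-- assumptions chosen for illustration.
REGISTER my-jar;

-- Option 1: script-wide default number of reducers.
SET default_parallel 10;

data = LOAD '/data' USING myLoader();

-- Option 2: explicit reducer count on a single reduce-side operator.
grouped = GROUP data BY $0 PARALLEL 10;
```

Either option on its own should be enough; a PARALLEL clause on an operator takes precedence over default_parallel for that operator.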