Re: AccessControlException in estimateNumberOfReducers
You should be able to work around this issue by explicitly setting the
number of reducers (the PARALLEL keyword on individual statements, or
set default_parallel).
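
For example (a sketch only; the reducer count and the grouping key below
are placeholders, not from this thread):

-- script-wide default for every reduce phase
set default_parallel 10;

-- or per statement, using the PARALLEL keyword on a reduce-side operator
data = LOAD '/data' USING myLoader();
grouped = GROUP data BY mykey PARALLEL 10;

Either of these skips the input-size-based reducer estimate that is
failing here.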
This is an unusual use case, but I don't see any harm in doing what you
suggest. Please feel free to open a JIRA and submit a patch.

Thanks,
Thejas
On 11/21/11 7:45 PM, Adam Portley wrote:
> I'm running into an issue with Pig 0.9.1. My top-level data directory
> contains several files and directories with restricted permissions, and
> my LoadFunc and input format ignore these directories if the user does
> not have permission to read them. Unfortunately, Pig's execution engine
> still throws an exception.
>
> Example:
>
> $ hadoop fs -ls /data
> Found 2 items
> drwxr-xr-x - owner users 0 2011-11-16 06:47 /data/readable
> drwxr-x--- - owner secure 0 2011-11-16 06:48 /data/secure
>
> The /data/secure directory is readable only by users in the 'secure'
> group. Non-secure users encounter the following pig exception even
> though the loader and input format do not touch secure data:
>
> REGISTER my-jar;
> data = LOAD '/data' USING myLoader();
> (do something..)
>
> Caused by: org.apache.hadoop.security.AccessControlException:
> org.apache.hadoop.security.AccessControlException: Permission denied:
> user=<removed>, access=READ_EXECUTE, inode="secure":owner:secure:rwxr-x---
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
> at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
> at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:95)
> at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:57)
> at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:669)
> at org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:280)
> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getPathLength(JobControlCompiler.java:791)
> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getPathLength(JobControlCompiler.java:794)
> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getTotalInputFileSize(JobControlCompiler.java:779)
> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.estimateNumberOfReducers(JobControlCompiler.java:739)
> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getJob(JobControlCompiler.java:587)
> ... 12 more
>
>
> I think Pig should probably catch this exception and ignore unreadable
> directories when estimating the number of reducers.
>
> Thanks,
> --Adam
>
>