Re: Pig and DistributedCache
Eugene Morozov 2013-02-19, 12:26
Sorry for the confusion about users in my previous e-mails. Here is a more
thorough explanation of my issue.
This is what I got when I tried to run it.
The file was successfully copied by using "tmpfiles".
2013-02-08 13:38:56,533 INFO
has been found
2013-02-08 13:38:56,539 ERROR
org.apache.hadoop.hbase.filter.PrefixFuzzyRowFilterWithFile: Cannot read
org.apache.hadoop.security.AccessControlException: Permission denied:
org.apache.hadoop.hbase.filter.PrefixFuzzyRowFilterWithFile is my own
filter; it just lives in an org.apache... package.
1. I have a user named vagrant, and this user runs the Pig script.
2. After that, the client side builds the filter, serializes it, and moves it to
3. The RegionServer comes into play here: it deserializes the filter and tries
to use it while reading the table.
4. The filter in turn tries to read the file, but since the RegionServer was
started under a system user called "hbase", the filter runs with that user's
credentials and cannot access the file, which was
written by another user.
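To make the steps above concrete, here is a minimal sketch of a POSIX-style read check, which is roughly the model HDFS permissions follow. It is not Hadoop code: the names FileStatus and can_read are hypothetical, the group names are assumed, mode 700 is taken from the earlier message in this thread, and superuser and ACL handling are left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileStatus:
    """Hypothetical stand-in for a file's ownership and permission bits."""
    owner: str
    group: str
    mode: int  # POSIX-style permission bits, e.g. 0o700


def can_read(status, user, groups):
    """Simplified POSIX-style read check (no superuser, no ACLs)."""
    if user == status.owner:
        return bool(status.mode & 0o400)  # owner read bit
    if status.group in groups:
        return bool(status.mode & 0o040)  # group read bit
    return bool(status.mode & 0o004)      # "other" read bit


# Assumption: the cached file is owned by vagrant with mode 700.
cached = FileStatus(owner="vagrant", group="vagrant", mode=0o700)

print(can_read(cached, "vagrant", {"vagrant"}))  # user running the Pig script
print(can_read(cached, "hbase", {"hbase"}))      # user running the RegionServer
```

Under these assumptions the check passes for vagrant and fails for hbase, which matches the AccessControlException above. The same check would pass for hbase if the "other" read bit were set, for example with mode 744.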
Any ideas of what to try?
On Sun, Feb 17, 2013 at 8:22 AM, Rohini Palaniswamy <[EMAIL PROTECTED]> wrote:
> Hi Eugene,
> Sorry. Missed your reply earlier.
> tmpfiles has been around for a while and will not be removed in hadoop
> anytime soon. So don't worry about it. The hadoop configurations have never
> been fully documented, so people look at the code and use them. Settings are
> usually deprecated for years before they are removed.
> The file will be created with the permissions based on the dfs.umaskmode
> setting (or fs.permissions.umask-mode in Hadoop 0.23/2.x) and the owner of
> the file will be the user who runs the pig script. The map job will be
> launched as the same user by the pig script. I don't understand what you
> mean by the user that runs the map task not having permissions. What kind of
> hadoop authentication are you doing such that the file is created as one user
> and the map job is launched as another user?
> On Sun, Feb 10, 2013 at 10:26 PM, Eugene Morozov
> <[EMAIL PROTECTED]> wrote:
> > Hi, again.
> > I've been able to successfully use the trick with DistributedCache and
> > "tmpfiles": during the run of my Pig script, the files are copied by
> > to job-cache.
> > But here is the issue. The files are there, but they have permission 700,
> > and the user that runs the map task (I suppose it's hbase) doesn't have
> > permission to read them. The files belong to my current OS user.
> > First, it looks like a bug, doesn't it?
> > Second, what can I do about it?
> > On Thu, Feb 7, 2013 at 11:42 AM, Eugene Morozov
> > <[EMAIL PROTECTED]> wrote:
> > > Rohini,
> > >
> > > thank you for the reply.
> > >
> > > Isn't it kind of a hack to use "tmpfiles"? It's neither an API nor a
> > > well-known practice; it's an internal detail. How safe is it to use such
> > > a trick? I mean, in a month or so we will probably update our CDH4 to
> > > whatever is there.
> > > Will it still work? Will it be safe for the cluster or for my job? Who
> > > knows what will be implemented there?
> > >
> > > You see, I can understand the code and find such a solution, but I won't
> > > be able to keep all of them in mind to check when we update the cluster.
> > >
> > >
> > > On Thu, Feb 7, 2013 at 1:23 AM, Rohini Palaniswamy <
> > > [EMAIL PROTECTED]> wrote:
> > >
> > >> You should be fine using tmpfiles and that's the way to do it.
> > >>
> > >> Else you will have to copy the file to hdfs, and call the
> > >> DistributedCache.addFileToClassPath yourself (basically what tmpfiles
Developer Grid Dynamics