

Re: Pig and DistributedCache
Eugene,
  As I said earlier, you can use a different dfs.umaskmode. Running Pig
with -Ddfs.umaskmode=022 will give read access to everyone (755 instead of
700). But every file the Pig script outputs will then have those permissions.
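
The 755-versus-700 arithmetic can be sketched as follows. This is a minimal illustration, assuming the usual rule that the umask bits are cleared from the requested mode; the class and method names are made up for the example and are not part of Hadoop or Pig.

```java
// Simplified model of how a umask turns a requested mode into an effective one.
// UmaskDemo, applyUmask and toOctal are hypothetical names used only here.
final class UmaskDemo {

    // Clear every bit that is set in the umask, keep only the rwx bits.
    static int applyUmask(int requestedMode, int umask) {
        return requestedMode & ~umask & 0777;
    }

    // Render a mode as three octal digits, e.g. 0755 -> "755".
    static String toOctal(int mode) {
        return String.format("%03o", mode);
    }

    public static void main(String[] args) {
        System.out.println(toOctal(applyUmask(0777, 0077))); // prints 700
        System.out.println(toOctal(applyUmask(0777, 0022))); // prints 755
        System.out.println(toOctal(applyUmask(0666, 0022))); // prints 644
    }
}
```

The mask applies to everything the job writes, which is why changing it affects all output files and not only the one the filter needs.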

A better option would be to write the serialized file with more accessible
permissions in this step of yours:
2. The client side then builds the filter, serializes it, and moves it to
the server side.
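
To see why the hbase user is rejected in the trace quoted below, here is a simplified model of an owner/group/other permission check. It is only a sketch and not Hadoop's actual implementation: the class name, the method name, and the assumption that hbase belongs only to a group called "hbase" are hypothetical, and the superuser bypass is ignored.

```java
import java.util.Collections;
import java.util.Set;

// Simplified owner/group/other check; all names here are hypothetical.
final class PermissionCheckDemo {
    static final int READ = 4, WRITE = 2, EXECUTE = 1;

    // Pick the owner, group, or other bits for this user, then test the action.
    static boolean isAllowed(String user, Set<String> userGroups,
                             String owner, String group, int mode, int action) {
        int bits;
        if (user.equals(owner)) {
            bits = (mode >> 6) & 7;      // owner bits
        } else if (userGroups.contains(group)) {
            bits = (mode >> 3) & 7;      // group bits
        } else {
            bits = mode & 7;             // other bits
        }
        return (bits & action) == action;
    }

    public static void main(String[] args) {
        // Assumption for illustration: hbase is only in the "hbase" group.
        Set<String> hbaseGroups = Collections.singleton("hbase");

        // .staging is vagrant:supergroup with drwx------ (0700).
        System.out.println(isAllowed("hbase", hbaseGroups,
                "vagrant", "supergroup", 0700, EXECUTE)); // prints false
        // With drwxr-xr-x (0755) the same traversal would pass.
        System.out.println(isAllowed("hbase", hbaseGroups,
                "vagrant", "supergroup", 0755, EXECUTE)); // prints true
    }
}
```

Note that the denial in the trace is for EXECUTE on the .staging directory, so a reader running as another user needs to be able to traverse the parent directories as well as read the file itself.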

Regards,
Rohini
On Tue, Feb 19, 2013 at 4:26 AM, Eugene Morozov
<[EMAIL PROTECTED]> wrote:

> Rohini,
>
> Sorry for the confusion about these users in my previous e-mails. Here is
> a more thorough explanation of my issue.
>
> This is what I got when I tried to run it.
>
> File has been successfully copied by using "tmpfiles".
> 2013-02-08 13:38:56,533 INFO
> org.apache.hadoop.hbase.filter.PrefixFuzzyRowFilterWithFile: File
>
> [/var/lib/hadoop-hdfs/cache/mapred/mapred/staging/vagrant/.staging/job_201302081322_0001/files/pairs-tmp#pairs-tmp]
> has been found
> 2013-02-08 13:38:56,539 ERROR
> org.apache.hadoop.hbase.filter.PrefixFuzzyRowFilterWithFile: Cannot read
> file:
>
> [/var/lib/hadoop-hdfs/cache/mapred/mapred/staging/vagrant/.staging/job_201302081322_0001/files/pairs-tmp#pairs-tmp]
> org.apache.hadoop.security.AccessControlException: Permission denied:
> user=hbase, access=EXECUTE,
>
> inode="/var/lib/hadoop-hdfs/cache/mapred/mapred/staging/vagrant/.staging":vagrant:supergroup:drwx------
>
>
> org.apache.hadoop.hbase.filter.PrefixFuzzyRowFilterWithFile - it's my own
> filter, it just lives in an org.apache... package.
>
> 1. The user vagrant runs the Pig script.
> 2. The client side then builds the filter, serializes it, and moves it to
> the server side.
> 3. The RegionServer comes into play here: it deserializes the filter and
> tries to use it while reading the table.
> 4. The filter in turn tries to read the file, but since the RegionServer
> was started under the system user "hbase", the filter runs with that
> user's identity and cannot access the file, which was written by another
> user.
>
> Any ideas of what to try?
>
> On Sun, Feb 17, 2013 at 8:22 AM, Rohini Palaniswamy <[EMAIL PROTECTED]> wrote:
>
> > Hi Eugene,
> >       Sorry. Missed your reply earlier.
> >
> >     tmpfiles has been around for a while and will not be removed from
> > Hadoop anytime soon, so don't worry about it. The Hadoop configurations
> > have never been fully documented, and people look at the code and use
> > them. Settings are usually deprecated for years before they are removed.
> >
> >   The file will be created with permissions based on the dfs.umaskmode
> > setting (or fs.permissions.umask-mode in Hadoop 0.23/2.x), and the owner
> > of the file will be the user who runs the Pig script. The map job will be
> > launched as the same user by the Pig script. I don't understand what you
> > mean by the user that runs the map task not having permissions. What kind
> > of Hadoop authentication are you using such that the file is created as
> > one user and the map job is launched as another?
> >
> > Regards,
> > Rohini
> >
> >
> > On Sun, Feb 10, 2013 at 10:26 PM, Eugene Morozov
> > <[EMAIL PROTECTED]> wrote:
> >
> > > Hi, again.
> > >
> > > I've been able to successfully use the trick with DistributedCache and
> > > "tmpfiles" - during the run of my Pig script the files are copied by
> > > JobClient to job-cache.
> > >
> > > But here is the issue. The files are there, but they have permission
> > > 700, and the user that runs the map task (I suppose it's hbase) doesn't
> > > have permission to read them. The files belong to my current OS user.
> > >
> > > First, this looks like a bug, doesn't it?
> > > Second, what can I do about it?
> > >
> > >
> > > On Thu, Feb 7, 2013 at 11:42 AM, Eugene Morozov
> > > <[EMAIL PROTECTED]> wrote:
> > >
> > > > Rohini,
> > > >
> > > > thank you for the reply.
> > > >
> > > > Isn't it kinda hack to use "tmpfiles"? It's neither API nor good
> known