Hadoop >> mail # user >> reading distributed cache returns null pointer


Re: reading distributed cache returns null pointer
The DistributedCache behavior is not symmetrical in local mode vs
distributed mode.

As I replied earlier, you need to use

DistributedCache.getCacheFiles() in distributed mode.

In your code, you can put in a check:

if getLocalCacheFiles() returns null, then use getCacheFiles() instead. Or
use the right API depending upon the mode you are executing in.
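
A minimal sketch of that check, in plain Java so it runs without a cluster. CacheFileResolver and resolve() are hypothetical names, not part of Hadoop; the sketch assumes you pass in whatever DistributedCache.getLocalCacheFiles(conf) and DistributedCache.getCacheFiles(conf) returned, with the local paths converted to strings. The example URI is made up for illustration.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not a Hadoop class): decides which cache paths to use.
final class CacheFileResolver {

    // localPaths: string form of the Path[] from DistributedCache.getLocalCacheFiles(conf); may be null.
    // cacheUris:  the URI[] from DistributedCache.getCacheFiles(conf); may be null.
    static List<String> resolve(String[] localPaths, URI[] cacheUris) {
        List<String> result = new ArrayList<>();
        // Prefer the localized copies when the framework reports them.
        if (localPaths != null && localPaths.length > 0) {
            for (String p : localPaths) {
                result.add(p);
            }
            return result;
        }
        // Otherwise fall back to the path component of the registered cache URIs.
        if (cacheUris != null) {
            for (URI u : cacheUris) {
                result.add(u.getPath());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up URI, only to exercise the fallback branch.
        URI[] uris = { URI.create("hdfs://namenode/user/abc/DCache/Orders.txt") };
        System.out.println(resolve(null, uris));
        System.out.println(resolve(new String[] { "/tmp/cache/Orders.txt" }, uris));
    }
}
```

An empty result means neither call returned anything, which usually points at the file never having been added to the cache in the job setup.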

-Rahul

On Sat, Jul 10, 2010 at 3:18 PM, abc xyz <[EMAIL PROTECTED]> wrote:

> Hi,
>
> Thanks. Ok
>
> Path[] ps=DistributedCache.getLocalCacheFiles(cnf);
>
> retrieves the correct path for me in pseudo-distributed mode. But when I run my
> program in fully-distributed mode with 5 nodes, I get a null pointer.
> Theoretically, if it worked in pseudo-distributed mode, it should work in
> fully-distributed mode as well. What could be causing this behavior?
>
> Cheers
>
>
>
>
> ________________________________
> From: Hemanth Yamijala <[EMAIL PROTECTED]>
> To: [EMAIL PROTECTED]
> Sent: Fri, July 9, 2010 10:21:19 AM
> Subject: Re: reading distributed cache returns null pointer
>
> Hi,
>
> > Thanks for the information. I got your point. What I specifically want to ask
> > is that if I use the following method to read my file now in each mapper:
> >
> >     FileSystem hdfs = FileSystem.get(conf);
> >     URI[] uris = DistributedCache.getCacheFiles(conf);
> >     Path my_path = new Path(uris[0].getPath());
> >
> >     if (hdfs.exists(my_path))
> >     {
> >         FSDataInputStream fs = hdfs.open(my_path);
> >         while ((str = fs.readLine()) != null)
> >             System.out.println(str);
> >     }
> > would this method retrieve the file from HDFS, since I am using the Hadoop API,
> > not the local file API?
> >
>
> It would be instructive to look at the test code in
> src/test/mapred/org/apache/hadoop/mapred/TestMRWithDistributedCache.java.
> This gives a fair idea of how to access the files of DistributedCache
> from within the mapper. Specifically see how the LocalFileSystem is
> used to access the files. You could look at the same class in the
> branch-20 source code if you are using an older version of Hadoop.
>
> >
> > I may be understanding something horribly wrong. The situation is that now
> > my_path contains DCache/Orders.txt, and if I am reading from here, this is the
> > path of the file on HDFS as well. How does it know to pick the file from the
> > local file system, not HDFS?
> >
> > Thanks again
> >
> >
> >
> >
> > ________________________________
> > From: Rahul Jain <[EMAIL PROTECTED]>
> > To: [EMAIL PROTECTED]
> > Sent: Fri, July 9, 2010 12:19:44 AM
> > Subject: Re: reading distributed cache returns null pointer
> >
> > Yes, distributed cache writes files to the local file system for each mapper
> > / reducer. So you should be able to access the file(s) using local file
> > system APIs.
> >
> > If the files were staying in HDFS there would be no point to using
> > distributed cache since all mappers already have access to the global HDFS
> > directories :).
> >
> > -Rahul
> >
> > On Thu, Jul 8, 2010 at 3:03 PM, abc xyz <[EMAIL PROTECTED]> wrote:
> >
> >> Hi Rahul,
> >> Thanks. It worked. I was using getFileClassPaths() to get the paths to the
> >> files in the cache and then using this path to access the file. It should have
> >> worked, but I don't know why that doesn't produce the required result.
> >>
> >> I added the HDFS file DCache/Orders.txt to my distributed cache. After
> >> calling DistributedCache.getCacheFiles(conf); in the configure method of
> >> the mapper node, if I read the file now from the returned path (which happens
> >> to be DCache/Orders.txt) using the Hadoop API, would the file be read from the
> >> local directory of the mapper node? More specifically I am doing this:
> >>
> >>
> >>     FileSystem hdfs = FileSystem.get(conf);
> >>     URI[] uris = DistributedCache.getCacheFiles(conf);