Hadoop >> mail # user >> reading distributed cache returns null pointer


Re: reading distributed cache returns null pointer
The DistributedCache behavior is not symmetrical in local mode vs
distributed mode.

As I replied earlier, you need to use

DistributedCache.getCacheFiles() in distributed mode.

In your code, you can put a check:

if getLocalCacheFiles() returns null, then use getCacheFiles() instead. Or
use the right API depending upon the mode you are executing in.
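That defensive check might be sketched as follows (a sketch only, not tested code from the thread; `conf` is assumed to be the job's `Configuration`, and the class name is hypothetical):

```java
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;

public class CachePathResolver {
    // Resolve cache file paths in either mode: prefer the local paths from
    // getLocalCacheFiles() (local / pseudo-distributed mode), and fall back
    // to the URIs from getCacheFiles() when the local lookup returns null.
    public static Path[] resolveCachePaths(Configuration conf) throws Exception {
        Path[] localPaths = DistributedCache.getLocalCacheFiles(conf);
        if (localPaths != null) {
            return localPaths;
        }
        URI[] uris = DistributedCache.getCacheFiles(conf);
        Path[] paths = new Path[uris.length];
        for (int i = 0; i < uris.length; i++) {
            paths[i] = new Path(uris[i].getPath());
        }
        return paths;
    }
}
```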

-Rahul

On Sat, Jul 10, 2010 at 3:18 PM, abc xyz <[EMAIL PROTECTED]> wrote:

> Hi,
>
> Thanks. Ok
>
> Path[] ps=DistributedCache.getLocalCacheFiles(cnf);
>
>  retrieves for me the correct path in pseudo-distributed mode. But when I
> run my
> program in fully-distributed mode with 5 nodes, I get a null pointer.
> Theoretically, if it worked in pseudo-distributed mode, it should work in
> fully-distributed mode as well. What possibilities can there be for this
> behavior?
>
> Cheers
>
>
>
>
> ________________________________
> From: Hemanth Yamijala <[EMAIL PROTECTED]>
> To: [EMAIL PROTECTED]
> Sent: Fri, July 9, 2010 10:21:19 AM
> Subject: Re: reading distributed cache returns null pointer
>
> Hi,
>
> > Thanks for the information. I got your point. What I specifically want to
> ask
> >is
> > that if I use the following method to read my file now in each mapper:
> >
> >            FileSystem        hdfs=FileSystem.get(conf);
> >              URI[] uris=DistributedCache.getCacheFiles(conf);
> >              Path my_path=new Path(uris[0].getPath());
> >
> >             if(hdfs.exists(my_path))
> >            {
> >                 FSDataInputStream    fs=hdfs.open(my_path);
> >                 while((str=fs.readLine())!=null)
> >                       System.out.println(str);
> >            }
> > would this method retrieve the file from HDFS? since I am using the
> Hadoop
> API?
> > not the local file API.
> >
>
> It would be instructive to look at the test code in
> src/test/mapred/org/apache/hadoop/mapred/TestMRWithDistributedCache.java.
> This gives a fair idea of how to access the files of DistributedCache
> from within the mapper. Specifically see how the LocalFileSystem is
> used to access the files. You could look at the same class in the
> branch-20 source code if you are using an older version of Hadoop.
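For reference, the pattern Hemanth points to (reading a cached file from the task node's local disk rather than through HDFS) might be sketched like this, assuming the old `mapred` API and that the file was added to the cache in the driver; the class name is hypothetical:

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;

public class CacheReadingMapper extends MapReduceBase {
    @Override
    public void configure(JobConf conf) {
        try {
            // getLocalCacheFiles() returns paths on the task node's local
            // disk, so plain java.io works here -- no HDFS access needed.
            Path[] localFiles = DistributedCache.getLocalCacheFiles(conf);
            BufferedReader reader =
                new BufferedReader(new FileReader(localFiles[0].toString()));
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(line);
            }
            reader.close();
        } catch (IOException e) {
            throw new RuntimeException("Failed to read distributed cache file", e);
        }
    }
}
```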
>
> >
> > I may be understanding something horribly wrong. The situation is that
> now
> > my_path contains DCache/Orders.txt and if I am reading from here, this is
> the
> > path of file on HDFS as well. How does it know to pick the file from the
> local
> > file system, not the HDFS?
> >
> > Thanks again
> >
> >
> >
> >
> > ________________________________
> > From: Rahul Jain <[EMAIL PROTECTED]>
> > To: [EMAIL PROTECTED]
> > Sent: Fri, July 9, 2010 12:19:44 AM
> > Subject: Re: reading distributed cache returns null pointer
> >
> > Yes, distributed cache writes files to the local file system for each
> mapper
> > / reducer. So you should be able to access the file(s) using local file
> > system APIs.
> >
> > If the files were staying in HDFS there would be no point to using
> > distributed cache since all mappers already have access to the global
> HDFS
> > directories :).
> >
> > -Rahul
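The staging Rahul describes starts on the driver side. A minimal sketch of how the file might have been added to the cache (using the path `DCache/Orders.txt` mentioned in the thread and the old `mapred` API; details of the actual job setup are not shown in the thread):

```java
import java.net.URI;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.mapred.JobConf;

public class CacheSetup {
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(CacheSetup.class);
        // Register the HDFS file with the distributed cache. At task startup
        // the framework copies it from HDFS onto each task node's local disk;
        // mappers and reducers then read that local copy.
        DistributedCache.addCacheFile(new URI("DCache/Orders.txt"), conf);
        // ... remaining job configuration and submission would follow here.
    }
}
```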
> >
> > On Thu, Jul 8, 2010 at 3:03 PM, abc xyz <[EMAIL PROTECTED]> wrote:
> >
> >> Hi Rahul,
> >> Thanks. It worked. I was using getFileClassPaths() to get the paths to
> the
> >> files
> >> in the cache and then use this path to access the file. It should have
> >> worked
> >> but I don't know why that doesn't produce the required result.
> >>
> >> I added the HDFS file DCache/Orders.txt to my distributed cache.
> After
> >> calling DistributedCache.getCacheFiles(conf); in the configure method of
> >> the
> >> mapper node, if I read the file now from the returned path (which
> happens
> >> to be
> >> DCache/Orders.txt) using the Hadoop API , would the file be read from
> the
> >> local
> >> directory of the mapper node? More specifically I am doing this:
> >>
> >>
> >>            FileSystem        hdfs=FileSystem.get(conf);
> >>             URI[] uris=DistributedCache.getCacheFiles(conf);