Re: reading distributed cache returns null pointer
Hi,

> Thanks for the information. I got your point. What I specifically want to ask
> is this: if I use the following method to read my file in each mapper:
>
>            FileSystem hdfs = FileSystem.get(conf);
>            URI[] uris = DistributedCache.getCacheFiles(conf);
>            Path my_path = new Path(uris[0].getPath());
>
>            if (hdfs.exists(my_path)) {
>                FSDataInputStream fs = hdfs.open(my_path);
>                String str;
>                while ((str = fs.readLine()) != null)
>                    System.out.println(str);
>            }
> would this method retrieve the file from HDFS, since I am using the Hadoop
> API and not the local file API?
>

It would be instructive to look at the test code in
src/test/mapred/org/apache/hadoop/mapred/TestMRWithDistributedCache.java.
This gives a fair idea of how to access the files of DistributedCache
from within the mapper. Specifically, see how the LocalFileSystem is
used to access the files. You could look at the same class in the
branch-20 source code if you are using an older version of Hadoop.
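
For illustration, here is a minimal sketch of that pattern (old-style mapred
API; FileSystem.getLocal() is how you get at the LocalFileSystem mentioned
above, and the rest of the names are illustrative):

    // in the Mapper's configure(JobConf conf)
    // (uses org.apache.hadoop.filecache.DistributedCache,
    //  org.apache.hadoop.fs.FileSystem and Path, and java.io)
    // getLocalCacheFiles() returns the localized on-disk copies;
    // getCacheFiles() returns the original (HDFS) URIs
    Path[] localFiles = DistributedCache.getLocalCacheFiles(conf);
    FileSystem localFs = FileSystem.getLocal(conf);
    BufferedReader reader = new BufferedReader(
        new InputStreamReader(localFs.open(localFiles[0])));
    String line;
    while ((line = reader.readLine()) != null) {
      System.out.println(line);
    }
    reader.close();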

>
> I may be understanding something horribly wrong. The situation is that now
> my_path contains DCache/Orders.txt, and if I am reading from here, this is the
> path of the file on HDFS as well. How does it know to pick the file from the
> local file system, not the HDFS?
>
> Thanks again
>
>
>
>
> ________________________________
> From: Rahul Jain <[EMAIL PROTECTED]>
> To: [EMAIL PROTECTED]
> Sent: Fri, July 9, 2010 12:19:44 AM
> Subject: Re: reading distributed cache returns null pointer
>
> Yes, distributed cache writes files to the local file system for each mapper
> / reducer. So you should be able to access the file(s) using local file
> system APIs.
>
> If the files were staying in HDFS there would be no point to using
> distributed cache since all mappers already have access to the global HDFS
> directories :).
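>
> For example, a minimal sketch (getLocalCacheFiles() returns the localized
> paths, so plain java.io works on them; names here are illustrative):
>
>     Path[] localFiles = DistributedCache.getLocalCacheFiles(conf);
>     // localFiles[0] is a path on the node's local disk, not in HDFS
>     BufferedReader reader =
>         new BufferedReader(new FileReader(localFiles[0].toString()));
>     String line;
>     while ((line = reader.readLine()) != null)
>         System.out.println(line);
>     reader.close();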
>
> -Rahul
>
> On Thu, Jul 8, 2010 at 3:03 PM, abc xyz <[EMAIL PROTECTED]> wrote:
>
>> Hi Rahul,
>> Thanks. It worked. I was using getFileClassPaths() to get the paths to the
>> files in the cache and then using those paths to access the files. It should
>> have worked, but I don't know why it didn't produce the required result.
>>
>> I added the HDFS file DCache/Orders.txt to my distributed cache. After
>> calling DistributedCache.getCacheFiles(conf); in the configure method of the
>> mapper node, if I read the file now from the returned path (which happens to
>> be DCache/Orders.txt) using the Hadoop API, would the file be read from the
>> local directory of the mapper node? More specifically, I am doing this:
>>
>>
>>            FileSystem hdfs = FileSystem.get(conf);
>>            URI[] uris = DistributedCache.getCacheFiles(conf);
>>            Path my_path = new Path(uris[0].getPath());
>>
>>            if (hdfs.exists(my_path)) {
>>                FSDataInputStream fs = hdfs.open(my_path);
>>                String str;
>>                while ((str = fs.readLine()) != null)
>>                    System.out.println(str);
>>            }
>>
>> Thanks
>>
>>
>> ________________________________
>> From: Rahul Jain <[EMAIL PROTECTED]>
>> To: [EMAIL PROTECTED]
>> Sent: Thu, July 8, 2010 8:15:58 PM
>> Subject: Re: reading distributed cache returns null pointer
>>
>> I am not sure why you are using getFileClassPaths() API to access files...
>> here is what works for us:
>>
>> Add the file(s) to distributed cache using:
>> DistributedCache.addCacheFile(p.toUri(), conf);
>>
>> Read the files on the mapper using:
>>
>> URI[] uris = DistributedCache.getCacheFiles(conf);
>> // access one of the files:
>> Path[] paths = new Path[uris.length];
>> paths[0] = new Path(uris[0].getPath());
>> // now follow hadoop or local file APIs to access the file...
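>>
>> For completeness, the driver side is roughly this (a sketch; the class and
>> path names are illustrative):
>>
>>     JobConf conf = new JobConf(MyJob.class);
>>     FileSystem fs = FileSystem.get(conf);
>>     // qualify the path so the cached URI carries the filesystem scheme
>>     Path p = fs.makeQualified(new Path("path/in/hdfs/to/file.txt"));
>>     DistributedCache.addCacheFile(p.toUri(), conf);
>>     JobClient.runJob(conf);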
>>
>>
>> Did you try the above, and did it not work?
>>
>> -Rahul
>>
>> On Thu, Jul 8, 2010 at 12:04 PM, abc xyz <[EMAIL PROTECTED]> wrote:
>>
>> > Hello all,
>> >
>> > As a new user of hadoop, I am having some problems with understanding