MapReduce, mail # user - Re: Problem using distributed cache


Re: Problem using distributed cache
Dhaval Shah 2012-12-07, 14:23
You will need to add the cache file to the distributed cache before creating the Job object, because the Job takes its own copy of the Configuration when it is constructed. Give that a spin and see if that works.
 
Regards,
Dhaval
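
For reference, the ordering suggested above might look like the sketch below. It assumes the Hadoop 1.x jars are on the classpath; the class name CacheDriverSketch and the helper buildJob are made up for illustration and are not from Peter's program. The cache path and job name are the ones quoted later in this thread.

```java
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.mapreduce.Job;

// Hypothetical driver class, for illustration only.
public class CacheDriverSketch {

    // Hypothetical helper: registers the cache file, then creates the Job.
    static Job buildJob(String cachePath) throws Exception {
        Configuration conf = new Configuration();

        // Register the cache file first, while conf is still the object
        // that the Job is about to copy.
        DistributedCache.addCacheFile(new URI(cachePath), conf);

        // new Job(conf, ...) keeps its own copy of conf, so changes made
        // to conf after this line are not seen by the job.
        Job job = new Job(conf, "wordcount");

        // Anything added later should go through the job's own configuration:
        // DistributedCache.addCacheFile(otherUri, job.getConfiguration());
        return job;
    }

    public static void main(String[] args) throws Exception {
        Job job = buildJob("/user/peter/cacheFile/testCache1");

        // Check what the job's configuration actually carries.
        URI[] registered = DistributedCache.getCacheFiles(job.getConfiguration());
        System.out.println("cache files visible to the job: " + registered.length);
    }
}
```

If the Job has to be created first for some other reason, passing job.getConfiguration() to addCacheFile instead of the original conf should have the same effect, which is what Harsh's question further down is getting at.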
________________________________
 From: Peter Cogan <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
Sent: Friday, 7 December 2012 9:06 AM
Subject: Re: Problem using distributed cache
 

Hi,

Any thoughts on this would be much appreciated.

thanks
Peter

On Thu, Dec 6, 2012 at 9:29 PM, Peter Cogan <[EMAIL PROTECTED]> wrote:

>Hi,
>
>It's an instance created at the start of the program like this:
>
>public static void main(String[] args) throws Exception {
>    Configuration conf = new Configuration();
>    Job job = new Job(conf, "wordcount");
>    DistributedCache.addCacheFile(new URI("/user/peter/cacheFile/testCache1"), conf);
>
>On Thu, Dec 6, 2012 at 5:02 PM, Harsh J <[EMAIL PROTECTED]> wrote:
>
>>What is your conf object there? Is it job.getConfiguration() or an
>>independent instance?
>>
>>
>>On Thu, Dec 6, 2012 at 10:29 PM, Peter Cogan <[EMAIL PROTECTED]> wrote:
>>> Hi,
>>>
>>> I want to use the distributed cache to allow my mappers to access data. In
>>> main, I'm using the command
>>>
>>> DistributedCache.addCacheFile(new URI("/user/peter/cacheFile/testCache1"),
>>> conf);
>>>
>>> where /user/peter/cacheFile/testCache1 is a file that exists in HDFS.
>>>
>>> Then, my setup function looks like this:
>>>
>>> public void setup(Context context) throws IOException, InterruptedException{
>>>     Configuration conf = context.getConfiguration();
>>>     Path[] localFiles = DistributedCache.getLocalCacheFiles(conf);
>>>     //etc
>>> }
>>>
>>> However, this localFiles array is always null.
>>>
>>> I was initially running on a single-host cluster for testing, but I read
>>> that this will prevent the distributed cache from working. I tried with a
>>> pseudo-distributed setup, but that didn't work either.
>>>
>>> I'm using Hadoop 1.0.3.
>>>
>>> thanks Peter
>>>
>>>
>>
>>
>>
>>--
>>Harsh J
>>
>