Re: Problem using distributed cache
Hi Dhaval & Harsh,

Thanks for coming back to the thread - you're both right: I was doing things
in the wrong order. I hadn't realised that the Job constructor clones the
configuration - that's very interesting!

thanks again
Peter
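
[For later readers, here is a minimal, untested sketch of the corrected
driver, reusing the "wordcount" job name and cache path from the thread.
The class name WordCountDriver is illustrative. The fix is to register the
cache file on the Job's own (cloned) configuration rather than on the
pre-Job conf instance:

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.mapreduce.Job;

public class WordCountDriver {

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // The Job constructor clones conf, so any cache file added to
        // conf after this line is invisible to the submitted job.
        Job job = new Job(conf, "wordcount");

        // Register the file on the clone via job.getConfiguration().
        DistributedCache.addCacheFile(
                new URI("/user/peter/cacheFile/testCache1"),
                job.getConfiguration());

        // ... set mapper/reducer classes and input/output paths,
        // then submit with job.waitForCompletion(true);
    }
}

Equivalently, calling DistributedCache.addCacheFile(..., conf) before
constructing the Job also works, since the clone is taken at construction
time.]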
On Fri, Dec 7, 2012 at 2:25 PM, Harsh J <[EMAIL PROTECTED]> wrote:

> Please try using job.getConfiguration() instead of the pre-job conf
> instance, because the constructor clones it.
>
> On Fri, Dec 7, 2012 at 7:36 PM, Peter Cogan <[EMAIL PROTECTED]> wrote:
> > Hi,
> >
> > any thoughts on this would be much appreciated
> >
> > thanks
> > Peter
> >
> >
> > On Thu, Dec 6, 2012 at 9:29 PM, Peter Cogan <[EMAIL PROTECTED]> wrote:
> >>
> >> Hi,
> >>
> >> It's an instance created at the start of the program like this:
> >>
> >> public static void main(String[] args) throws Exception {
> >>
> >> Configuration conf = new Configuration();
> >>
> >> Job job = new Job(conf, "wordcount");
> >>
> >> DistributedCache.addCacheFile(new URI("/user/peter/cacheFile/testCache1"),
> >> conf);
> >>
> >> On Thu, Dec 6, 2012 at 5:02 PM, Harsh J <[EMAIL PROTECTED]> wrote:
> >>>
> >>> What is your conf object there? Is it job.getConfiguration() or an
> >>> independent instance?
> >>>
> >>> On Thu, Dec 6, 2012 at 10:29 PM, Peter Cogan <[EMAIL PROTECTED]>
> >>> wrote:
> >>> > Hi,
> >>> >
> >>> > I want to use the distributed cache to allow my mappers to access
> >>> > data. In main, I'm using the command:
> >>> >
> >>> > DistributedCache.addCacheFile(new
> >>> > URI("/user/peter/cacheFile/testCache1"),
> >>> > conf);
> >>> >
> >>> > Where /user/peter/cacheFile/testCache1 is a file that exists in HDFS.
> >>> >
> >>> > Then, my setup function looks like this:
> >>> >
> >>> > public void setup(Context context) throws IOException,
> >>> > InterruptedException{
> >>> >     Configuration conf = context.getConfiguration();
> >>> >     Path[] localFiles = DistributedCache.getLocalCacheFiles(conf);
> >>> >     //etc
> >>> > }
> >>> >
> >>> > However, this localFiles array is always null.
> >>> >
> >>> > I was initially running on a single-host cluster for testing, but I
> >>> > read that this will prevent the distributed cache from working. I
> >>> > tried with a pseudo-distributed setup, but that didn't work either.
> >>> >
> >>> > I'm using Hadoop 1.0.3.
> >>> >
> >>> > thanks Peter
> >>>
> >>> --
> >>> Harsh J
>
> --
> Harsh J
>
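
[For completeness, here is an untested sketch of the mapper-side retrieval
under the fixed setup. The class name CacheAwareMapper and the line-by-line
read are illustrative only, not from the thread; the null check covers
exactly the failure mode discussed above:

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class CacheAwareMapper extends Mapper<LongWritable, Text, Text, Text> {

    @Override
    protected void setup(Context context)
            throws IOException, InterruptedException {
        Configuration conf = context.getConfiguration();

        // Returns the task-local paths of the cached files; null if no
        // file was registered on the job's configuration before submit.
        Path[] localFiles = DistributedCache.getLocalCacheFiles(conf);
        if (localFiles == null || localFiles.length == 0) {
            throw new IOException("Distributed cache file not found");
        }

        BufferedReader reader =
                new BufferedReader(new FileReader(localFiles[0].toString()));
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                // parse the cached data as needed
            }
        } finally {
            reader.close();
        }
    }
}]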