RE: Moving Files to Distributed Cache in MapReduce

Yeah,

I'll write something up and post it on my web site. Definitely not InfoQ material, just a simple tips-and-tricks piece.

-Mike
> Subject: Re: Moving Files to Distributed Cache in MapReduce
> From: [EMAIL PROTECTED]
> Date: Sun, 31 Jul 2011 19:21:14 -0700
> To: [EMAIL PROTECTED]
>
>
> We really need to add a working example to the wiki and link to it from the FAQ page.  Any volunteers?
>
> On Jul 29, 2011, at 7:49 PM, Michael Segel wrote:
>
> >
> > Here's the meat of my post earlier...
> > Sample code on putting a file on the cache:
> > DistributedCache.addCacheFile(new URI(path+"MyFileName"), conf);
> >
> > Sample code in pulling data off the cache:
> >       Path[] localFiles = DistributedCache.getLocalCacheFiles(context.getConfiguration());
> >       boolean exitProcess = false;
> >       int i = 0;
> >       while (!exitProcess) {
> >           String fileName = localFiles[i].getName();
> >           if (fileName.equalsIgnoreCase("model.txt")) {
> >               // Build your input file reader on localFiles[i].toString()
> >               exitProcess = true;
> >           }
> >           i++;
> >       }
> >
> >
> > Note that this is SAMPLE code. I didn't trap the exit condition for when the file isn't there and you run past the end of the localFiles[] array.
> > Also, I initialize exitProcess to false because it's easier to read the loop as "do this until exitProcess is true".
> >
> > When you build your file reader you need the full path, not just the file name. The path will vary when the job runs.
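
A minimal sketch of the same lookup with the missing-file case handled, so the search cannot run past the end of the array. To keep it compilable without Hadoop on the classpath, it uses java.nio.file.Path as a stand-in for org.apache.hadoop.fs.Path (getFileName() here plays the role of getName()). CacheLookup and findByName are hypothetical names, and the paths in main are made up. In a real mapper the array would come from DistributedCache.getLocalCacheFiles(context.getConfiguration()), as in the sample above.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;

// Hypothetical helper; java.nio.file.Path stands in for Hadoop's Path class.
class CacheLookup {

    /**
     * Returns the first cached entry whose file name matches (ignoring case),
     * or Optional.empty() if the array is null, empty, or has no match.
     */
    static Optional<Path> findByName(Path[] localFiles, String wanted) {
        if (localFiles == null) {
            return Optional.empty(); // handled defensively: nothing was cached
        }
        for (Path candidate : localFiles) {
            Path name = candidate.getFileName();
            if (name != null && name.toString().equalsIgnoreCase(wanted)) {
                return Optional.of(candidate); // full path, not just the name
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Made-up local paths, for illustration only.
        Path[] localFiles = {
            Paths.get("tmp", "job_0001", "other.dat"),
            Paths.get("tmp", "job_0001", "model.txt")
        };
        Optional<Path> model = findByName(localFiles, "model.txt");
        System.out.println(model.map(Path::toString).orElse("model.txt is not in the cache"));
    }
}
```

The value returned is the full local path, which is what the file reader has to be opened on; the empty case is where a job would typically fail with a clear error instead of an array index exception.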
> >
> > HTH
> >
> > -Mike
> >
> >
> >> From: [EMAIL PROTECTED]
> >> To: [EMAIL PROTECTED]
> >> Subject: RE: Moving Files to Distributed Cache in MapReduce
> >> Date: Fri, 29 Jul 2011 21:43:37 -0500
> >>
> >>
> >> I could have sworn that I gave an example earlier this week on how to push and pull stuff from distributed cache.
> >>
> >>
> >>> Date: Fri, 29 Jul 2011 14:51:26 -0700
> >>> Subject: Re: Moving Files to Distributed Cache in MapReduce
> >>> From: [EMAIL PROTECTED]
> >>> To: [EMAIL PROTECTED]
> >>>
> >>> JobConf is deprecated as of 0.20.2, I believe; you're supposed to use
> >>> Configuration instead.
> >>>
> >>> On Fri, Jul 29, 2011 at 1:59 PM, Mohit Anchlia <[EMAIL PROTECTED]>wrote:
> >>>
> >>>> Is this what you are looking for?
> >>>>
> >>>> http://hadoop.apache.org/common/docs/current/mapred_tutorial.html
> >>>>
> >>>> search for jobConf
> >>>>
> >>>> On Fri, Jul 29, 2011 at 1:51 PM, Roger Chen <[EMAIL PROTECTED]> wrote:
> >>>>> Thanks for the response! However, I'm having an issue with this line
> >>>>>
> >>>>> Path[] cacheFiles = DistributedCache.getLocalCacheFiles(conf);
> >>>>>
> >>>>> because conf has private access in org.apache.hadoop.conf.Configured
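
One way around the private-access error, sketched under assumptions: Configured keeps its conf field private but exposes it through the public getConf() method, so a class extending Configured would pass getConf() rather than conf. The classes below are hypothetical stand-ins written so the sketch compiles without Hadoop; they are not the real org.apache.hadoop.conf classes, and CacheJob and the key example.cache.file are invented for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for org.apache.hadoop.conf.Configuration (not the real class).
class Configuration {
    private final Map<String, String> values = new HashMap<>();

    void set(String key, String value) { values.put(key, value); }

    String get(String key) { return values.get(key); }
}

// Hypothetical stand-in for org.apache.hadoop.conf.Configured:
// the field is private, the getter is public.
class Configured {
    private Configuration conf;

    Configured(Configuration conf) { this.conf = conf; }

    public Configuration getConf() { return conf; }
}

// Hypothetical job class extending the stand-in.
class CacheJob extends Configured {
    CacheJob(Configuration conf) { super(conf); }

    String cachedFileSetting() {
        // return conf.get("example.cache.file");   // would not compile: conf is private in Configured
        return getConf().get("example.cache.file"); // made-up key, for illustration only
    }

    public static void main(String[] args) {
        Configuration conf = new Configuration();
        conf.set("example.cache.file", "/user/hadoop/thefile.dat");
        System.out.println(new CacheJob(conf).cachedFileSetting());
    }
}
```

With the real API, the failing line would presumably become DistributedCache.getLocalCacheFiles(getConf()) in a class that extends Configured; inside a mapper or reducer, context.getConfiguration() serves the same purpose.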
> >>>>>
> >>>>> On Fri, Jul 29, 2011 at 11:18 AM, Mapred Learn <[EMAIL PROTECTED]> wrote:
> >>>>>
> >>>>>> I hope my previous reply helps...
> >>>>>>
> >>>>>> On Fri, Jul 29, 2011 at 11:11 AM, Roger Chen <[EMAIL PROTECTED]> wrote:
> >>>>>>
> >>>>>>> After moving it to the distributed cache, how would I call it within my MapReduce program?
> >>>>>>>
> >>>>>>> On Fri, Jul 29, 2011 at 11:09 AM, Mapred Learn <[EMAIL PROTECTED]> wrote:
> >>>>>>>
> >>>>>>>> Did you try using -files option in your hadoop jar command as:
> >>>>>>>>
> >>>>>>>> /usr/bin/hadoop jar <jar name> <main class name> -files <absolute path of file to be added to distributed cache> <input dir> <output dir>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On Fri, Jul 29, 2011 at 11:05 AM, Roger Chen <[EMAIL PROTECTED]> wrote:
> >>>>>>>>
> >>>>>>>>> Slight modification: I now know how to add files to the distributed file cache, which can be done via this command placed in the main or run class:
> >>>>>>>>>
> >>>>>>>>>       DistributedCache.addCacheFile(new URI("/user/hadoop/thefile.dat"),