Re: Execution directory for child process within mapper
I had a similar issue: when I needed the same file for each reduce (or map)
task, I simply added Java code to the setup method to write the file to ".".
When each map needed different files, I wrote the files before calling the
executable. The trick also works when the code writes to a file rather than
stdout.
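
Roughly, the setup trick looks like this - a minimal sketch using the new
mapreduce API, where the mapper types, file name, and file contents are
placeholders rather than the actual job:

    import java.io.File;
    import java.io.FileWriter;
    import java.io.IOException;

    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class ExecMapper extends Mapper<LongWritable, Text, Text, Text> {

        @Override
        protected void setup(Context context) throws IOException {
            // Each task attempt runs in its own work directory, so a file
            // written to "." lands next to the executable when it runs.
            // The file name and contents here are placeholders.
            FileWriter w = new FileWriter(new File(".", "settings.cfg"));
            try {
                w.write("key=value\n");
            } finally {
                w.close();
            }
        }

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // Write any per-map input files here, then call the executable.
        }
    }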

On Mon, Sep 26, 2011 at 12:19 PM, Devaraj k <[EMAIL PROTECTED]> wrote:

> The localized distributed cache can also be helpful here, if you can make
> the necessary changes to your code. It is located in the local directory
> ${mapred.local.dir}/taskTracker/archive/.
>
> As per your explanation, I feel you can write the mapper in such a way that
> it copies the files from your custom location
> (/home/users/{user}/input/jobname) to the current working directory and
> then starts executing the executable.
>
> I hope this helps. :)
>
>
> Thanks
> Devaraj
> ________________________________________
> From: Joris Poort [[EMAIL PROTECTED]]
> Sent: Tuesday, September 27, 2011 12:25 AM
> To: [EMAIL PROTECTED]
> Subject: Re: Execution directory for child process within mapper
>
> Hi Devaraj,
>
> Thanks for your help - that makes sense. Is there any way to copy the
> local files needed for execution to the mapred.local.dir?
> Unfortunately I'm running local code which I cannot edit - and this
> code is the one that assumes these files are available in the same
> directory.
>
> Thanks!
>
> Joris
>
> On Mon, Sep 26, 2011 at 11:40 AM, Devaraj k <[EMAIL PROTECTED]> wrote:
> > Hi Joris,
> >
> > You cannot configure the work directory directly. You can configure the
> > local directory with the property 'mapred.local.dir', and it will then
> > be used to create the work directory, e.g.
> > '${mapred.local.dir}/taskTracker/jobcache/$jobid/$taskid/work'. Based on
> > this, you can refer to your local command with a relative path.
> >
> > I hope this page will help you understand the directory structure
> > clearly:
> > http://hadoop.apache.org/common/docs/r0.20.2/mapred_tutorial.html#Directory+Structure
> >
> >
> > Thanks
> > Devaraj
> > ________________________________________
> > From: Joris Poort [[EMAIL PROTECTED]]
> > Sent: Monday, September 26, 2011 11:20 PM
> > To: mapreduce-user
> > Subject: Execution directory for child process within mapper
> >
> > As part of my Java mapper I have a command that executes some standalone
> > code on a local slave node. When I run the code it executes fine, unless
> > it tries to access some local files, in which case I get an error that
> > it cannot locate those files.
> >
> > Digging a little deeper, it seems to be executing from the following
> > directory:
> >
> > /data/hadoop/mapred/local/taskTracker/{user}/jobcache/job_201109261253_0023/attempt_201109261253_0023_m_000001_0/work
> >
> > But I intend to execute from the local directory where the
> > relevant files are located:
> >
> >    /home/users/{user}/input/jobname
> >
> > Is there a way in Java/Hadoop to force execution from the local
> > directory, instead of the jobcache directory automatically created by
> > Hadoop?
> >
> > Is there perhaps a better way to go about this?
> >
> > Any help on this would be greatly appreciated!
> >
> > Cheers,
> >
> > Joris
> >
>
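
And if the Java code that launches the executable can be changed (even when
the executable itself cannot), the working directory can be forced explicitly
when spawning the child. A minimal sketch, assuming the child is launched
with ProcessBuilder; the class name, command, and user lookup here are
illustrative:

    import java.io.File;
    import java.io.IOException;

    public final class LocalExec {

        // Run the standalone executable with its working directory forced
        // to the directory holding its input files, instead of the task's
        // jobcache work directory.
        public static int runInDirectory(String command, File workDir)
                throws IOException, InterruptedException {
            ProcessBuilder pb = new ProcessBuilder(command);
            pb.directory(workDir);  // child starts in workDir
            pb.inheritIO();         // child's stdout/stderr go to the task's
            return pb.start().waitFor();
        }

        public static void main(String[] args) throws Exception {
            // Mirrors the /home/users/{user}/input/jobname layout from the
            // thread; the command name is a placeholder.
            String user = System.getProperty("user.name");
            File dir = new File("/home/users/" + user + "/input/jobname");
            int rc = runInDirectory("./run_job", dir);
            System.out.println("child exited with " + rc);
        }
    }

For Devaraj's distributed cache suggestion, the 0.20-era job setup would look
roughly like this (the HDFS path and symlink name are placeholders):

    import java.net.URI;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.filecache.DistributedCache;

    public class CacheSetup {
        public static void configure(Configuration conf) throws Exception {
            // Ship an HDFS file to every task; the "#settings.cfg" fragment
            // names the symlink created in the task's working directory, so
            // the executable finds the file under ".".
            DistributedCache.addCacheFile(
                new URI("/user/joris/input/jobname/settings.cfg#settings.cfg"),
                conf);
            DistributedCache.createSymlink(conf);
        }
    }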

--
Steven M. Lewis PhD
4221 105th Ave NE
Kirkland, WA 98033
206-384-1340 (cell)
Skype lordjoe_com