NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB

MapReduce >> mail # user >> Identification of mapper slots


Hider, Sandy 2013-10-14, 21:49
Re: Identification of mapper slots
I assume you know the tradeoff here: if you depend on the mapper slot number
in your implementation to speed it up, you lose code portability in the long
term.

That said, one way to achieve this is to use the JobConf API:

// Job-wide partition number of this task (-1 if the property is not set)
int partition = jobConf.getInt(JobContext.TASK_PARTITION, -1);

The framework assigns a unique partition number to each mapper; this allows
each of them to write to a distinct output file. Note that this is a global
partition number, not one local to each node.
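As a minimal sketch (not from the original reply), the partition number can be turned into a private working directory per map task. The class `PartitionCache`, the method `cacheDirFor`, and the `matlab-cache-` prefix are hypothetical; the only Hadoop call assumed is the `jobConf.getInt(JobContext.TASK_PARTITION, -1)` line shown above, whose result is passed in as `partition`.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: one cache directory per task partition.
public class PartitionCache {

    /**
     * Returns a cache directory under baseDir named after the task's partition
     * number. Partition numbers are unique among a job's map tasks, so two map
     * tasks of the same job never get the same directory.
     *
     * @param partition value read with jobConf.getInt(JobContext.TASK_PARTITION, -1)
     */
    static Path cacheDirFor(Path baseDir, int partition) {
        if (partition < 0) {
            // -1 is the default passed to getInt(): the property was not set,
            // which suggests the code is not running inside a task
            throw new IllegalStateException("task partition not set");
        }
        return baseDir.resolve("matlab-cache-" + partition);
    }

    public static void main(String[] args) {
        Path dir = cacheDirFor(Paths.get("/tmp"), 3);
        System.out.println(dir); // on Linux: /tmp/matlab-cache-3
    }
}
```

Because the number is job-wide rather than per node, this isolates tasks but gives one directory per task instead of one per slot. Partition numbers also restart at 0 for each job, and two attempts of the same task (for example, a speculative attempt) share a partition number, so those cases would still share a directory.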

Also, if your mappers and reducers use the same cache, add a
jobConf.getBoolean(JobContext.TASK_ISMAP, ...) check to tell whether you are
executing in a mapper or a reducer context.
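The role check matters because map tasks and reduce tasks are numbered independently: map task 0 and reduce task 0 both have partition 0. A minimal sketch, assuming `isMap` is read with `jobConf.getBoolean(JobContext.TASK_ISMAP, true)` (the second argument is the default used when the property is unset; `true` is an arbitrary choice here). `RoleAwareCache`, `cacheDirFor`, and the directory naming are hypothetical.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: cache directory keyed by task role and partition.
public class RoleAwareCache {

    /**
     * Map and reduce tasks are numbered independently, so map 0 and reduce 0
     * share partition number 0; including the role keeps their directories apart.
     *
     * @param isMap     value read with jobConf.getBoolean(JobContext.TASK_ISMAP, true)
     * @param partition value read with jobConf.getInt(JobContext.TASK_PARTITION, -1)
     */
    static Path cacheDirFor(Path baseDir, boolean isMap, int partition) {
        if (partition < 0) {
            throw new IllegalStateException("task partition not set");
        }
        String role = isMap ? "map" : "reduce";
        return baseDir.resolve("matlab-cache-" + role + "-" + partition);
    }

    public static void main(String[] args) {
        Path base = Paths.get("/tmp");
        System.out.println(cacheDirFor(base, true, 0));  // on Linux: /tmp/matlab-cache-map-0
        System.out.println(cacheDirFor(base, false, 0)); // on Linux: /tmp/matlab-cache-reduce-0
    }
}
```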
-Rahul

On Mon, Oct 14, 2013 at 2:49 PM, Hider, Sandy <[EMAIL PROTECTED]> wrote:

> In Hadoop, I can set the maximum number of mappers in mapred-site.xml.
> For the sake of this email, I will call the concurrent mappers "mapper
> slots".
>
> Is it possible to figure out from within the mapper which mapper slot it
> is running in?
>
> On this project this is important because each mapper has to fork off an
> executable compiled for the Matlab runtime. At runtime, the executable is
> passed a cache directory to work in. Setting up the cache in a new
> directory takes a long time, but the cache can be reused quickly on later
> calls if the same location is provided again. As it turns out, when
> multiple mappers try to use the same cache, they crash the executable. So
> ideally, if I could identify which mapper slot a mapper is running in, I
> could set up a cache for each slot, avoid the cache creation time, and
> still guarantee that no two mappers write to the same cache.
>
> Thanks for taking the time to read this,
>
> Sandy
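Because the partition number is job-wide, it does not map directly onto the per-node slots described in the question. A different technique, not suggested in this thread, is to let each task claim one of a fixed set of local cache directories by holding an exclusive OS file lock on a companion lock file; the lock marks the directory as in use, and the OS releases it if the task's JVM exits. This is a minimal sketch under stated assumptions: the base directory is on the node's local disk (not HDFS) and shared by all tasks on that node, and `numSlots` is at least the node's configured map slot maximum. `SlotCachePool`, `claim`, and the `slot-<i>.lock` / `cache-<i>` names are hypothetical.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical per-node pool of reusable cache directories guarded by file locks.
public class SlotCachePool {

    // Lock files already held by this JVM. Skipping them avoids opening a second
    // channel on a locked file, whose close() may drop this JVM's OS-level lock.
    private static final Set<Path> HELD = ConcurrentHashMap.newKeySet();

    /** A claimed slot; close() releases it for the next task on this node. */
    public static final class Slot implements AutoCloseable {
        public final int index;
        public final Path cacheDir;
        private final Path lockFile;
        private final FileChannel channel;
        private final FileLock lock;

        private Slot(int index, Path cacheDir, Path lockFile,
                     FileChannel channel, FileLock lock) {
            this.index = index;
            this.cacheDir = cacheDir;
            this.lockFile = lockFile;
            this.channel = channel;
            this.lock = lock;
        }

        @Override
        public void close() throws IOException {
            try {
                lock.release();
                channel.close();
            } finally {
                HELD.remove(lockFile);
            }
        }
    }

    /** Claims the lowest-numbered free slot in [0, numSlots), or returns null if all are busy. */
    public static Slot claim(Path baseDir, int numSlots) throws IOException {
        Files.createDirectories(baseDir);
        for (int i = 0; i < numSlots; i++) {
            Path lockFile = baseDir.resolve("slot-" + i + ".lock").toAbsolutePath().normalize();
            if (!HELD.add(lockFile)) {
                continue; // this JVM already owns slot i
            }
            FileChannel channel = null;
            FileLock lock = null;
            try {
                channel = FileChannel.open(lockFile,
                        StandardOpenOption.CREATE, StandardOpenOption.WRITE);
                lock = channel.tryLock(); // null if another process holds the lock
            } finally {
                if (lock == null) {
                    if (channel != null) {
                        channel.close();
                    }
                    HELD.remove(lockFile);
                }
            }
            if (lock == null) {
                continue; // slot i is busy in another task's JVM
            }
            Slot slot = new Slot(i, baseDir.resolve("cache-" + i), lockFile, channel, lock);
            try {
                Files.createDirectories(slot.cacheDir); // reused as-is if it already exists
            } catch (IOException e) {
                slot.close();
                throw e;
            }
            return slot;
        }
        return null;
    }
}
```

A task would claim a slot when it starts and close it when it finishes (for example in the new API's setup() and cleanup()), passing slot.cacheDir to the Matlab executable. Since a slot's directory is only reused after the previous holder releases the lock, the expensive cache setup is paid once per slot rather than once per task. If claim() returns null, every slot is busy and the task needs a fallback, such as a fresh temporary cache directory.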