MapReduce >> mail # user >> How to Influence Reduce Task Location.


Re: How to Influence Reduce Task Location.
This is not a very good approach, for numerous reasons.  For example, you
generally don't want to run another processing-intensive app (like a
database) on a Hadoop node, and you don't want to have to worry about the
exact situation you're worrying about (i.e., trying to make certain output
get routed to certain specific nodes).

Run your database on a different machine from your Hadoop nodes.  Run
your job on Hadoop and write the data to HDFS.  Then run a separate
process that imports the data into the database from HDFS.
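
A minimal sketch of what that separate import step could look like, assuming the job wrote its results as tab-separated "key<TAB>value" text lines. The class name, the `importRows` helper, and the batch size are hypothetical, and the sink is only a stand-in for whatever database client you use; in practice the reader would be opened on the job's output files in HDFS rather than on an in-memory string.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical importer: not part of Hadoop, just an illustration.
final class HdfsOutputImportSketch {

    // Reads "key<TAB>value" lines and hands them to the sink in batches of
    // at most batchSize rows. Returns the number of rows passed to the sink.
    static int importRows(Reader source, int batchSize, Consumer<List<String[]>> sink)
            throws IOException {
        List<String[]> batch = new ArrayList<>();
        int total = 0;
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                int tab = line.indexOf('\t');
                if (tab < 0) {
                    continue; // skip lines that are not key<TAB>value
                }
                batch.add(new String[] {line.substring(0, tab), line.substring(tab + 1)});
                total++;
                if (batch.size() == batchSize) {
                    sink.accept(batch);
                    batch = new ArrayList<>();
                }
            }
        }
        if (!batch.isEmpty()) {
            sink.accept(batch); // flush the final partial batch
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the contents of one reducer output file.
        Reader sample = new StringReader("apple\t3\nbanana\t5\ncherry\t2\n");
        // Stand-in for a database write, e.g. one batched INSERT per call.
        int rows = importRows(sample, 2,
                batch -> System.out.println("would insert " + batch.size() + " rows"));
        System.out.println("imported " + rows + " rows");
    }
}
```

Because the importer runs outside the job, it can live on or near the database machine, and the reduce tasks can run wherever the scheduler puts them.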

DR

On 12/19/2010 01:23 PM, Jane Chen wrote:
> Suppose that the output is written to a database that only runs on
> certain nodes.  It would be desirable to schedule the reducer tasks to
> run on nodes local or close to the database nodes.
>
> Thanks, Jane
>
> --- On Sat, 12/18/10, Hari Sreekumar <[EMAIL PROTECTED]> wrote:
>
> From: Hari Sreekumar <[EMAIL PROTECTED]>
> Subject: Re: How to Influence Reduce Task Location.
> To: [EMAIL PROTECTED]
> Date: Saturday, December 18, 2010, 10:35 AM
>
>
> You can specify that a group of keys should go to the same host for
> reducing, but I have never encountered any situation where you need
> to know beforehand exactly which host a particular key should go to.
> I am not sure if that can be done. Just out of curiosity, why do you
> need this kind of control over reduction?
>
>
> Hari
>
>
> On Sat, Dec 18, 2010 at 11:54 PM, Jane Chen <[EMAIL PROTECTED]> wrote:
>
> But how does this help me request which host to schedule the reduce
> task to?
>
> Thanks, Jane
>
> --- On Sat, 12/18/10, Hari Sreekumar <[EMAIL PROTECTED]> wrote:
>
> From: Hari Sreekumar <[EMAIL PROTECTED]>
> Subject: Re: How to Influence Reduce Task Location.
> To: [EMAIL PROTECTED]
> Date: Saturday, December 18, 2010, 10:16 AM
>
> Hi Jane,
>
>
> The partitioner class can be used to achieve this.
> (http://hadoop.apache.org/mapreduce/docs/r0.21.0/api/org/apache/hadoop/mapreduce/Partitioner.html).
>
>
>
> Thanks, Hari
>
>
> On Sat, Dec 18, 2010 at 11:13 PM, Jane Chen<[EMAIL PROTECTED]>
> wrote:
>
> Hi All,
>
> Is there any way to influence where a reduce task is run?  We have a
> case where we'd like to choose the host to run the reduce task based
> on the task's input key.
>
> Any suggestion is greatly appreciated.
>
> Thanks, Jane
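
The Partitioner suggestion in the thread can be illustrated with a standalone sketch. It mirrors the shape of Hadoop's `Partitioner.getPartition(key, value, numPartitions)` but does not depend on Hadoop, so the class name and the grouping rule (everything before the first ':' in the key) are assumptions made up for illustration. Note what it does and does not control: it sends a group of keys to the same reduce partition, but it does not choose the host that partition's reduce task runs on, which is the limitation raised above.

```java
// Standalone sketch; in a real job this logic would sit in a subclass of
// org.apache.hadoop.mapreduce.Partitioner and be set on the job.
final class KeyGroupPartitionSketch {

    // Hypothetical grouping rule: keys sharing the text before the first ':'
    // belong to the same group and must reach the same reduce task.
    static String groupOf(String key) {
        int sep = key.indexOf(':');
        return sep < 0 ? key : key.substring(0, sep);
    }

    // Returns a partition index in [0, numPartitions). Masking the sign bit
    // keeps the result non-negative even when hashCode() is negative.
    static int getPartition(String key, int numPartitions) {
        return (groupOf(key).hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    public static void main(String[] args) {
        int numReduceTasks = 4; // assumed number of reduce tasks
        String[] keys = {"db1:row7", "db1:row9", "db2:row1", "db3:row4"};
        for (String key : keys) {
            System.out.println(key + " -> partition " + getPartition(key, numReduceTasks));
        }
    }
}
```

Keys in the same group always get the same partition number, but which node ends up running the reduce task for that partition is still decided by the scheduler.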