-
How to Influence Reduce Task Location.
Jane Chen 2010-12-18, 17:43
Hi All,
Is there any way to influence where a reduce task is run? We have a case where we'd like to choose the host to run the reduce task based on the task's input key.
Any suggestion is greatly appreciated.
Thanks, Jane
-
Re: How to Influence Reduce Task Location.
Hari Sreekumar 2010-12-18, 18:16
Hi Jane, The partitioner class can be used to achieve this. ( http://hadoop.apache.org/mapreduce/docs/r0.21.0/api/org/apache/hadoop/mapreduce/Partitioner.html). Thanks, Hari
-
Re: How to Influence Reduce Task Location.
Jane Chen 2010-12-18, 18:24
But how does this help me request which host to schedule the reduce task to? Thanks, Jane
-
Re: How to Influence Reduce Task Location.
Hari Sreekumar 2010-12-18, 18:35
You can specify that a group of keys should go to the same reducer, but I have never encountered any situation where you need to know beforehand exactly which host a particular key should go to. I am not sure if that can be done. Just out of curiosity, why do you need this kind of control over reduction? Hari
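As a rough illustration of the point above: a partitioner only decides which reduce partition a key lands in; it has no say over which host runs that partition. The sketch below is plain Python rather than the Hadoop Java API, and the "group:id" key format and the function name partition_for_group are assumptions made up for this example. It sends every key that shares a group prefix to the same partition number.

```python
import zlib


def partition_for_group(key: str, num_partitions: int) -> int:
    """Return a partition index in [0, num_partitions) for a key.

    Assumption: keys look like "group:id" (a hypothetical format), and all
    keys that share a group prefix should reach the same reducer.
    """
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    # Everything before the first ":" is treated as the group name.
    group = key.split(":", 1)[0]
    # crc32 is deterministic across processes, unlike Python's built-in
    # hash() for strings, so every map task computes the same answer.
    return zlib.crc32(group.encode("utf-8")) % num_partitions


if __name__ == "__main__":
    for k in ["eu:1001", "eu:2002", "us:1001"]:
        print(k, "->", partition_for_group(k, 8))
```

The return value is only a partition number. Which machine ends up running the reduce task for that partition is still up to the scheduler.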
-
Re: How to Influence Reduce Task Location.
Jane Chen 2010-12-19, 18:23
Suppose that the output is written to a database that only runs on certain nodes. It would be desirable to schedule the reducer tasks to run on nodes local or close to the database nodes. Thanks, Jane
-
Re: How to Influence Reduce Task Location.
Eric 2010-12-19, 18:31
I can't answer your question, but have you looked at HadoopDB? Maybe it fits your needs.
-
Re: How to Influence Reduce Task Location.
Allen Wittenauer 2010-12-19, 23:59
On Dec 19, 2010, at 10:23 AM, Jane Chen wrote:
> Suppose that the output is written to a database, that only runs on certain nodes. It will be desirable to schedule the reducer tasks to run on the nodes local or close to the database nodes.
a) That's a side-effect--pretty much "against the rules". Very little support is provided for such things.
b) At a minimum, you'll need to write your own scheduler.
-
Re: How to Influence Reduce Task Location.
David Rosenstrauch 2010-12-20, 03:26
On 12/18/2010 12:43 PM, Jane Chen wrote:
> Is there any way to influence where a reduce task is run? We have a case where we'd like to choose the host to run the reduce task based on the task's input key.
We don't do exactly that, but we do something similar.
We don't make specific reducers run on specific hosts. But we do specifically shard our data - e.g., into 1024 shards - and we then run 1024 reducers, each of which runs on its correspondingly numbered shard of the data.
DR
-
Re: How to Influence Reduce Task Location.
David Rosenstrauch 2010-12-20, 03:28
And, as a follow-up, yes, we use the partitioner class to achieve this. Our partitioner runs a hashing algorithm which ensures that a given user key will always map to a specific shard #. DR
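The sharding scheme described in this message can be sketched roughly as follows. The thread does not say which hashing algorithm is used, so SHA-256 here is an assumption, as are the names shard_for_key and NUM_SHARDS; the code is plain Python for illustration, not the Hadoop Partitioner API. The only property that matters is that the same user key always maps to the same shard number.

```python
import hashlib

NUM_SHARDS = 1024  # shard count mentioned in the thread


def shard_for_key(user_key: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a user key to a stable shard number in [0, num_shards).

    Assumption: SHA-256 stands in for whatever hash is really used.
    A cryptographic digest is not required; a stable one is.
    """
    if num_shards <= 0:
        raise ValueError("num_shards must be positive")
    digest = hashlib.sha256(user_key.encode("utf-8")).digest()
    # Use the first 8 bytes as an unsigned integer, then reduce modulo
    # the shard count.
    return int.from_bytes(digest[:8], "big") % num_shards


if __name__ == "__main__":
    for key in ["user-1", "user-2", "user-1"]:
        print(key, "->", shard_for_key(key))
```

If the job runs with as many reducers as there are shards, the shard number can double as the partition number, so each reducer handles exactly one shard.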
-
Re: How to Influence Reduce Task Location.
David Rosenstrauch 2010-12-20, 03:30
It doesn't. But you really can't do what you're asking. Nor, I think, would you really want to. The whole idea behind Hadoop is that it's a distributed system whereby nodes are pretty much interchangeable. There's nothing to be gained by trying to pin a particular reduce task to a particular node - and much to be lost: e.g., redundancy, speculative execution, etc. DR
On 12/18/2010 01:24 PM, Jane Chen wrote:
> But how does this help me request which host to schedule the reduce task to?
-
Re: How to Influence Reduce Task Location.
David Rosenstrauch 2010-12-20, 03:36
Not a very good approach, for numerous reasons (e.g., you generally don't want to run another processing-intensive app, like a database, on a Hadoop node, and you don't want to have to worry about the exact situation you're worrying about, i.e., trying to make certain output get routed to certain specific nodes). Run your database on a different machine from your Hadoop nodes. Run your job on Hadoop and write the data to HDFS. Then run a separate process that imports the data into the database from HDFS. DR
On 12/19/2010 01:23 PM, Jane Chen wrote:
> Suppose that the output is written to a database, that only runs on certain nodes. It will be desirable to schedule the reducer tasks to run on the nodes local or close to the database nodes.
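The separate import step suggested in this message might look roughly like the sketch below. It rests on several assumptions: the job output has already been copied out of HDFS into a local directory, the output is tab-separated key/value text in files named part-*, and sqlite3 stands in for whatever database is actually in use. The function name import_part_files and the table name results are hypothetical.

```python
import sqlite3
from pathlib import Path


def import_part_files(output_dir: str, conn: sqlite3.Connection) -> int:
    """Load tab-separated key/value lines from part-* files into a table.

    Assumptions: output_dir is a local copy of the job's output directory,
    and each line has the form "key<TAB>value". Returns the row count.
    """
    conn.execute("CREATE TABLE IF NOT EXISTS results (k TEXT, v TEXT)")
    rows = []
    # Only part-* files hold records; marker files such as _SUCCESS are skipped.
    for part in sorted(Path(output_dir).glob("part-*")):
        with part.open(encoding="utf-8") as f:
            for line in f:
                line = line.rstrip("\n")
                if not line:
                    continue
                key, _, value = line.partition("\t")
                rows.append((key, value))
    # One transaction for the whole load: commit on success, roll back on error.
    with conn:
        conn.executemany("INSERT INTO results (k, v) VALUES (?, ?)", rows)
    return len(rows)


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    print(import_part_files(".", connection), "rows imported")
```

Because the import runs outside the job, it can run on or near the database host without any control over where the reduce tasks themselves are scheduled.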
-
Re: How to Influence Reduce Task Location.
Arun C Murthy 2011-01-09, 09:11
You can't do this currently at all. At a minimum you need a custom scheduler, and you need to do more work on the JT (JobTracker) for this to happen. I still haven't seen a good enough reason for this 'feature' in nearly 5 years. Arun
On Dec 18, 2010, at 10:24 AM, Jane Chen wrote:
> But how does this help me request which host to schedule the reduce task to?