-
Counters that track the max value
Jeremy Lewi 2012-10-03, 15:17
Hi hadoop-users,

I'm curious whether there is an implementation somewhere of a counter that tracks the maximum of some value across all mappers or reducers.

Thanks
J
-
Re: Counters that track the max value
Harsh J 2012-10-03, 16:52
Jeremy,

Here's my shot at it (pardon the quick crappy code): https://gist.github.com/3828246

Basically, you can achieve it in two ways.

Requirement: All tasks must increment the designated "max" counter only AFTER the max has been computed (i.e. in cleanup).

1. All tasks may use the same counter name. Later, we pull the per-task counters and determine the max at the client. (This is my quick and dirty implementation.)
2. All tasks may use their own task ID (the number part) in the counter name, but use the same group. Later, we fetch all counters for that group and iterate over them to find the max. This is cleaner, and doesn't end up using deprecated APIs such as the above.

Does this help?

--
Harsh J
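The second approach can be sketched without a cluster. The following is only an illustration in plain Java, not the code from the gist: a `Map` stands in for a counter group, and the class name `PerTaskMaxSketch`, the `MAX_` counter prefix, and the helper methods are all hypothetical. It shows why each task should publish its value exactly once (counters accumulate by addition, so one increment from zero leaves the counter equal to the local max) and how the client then scans the group.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Plain-Java stand-in for approach 2; no Hadoop classes are used here.
class PerTaskMaxSketch {

    // Hypothetical counter name built from the numeric part of a task ID.
    static String counterName(int taskNumber) {
        return "MAX_" + taskNumber;
    }

    // What one task does: track the max locally, then publish it exactly once
    // (the equivalent of incrementing the counter in cleanup).
    static void runTask(int taskNumber, long[] values, Map<String, Long> group) {
        long localMax = Long.MIN_VALUE;
        for (long v : values) {
            localMax = Math.max(localMax, v);
        }
        // Counters accumulate by addition, so a single increment from zero
        // leaves the counter equal to the local max.
        group.merge(counterName(taskNumber), localMax, Long::sum);
    }

    // What the client does: iterate over the group and keep the largest value.
    static long clientSideMax(Map<String, Long> group) {
        long max = Long.MIN_VALUE;
        for (long v : group.values()) {
            max = Math.max(max, v);
        }
        return max;
    }

    public static void main(String[] args) {
        Map<String, Long> group = new LinkedHashMap<>();
        runTask(0, new long[] {3, 17, 5}, group);
        runTask(1, new long[] {42, 8}, group);
        runTask(2, new long[] {11}, group);
        System.out.println(clientSideMax(group)); // prints 42
    }
}
```

In a real job the map would be replaced by the job's counters for the chosen group, and the per-task publish step would live in the task's cleanup method, as the requirement above states.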
-
Re: Counters that track the max value
Jeremy Lewi 2012-10-05, 15:48
Hi Harsh,

Thank you very much; that will work.

How come we can't simply create a modified version of a regular MapReduce counter that does this behind the scenes? It seems like we should just be able to replace "+" with "max" and everything else should work.

J
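The idea of swapping "+" for "max" can be illustrated with a small sketch. This is not how Hadoop implements counters; it only assumes that job-level values are produced by folding per-task values with some binary operator. The class name `MergeOperatorSketch` and the method `aggregate` are hypothetical. Note that the starting value has to change along with the operator: 0 for a sum, the smallest possible long for a max.

```java
import java.util.function.LongBinaryOperator;

// Illustration only: folds per-task values with a pluggable operator.
class MergeOperatorSketch {

    // Combine per-task values into one job-level value using the given operator.
    static long aggregate(long[] perTaskValues, long identity, LongBinaryOperator op) {
        long result = identity;
        for (long v : perTaskValues) {
            result = op.applyAsLong(result, v);
        }
        return result;
    }

    public static void main(String[] args) {
        long[] perTask = {17, 42, 11};
        // Regular counter behaviour: add the per-task values.
        System.out.println(aggregate(perTask, 0L, Long::sum));             // prints 70
        // "max" counter: same fold, different operator and starting value.
        System.out.println(aggregate(perTask, Long.MIN_VALUE, Math::max)); // prints 42
    }
}
```

Because max, like addition, is associative and commutative, the order in which task values arrive does not affect the result, which is what makes the substitution plausible.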
-
Re: Counters that track the max value
Harsh J 2012-10-05, 17:13
Jeremy,

I suppose that's doable. Please file a MAPREDUCE JIRA so you can discuss this with others on the development side as well.

I am guessing that the MAX operations of most user-oriented data flow front-ends, such as Hive and Pig, already do this efficiently, so perhaps there hasn't been a very strong need for this.

--
Harsh J
-
Re: Counters that track the max value
Jeremy Lewi 2012-10-05, 17:40
Done.

https://issues.apache.org/jira/browse/MAPREDUCE-4709

Thanks
J