James Seigel
2010-07-21, 03:36
Jeff Zhang
2010-07-21, 05:34
James Seigel
2010-07-21, 16:51
James Seigel
2010-07-21, 20:55
James Seigel
2010-07-21, 20:59
Jeff Zhang
2010-07-22, 01:22
Chris K Wensel
2010-07-22, 04:32
James Seigel
2010-09-01, 19:27
Lance Norskog
2010-09-02, 02:10
-
From X to Hadoop MapReduce
James Seigel 2010-07-21, 03:36
Hello!
Here is what I have been thinking over the last while. There are probably a number of us who have prototyped stuff in Pig or Hive and now think that we should convert it to some Java MapReduce code.

Would anyone be interested in building a “patterns” area somewhere, possibly with some code, to give solid usable examples for building MapReduce code without having to re-invent the wheel each time?

Example:

What would be the “best” or “a” way to make the DISTINCT command from Pig in a Java MapReduce program?

Let me know if anyone is interested in this. I’d like to get some sharing going.

Cheers
James.
-
Re: From X to Hadoop MapReduce
Jeff Zhang 2010-07-21, 05:34
Hi James,
Maybe Cascading is what you are looking for. http://www.cascading.org/

--
Best Regards

Jeff Zhang
-
Re: From X to Hadoop MapReduce
James Seigel 2010-07-21, 16:51
Jeff, I agree that cascading looks cool and might/should have a place in everyone’s tool box, however at some corps it takes a while to get those kinds of changes in place and therefore they might have to hand craft some java code before moving (if they ever can) to a different technology.
I will get something up and going and post a link back for whoever is interested.

To answer Himanshu’s question, I am thinking something like this (with some code):

Hadoop M/R Patterns, and ones that match Pig Structures

1. COUNT: [Mapper] Spit out one key and the value of 1. [Combiner] Same as reducer. [Reducer] count = count + next.value. [Emit] Single result.
2. FREQ COUNT: [Mapper] Item, 1. [Combiner] Same as reducer. [Reducer] count = count + next.value. [Emit] List of key, count.
3. UNIQUE: [Mapper] Item, One. [Combiner] None. [Reducer + Emit] Spit out list of keys and no value.

I think adding a description of why the technique works would be helpful for people learning as well. I see some questions from people not understanding what happens to the data between mappers and reducers, or what data they will see when it gets to the reducer, etc.

Cheers
James.
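As a rough illustration of the three patterns listed above, here is a minimal in-memory sketch in plain Java. It does not use the Hadoop API; the class name `MRPatternsSketch` and the `shuffle` helper are made up for this example. The point is only to show what happens between the map and reduce steps: the map output is grouped by key, and each reduce call sees one key together with all of its values.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory model of map -> shuffle -> reduce (not Hadoop code).
public class MRPatternsSketch {

    // Shuffle: group mapper output by key. Keys come out sorted, and each
    // reduce step sees one key together with every value emitted for it.
    public static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> mapped) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : mapped) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // Reducer for COUNT and FREQ COUNT: count = count + next.value.
    // Addition is associative, so the same logic could serve as a combiner.
    public static int sum(List<Integer> values) {
        int count = 0;
        for (int value : values) {
            count += value;
        }
        return count;
    }

    // 1. COUNT: every record maps to the same key with value 1; one result.
    public static int count(List<String> records) {
        List<Map.Entry<String, Integer>> mapped = new ArrayList<>();
        for (String ignored : records) {
            mapped.add(Map.entry("count", 1));
        }
        List<Integer> ones = shuffle(mapped).get("count");
        return ones == null ? 0 : sum(ones);
    }

    // 2. FREQ COUNT: map each item to (item, 1); reduce sums per key.
    public static Map<String, Integer> freqCount(List<String> records) {
        List<Map.Entry<String, Integer>> mapped = new ArrayList<>();
        for (String record : records) {
            mapped.add(Map.entry(record, 1));
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : shuffle(mapped).entrySet()) {
            result.put(group.getKey(), sum(group.getValue()));
        }
        return result;
    }

    // 3. UNIQUE: map each item to (item, 1); reduce emits the key, no value.
    // The grouping done by the shuffle is what removes the duplicates.
    public static List<String> unique(List<String> records) {
        List<Map.Entry<String, Integer>> mapped = new ArrayList<>();
        for (String record : records) {
            mapped.add(Map.entry(record, 1));
        }
        return new ArrayList<>(shuffle(mapped).keySet());
    }

    public static void main(String[] args) {
        List<String> input = List.of("b", "a", "b", "c", "a", "b");
        System.out.println(count(input));     // 6
        System.out.println(freqCount(input)); // {a=2, b=3, c=1}
        System.out.println(unique(input));    // [a, b, c]
    }
}
```

In a real job the grouping is done by the framework between the mappers and the reducers, so UNIQUE needs no logic in the reducer beyond emitting each key once.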
-
Re: From X to Hadoop MapReduce
James Seigel 2010-07-21, 20:55
Here is a skeleton project I stuffed up on GitHub (feel free to offer other suggestions/alternatives). There is a wiki, a place to commit code, a place to fork around, etc.

Over the next couple of days I’ll try and put up some samples for people to poke around with. Feel free to attack the wiki, contribute code, etc.

If anyone can derive some cool pseudocode to write MapReduce-type algorithms, that’d be great.

Cheers
James.
-
Re: From X to Hadoop MapReduce
James Seigel 2010-07-21, 20:59
Oh yeah, it would help if I put the url:
http://github.com/seigel/MRPatterns

James
-
Re: From X to Hadoop MapReduce
Jeff Zhang 2010-07-22, 01:22
Cool, James. I am very interested in contributing to this.
I think group by, join and order by can be added to the examples.

--
Best Regards

Jeff Zhang
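For the join suggestion, one common approach is a reduce-side join: the mapper tags each record with the input it came from and emits it under the join key, and the reducer pairs up the two sides for each key. The plain-Java sketch below models that in memory under those assumptions. It is not Hadoop API code, and `JoinSketch` and its method are hypothetical names for illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory model of a reduce-side inner join (not Hadoop code).
public class JoinSketch {

    // Each input row is {joinKey, payload}. Output rows are "key<TAB>left<TAB>right".
    public static List<String> join(List<String[]> left, List<String[]> right) {
        // Map + shuffle: emit (joinKey, {tag, payload}) and group by key.
        Map<String, List<String[]>> grouped = new TreeMap<>();
        for (String[] row : left) {
            grouped.computeIfAbsent(row[0], k -> new ArrayList<>()).add(new String[] {"L", row[1]});
        }
        for (String[] row : right) {
            grouped.computeIfAbsent(row[0], k -> new ArrayList<>()).add(new String[] {"R", row[1]});
        }

        // Reduce: for each key, split values by tag and emit every left/right pair.
        List<String> output = new ArrayList<>();
        for (Map.Entry<String, List<String[]>> group : grouped.entrySet()) {
            List<String> lefts = new ArrayList<>();
            List<String> rights = new ArrayList<>();
            for (String[] tagged : group.getValue()) {
                if (tagged[0].equals("L")) {
                    lefts.add(tagged[1]);
                } else {
                    rights.add(tagged[1]);
                }
            }
            for (String l : lefts) {
                for (String r : rights) {
                    output.add(group.getKey() + "\t" + l + "\t" + r);
                }
            }
        }
        return output;
    }

    public static void main(String[] args) {
        List<String[]> users = List.of(new String[] {"1", "alice"}, new String[] {"2", "bob"});
        List<String[]> orders = List.of(new String[] {"1", "book"}, new String[] {"1", "pen"},
                new String[] {"3", "cup"});
        // Only key "1" appears on both sides, so two joined rows are printed.
        join(users, orders).forEach(System.out::println);
    }
}
```

Group by falls out of the same grouping step, since every value for a key arrives at one reduce call. Order by leans on the fact that keys reach the reducer in sorted order.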
-
Re: From X to Hadoop MapReduce
Chris K Wensel 2010-07-22, 04:32
> Jeff, I agree that cascading looks cool and might/should have a place in everyone’s tool box, however at some corps it takes a while to get those kinds of changes in place and therefore they might have to hand craft some java code before moving (if they ever can) to a different technology.
Huh?

--
Chris K Wensel
[EMAIL PROTECTED]
http://www.concurrentinc.com
-
Re: From X to Hadoop MapReduce
James Seigel 2010-09-01, 19:27
Sounds good! Please give some examples :)
I just got back from some holidays and will start posting some more stuff shortly.

Cheers
James.
-
Re: From X to Hadoop MapReduce
Lance Norskog 2010-09-02, 02:10
'hamake' on GitHub looks like a handy tool as well, though I haven't used it.
It does the old Unix 'make' timestamp dependency trick on the input and output file sets to decide which jobs to run in sequence, and possibly in parallel.

Lance

--
Lance Norskog
[EMAIL PROTECTED]
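The 'make' timestamp trick mentioned above can be sketched roughly as follows: a job needs to run if any of its outputs is missing or older than its newest input. This is only an illustration of the idea in plain Java, not hamake's actual implementation. The class `MakeStyleCheck`, the method `needsRun`, and the map of path to modification time are all assumptions for this example; in practice the times would come from the filesystem.

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a make-style staleness check (not hamake's code).
public class MakeStyleCheck {

    // mtimes maps a path to its modification time; a path that is absent
    // from the map is treated as a file that does not exist.
    public static boolean needsRun(Map<String, Long> mtimes, List<String> inputs, List<String> outputs) {
        long newestInput = Long.MIN_VALUE;
        for (String in : inputs) {
            Long t = mtimes.get(in);
            if (t == null) {
                throw new IllegalArgumentException("missing input: " + in);
            }
            newestInput = Math.max(newestInput, t);
        }
        for (String out : outputs) {
            Long t = mtimes.get(out);
            // Missing output, or output older than the newest input: rebuild.
            if (t == null || t < newestInput) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Map<String, Long> mtimes = Map.of("in/a", 100L, "in/b", 200L, "out/x", 150L);
        // out/x (150) is older than in/b (200), so the job is stale.
        System.out.println(needsRun(mtimes, List.of("in/a", "in/b"), List.of("out/x")));
    }
}
```

Jobs whose inputs do not depend on each other's outputs have no ordering constraint between them, which is where running them in parallel becomes possible.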