Hadoop, mail # user - Re: Real Multiple Outputs for Hadoop -- is this implementation correct?


Re: Real Multiple Outputs for Hadoop -- is this implementation correct?
Harsh J 2013-09-13, 19:32
I took a very brief look, and the approach of using multiple OCs, one
per unique parent path from a task, seems the right thing to do. Nice
work! Do consider contributing this if it's working well for you :)

On Sat, Sep 14, 2013 at 12:53 AM, Paul Houle <[EMAIL PROTECTED]> wrote:
> Hey guys, I spent some time last week thinking about Hadoop before I wrote my
> own class, RealMultipleOutputs, that does something like what
> MultipleOutputs does, except that you can specify different HDFS paths for
> the different output streams. My pals were telling me to use Cascading or
> Pig if I want this functionality, but otherwise I was happy writing plain
> M/R jars.
>
> I wrote up the implementation here:
>
> https://github.com/paulhoule/infovore/wiki/Real-Multiple-Outputs-in-Hadoop
>
> And this works hand in hand with an abstraction layer that supports unit
> testing with Mockito:
>
> https://github.com/paulhoule/infovore/wiki/Unit-Testing-Hadoop-Mappers-and-Reducers
>
> Anyway, I'd appreciate anybody looking at this code and trying to poke
> holes in it. It runs OK on my tiny dev cluster on 1.0.4 and 1.1.2, and on
> Amazon EMR, but I am wondering if I missed something.
>
>

--
Harsh J
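
The idea praised in the reply above, one committer per unique parent path seen by a task, can be sketched roughly as follows. This is not the RealMultipleOutputs code from the linked wiki and it does not use the Hadoop API: PathCommitter is a hypothetical stand-in for whatever "OC" refers to (presumably Hadoop's OutputCommitter), and PerParentCommitters, parentOf, committerFor and commitAll are names invented for illustration only.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical stand-in for a Hadoop output committer; NOT the real Hadoop API.
interface PathCommitter {
    void setupTask();
    void commitTask();
}

// Keeps exactly one committer per unique parent directory used by a task.
class PerParentCommitters {
    // Insertion-ordered so commits happen in the order directories were first seen.
    private final Map<String, PathCommitter> byParent = new LinkedHashMap<>();
    private final Function<String, PathCommitter> factory;

    PerParentCommitters(Function<String, PathCommitter> factory) {
        this.factory = factory;
    }

    // Returns the parent directory of a path such as "/out/a/part-00000".
    static String parentOf(String outputPath) {
        int slash = outputPath.lastIndexOf('/');
        if (slash < 0) {
            return ".";   // bare file name: treat as the current directory
        }
        if (slash == 0) {
            return "/";   // file directly under the root
        }
        return outputPath.substring(0, slash);
    }

    // Lazily creates and sets up a committer the first time a parent path is seen.
    PathCommitter committerFor(String outputPath) {
        String parent = parentOf(outputPath);
        PathCommitter committer = byParent.get(parent);
        if (committer == null) {
            committer = factory.apply(parent);
            committer.setupTask();
            byParent.put(parent, committer);
        }
        return committer;
    }

    // Commits every committer created so far, once each.
    void commitAll() {
        for (PathCommitter committer : byParent.values()) {
            committer.commitTask();
        }
    }

    // Number of distinct parent directories seen so far.
    int size() {
        return byParent.size();
    }
}
```

In this sketch a task would call committerFor(path) each time it opens an output and commitAll() once when it finishes, so two files under the same directory share a committer while files under different directories get their own. A real implementation would also need to handle task abort and job-level commit, which are left out here.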