NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, cassandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
Hadoop >> mail # user >> Re: Real Multiple Outputs for Hadoop -- is this implementation correct?


I took a very brief look, and the approach of using multiple OutputCommitters
(OCs), one per unique parent path from a task, seems the right thing to do.
Nice work! Do consider contributing this if it's working well for you :)
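
A minimal sketch of the grouping step described above, written in plain Java with no Hadoop dependency so it can run on its own. It assumes output paths are slash-separated strings; `FakeCommitter`, `committersFor` and `parentOf` are hypothetical names, and `FakeCommitter` merely stands in for a real Hadoop `OutputCommitter`. This is not the RealMultipleOutputs code itself, only an illustration of "one committer per unique parent path".

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CommitterPerParent {

    // Hypothetical stand-in for an OutputCommitter bound to one output directory.
    static final class FakeCommitter {
        final String outputDir;
        final List<String> files = new ArrayList<>();

        FakeCommitter(String outputDir) {
            this.outputDir = outputDir;
        }
    }

    // Parent directory of a slash-separated path: "/a/b/part-0" -> "/a/b", "/x" -> "/".
    static String parentOf(String path) {
        int slash = path.lastIndexOf('/');
        if (slash < 0) {
            return ".";   // relative name with no directory part
        }
        return slash == 0 ? "/" : path.substring(0, slash);
    }

    // Creates exactly one committer per distinct parent directory (first-seen order)
    // and records which output files each committer is responsible for.
    static Map<String, FakeCommitter> committersFor(List<String> outputPaths) {
        Map<String, FakeCommitter> byParent = new LinkedHashMap<>();
        for (String p : outputPaths) {
            byParent.computeIfAbsent(parentOf(p), FakeCommitter::new).files.add(p);
        }
        return byParent;
    }

    public static void main(String[] args) {
        List<String> paths = List.of(
                "/out/english/part-r-00000",
                "/out/german/part-r-00000",
                "/out/english/extra-r-00000");
        committersFor(paths).forEach((dir, c) ->
                System.out.println(dir + " -> " + c.files.size() + " file(s)"));
    }
}
```

Running `main` prints `/out/english -> 2 file(s)` followed by `/out/german -> 1 file(s)`: two streams that share a directory share one committer, while a stream written to a different directory gets its own.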

On Sat, Sep 14, 2013 at 12:53 AM, Paul Houle <[EMAIL PROTECTED]> wrote:
> Hey guys, I spent some time last week thinking about Hadoop before I wrote my
> own class, RealMultipleOutputs, that does something like what
> MultipleOutputs does, except that you can specify different HDFS paths for
> the different output streams. My pals were telling me to use Cascading or
> Pig if I want this functionality, but otherwise I was happy writing plain
> M/R jars.
>
> I wrote up the implementation here:
>
> https://github.com/paulhoule/infovore/wiki/Real-Multiple-Outputs-in-Hadoop
>
> And this works hand in hand with an abstraction layer that supports unit
> testing with Mockito:
>
> https://github.com/paulhoule/infovore/wiki/Unit-Testing-Hadoop-Mappers-and-Reducers
>
> Anyway, I'd appreciate anybody looking at this code and trying to poke
> holes in it. It runs OK on my tiny dev cluster on 1.0.4 and 1.1.2, and on
> Amazon EMR, but I am wondering if I missed something.
>
>

--
Harsh J