Hadoop >> mail # user >> MultipleOutputs - Create multiple files during output


Re: MultipleOutputs - Create multiple files during output
Hi Tim,

You could write a custom Partitioner (for example, one modeled on
HashPartitioner) so that all key/value pairs denoting the actions of the
same user end up in the same reducer; then you need only one output file
per reducer. By the way, how large are the output files? Make sure you
don't end up creating a lot of small files, i.e., files much smaller
than 64 MB.
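
A rough sketch of the partitioning idea, in plain Java with no Hadoop
dependency so it can be tried on its own. The record layout
(name<TAB>action<TAB>date) and the names UserPartitionSketch, userOf and
partitionFor are assumptions made up for illustration; in a real job the
same expression would presumably live in the getPartition(key, value,
numPartitions) method of a Partitioner subclass.

```java
// Sketch only: mimics what a user-based partitioner would compute.
final class UserPartitionSketch {

    // Assumed (hypothetical) record layout: "name<TAB>action<TAB>date".
    // Returns the part before the first tab, or the whole string if none.
    static String userOf(String record) {
        int tab = record.indexOf('\t');
        return tab < 0 ? record : record.substring(0, tab);
    }

    // Hash only the user name, so every record of one user maps to the
    // same reducer index. Masking the sign bit keeps the result in
    // the range [0, numReducers).
    static int partitionFor(String record, int numReducers) {
        return (userOf(record).hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    public static void main(String[] args) {
        String[] records = {
            "name1\taction1\t2011-09-01",
            "name1\taction2\t2011-09-02",
            "name2\taction1\t2011-09-01",
        };
        for (String r : records) {
            System.out.println(userOf(r) + " -> reducer " + partitionFor(r, 4));
        }
    }
}
```

Hashing on the name alone is what keeps one user's actions together;
hashing the whole record would scatter them across reducers.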

Best,

stan

On Thu, Sep 1, 2011 at 3:47 PM, modemide <[EMAIL PROTECTED]> wrote:

> Hi all,
> I was wondering if anyone was familiar with this class.  I want to
> create multiple output files during my reduce.
>
> My input files will consist of
> <name1><action1><date1>
> <name1><action2><date2>
> <name1><action3><date3>
>
> <name2><action1><date1>
> <name2><action2><date2>
> <name2><action3><date3>
>
>
> My goal is to create files with the following format
> Filename:
> <name>_<Date:CCYYMM>
>
> File Contents:
> <action1>
> <action2>
> <action3>
>
>
> I.e. This will store all the actions of one person for any given month
> in one file.
>
> I just don't know how I will decide the file name at run time.  Can anyone
> help?
>
> Thanks,
> Tim
>
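
On deciding the file name at run time: the <name>_<CCYYMM> base name
can be computed from each record inside the reducer. The sketch below
shows only that string computation, in plain Java. It assumes the date
field arrives as an ISO string such as 2011-09-01, which the original
post does not specify, and FileNameSketch and fileNameFor are
hypothetical names. The resulting string could then be handed to
whatever MultipleOutputs write call your Hadoop version offers (the
new-API class has, as far as I know, an overload that takes a base
output path; check the Javadoc for your release).

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

// Sketch only: derives "<name>_<CCYYMM>" from a name and a date string.
final class FileNameSketch {

    // CCYYMM is century + year + month, i.e. four-digit year then month.
    private static final DateTimeFormatter CCYYMM =
            DateTimeFormatter.ofPattern("yyyyMM");

    // Assumption: isoDate looks like "2011-09-01" (ISO-8601 local date).
    static String fileNameFor(String name, String isoDate) {
        LocalDate date = LocalDate.parse(isoDate);
        return name + "_" + date.format(CCYYMM);
    }

    public static void main(String[] args) {
        System.out.println(fileNameFor("name1", "2011-09-01"));
    }
}
```

If the real dates use another format, only the parsing line would need
to change; the base name stays name plus underscore plus year and month.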