

modemide 2011-09-01, 19:47
Re: MultipleOutputs - Create multiple files during output
Hi Tim,

You could create a custom HashPartitioner so that all key/value pairs
denoting the actions of the same user end up in the same reducer; then you
need only one output file per reducer. Btw, how large are the output files?
Make sure you don't end up creating a lot of small files, i.e., << 64MB.
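
A minimal sketch of the partitioning idea above, written as plain Java so it runs without Hadoop on the classpath. The key layout (`<name>`, `<action>`, `<date>` separated by tabs) and the names `UserPartitionSketch`, `userOf`, and `partitionFor` are assumptions made for illustration, not something from the thread. In a real job the same arithmetic would presumably go inside `getPartition(key, value, numPartitions)` of a class extending Hadoop's `Partitioner`.

```java
// Hypothetical helper: picks a reducer from the user name only,
// so every record for one user lands in the same partition.
final class UserPartitionSketch {

    // Assumed key layout: "<name>\t<action>\t<date>". Only the name matters here.
    static String userOf(String key) {
        int tab = key.indexOf('\t');
        return tab < 0 ? key : key.substring(0, tab);
    }

    static int partitionFor(String key, int numReduceTasks) {
        // Mask the sign bit so a negative hashCode cannot give a negative partition.
        return (userOf(key).hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        int reducers = 4;
        // Both lines print the same partition number, because the user is the same.
        System.out.println(partitionFor("alice\tlogin\t2011-09-01", reducers));
        System.out.println(partitionFor("alice\tlogout\t2011-09-02", reducers));
    }
}
```

Hashing only the user name keeps one user's actions together, but it does not by itself give one file per user and month; several users can still share a reducer.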

Best,

stan

On Thu, Sep 1, 2011 at 3:47 PM, modemide <[EMAIL PROTECTED]> wrote:

> Hi all,
> I was wondering if anyone was familiar with this class.  I want to
> create multiple output files during my reduce.
>
> My input files will consist of
> <name1><action1><date1>
> <name1><action2><date2>
> <name1><action3><date3>
>
> <name2><action1><date1>
> <name2><action2><date2>
> <name2><action3><date3>
>
>
> My goal is to create files with the following format
> Filename:
> <name>_<Date:CCYYMM>
>
> File Contents:
> <action1>
> <action2>
> <action3>
>
>
> I.e., this will store all the actions of one person for any given month
> in one file.
>
> I just don't know how I will decide the file name at run time.  Can anyone
> help?
>
> Thanks,
> Tim
>
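
On the question of choosing the file name at run time, here is a rough sketch of deriving the `<name>_<CCYYMM>` base name from a record. It assumes dates arrive as `CCYY-MM-DD` strings, which the original message does not specify; `OutputNameSketch` and `baseName` are hypothetical names. With the newer `org.apache.hadoop.mapreduce.lib.output.MultipleOutputs`, a string like this could probably be passed as the `baseOutputPath` argument of `write(key, value, baseOutputPath)` in the reducer, though that depends on the Hadoop version in use.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

// Hypothetical helper: builds the per-user, per-month base file name.
final class OutputNameSketch {

    private static final DateTimeFormatter YEAR_MONTH = DateTimeFormatter.ofPattern("yyyyMM");

    // Assumes isoDate looks like "2011-09-01"; other formats need their own parser.
    static String baseName(String name, String isoDate) {
        LocalDate date = LocalDate.parse(isoDate);
        return name + "_" + date.format(YEAR_MONTH);
    }

    public static void main(String[] args) {
        System.out.println(baseName("tim", "2011-09-01")); // prints tim_201109
    }
}
```

Because the name is computed per record, each distinct user and month pair produces its own base name, which ties back to the earlier warning about ending up with many small files.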