Hadoop, mail # user - is it possible to concatenate output files under many reducers?


Re: is it possible to concatenate output files under many reducers?
Joey Echeverria 2011-05-13, 01:57
You can control the number of reducers by calling
job.setNumReduceTasks() before you launch it.
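A minimal driver sketch of where that call goes. It assumes the `org.apache.hadoop.mapreduce` API from Hadoop 2.x or later, where jobs are created with `Job.getInstance` (older code used the now-deprecated `new Job(conf, name)` constructor). The class name `ReducerCountExample` and the method `newJob` are hypothetical, and the usual mapper, reducer and path setup is left out:

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

final class ReducerCountExample {

    private ReducerCountExample() {}

    // Hypothetical helper: creates a job that will run with the given number
    // of reduce tasks. With a FileOutputFormat, each reduce task writes its own
    // file (part-r-00000, part-r-00001, ...), so this also bounds the number of
    // output files per output name.
    static Job newJob(Configuration conf, int reducers) throws IOException {
        Job job = Job.getInstance(conf, "fewer-output-files");
        // Has to happen while the job is still being defined, i.e. before
        // job.submit() or job.waitForCompletion(...) is called.
        job.setNumReduceTasks(reducers);
        // Mapper, reducer, key/value classes and input/output paths would be
        // configured here as in any other driver.
        return job;
    }
}
```

Calling `newJob(conf, 10)` for the job in the original question would run 10 reduce tasks, and therefore write at most 10 files per output name instead of 60.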

-Joey

On Thu, May 12, 2011 at 6:33 PM, Jun Young Kim <[EMAIL PROTECTED]> wrote:
> Yes, that is the general way to control the number of output files.
>
> However, what if you need to control the number of outputs dynamically?
>
> For example, if the output file name is 'A', there should be 5 output
> files; if it is 'B', there should be 10.
>
> Is that possible in Hadoop?
>
> Junyoung Kim ([EMAIL PROTECTED])
>
>
> On 05/12/2011 02:17 PM, Harsh J wrote:
>>
>> Short, blind answer: You could run 10 reducers.
>>
>> Otherwise, you'll have to run another job in which each mapper picks up a
>> few files and merges them. But having 60 files shouldn't really be a
>> problem if they are sufficiently large (perhaps at least 80% of a block
>> size; you can tune the number of reducers to achieve this).
>>
>> On Thu, May 12, 2011 at 6:14 AM, Jun Young Kim <[EMAIL PROTECTED]> wrote:
>>>
>>> Hi all,
>>>
>>> I have 60 reducers, each generating an output file with the same base
>>> name: output-r-00000 through output-r-00059.
>>>
>>> In this situation, I want to control the number of output files.
>>>
>>> For example, is it possible to concatenate all of the output files into
>>> 10, i.e. output-r-00000 through output-r-00009?
>>>
>>> Thanks
>>>
>>> --
>>> Junyoung Kim ([EMAIL PROTECTED])
>>>
>>>
>>
>>
>

--
Joseph Echeverria
Cloudera, Inc.
443.305.9434
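Harsh J's rule of thumb quoted above (output files of at least roughly 80% of a block) can be turned into a reducer count. This is a sketch under stated assumptions: the class `ReducerSizing` and method `reducersFor` are hypothetical, and the calculation assumes the total output size is known in advance and spread evenly across reducers, which skewed keys will not guarantee.

```java
final class ReducerSizing {

    private ReducerSizing() {}

    // Hypothetical helper: the largest reducer count that keeps each output
    // file at or above fillPercent% of blockSize, assuming the output is
    // spread evenly across reducers. Always returns at least 1.
    static int reducersFor(long totalOutputBytes, long blockSize, int fillPercent) {
        if (totalOutputBytes <= 0 || blockSize <= 0 || fillPercent <= 0) {
            throw new IllegalArgumentException("sizes and percentage must be positive");
        }
        // Integer floor division: total / (blockSize * fillPercent / 100),
        // rearranged to avoid floating-point rounding.
        long reducers = (totalOutputBytes * 100) / (blockSize * fillPercent);
        return (int) Math.max(1L, Math.min(reducers, Integer.MAX_VALUE));
    }
}
```

For example, 6 GiB of output with a 64 MiB block size and an 80% target gives `reducersFor(6L << 30, 64L << 20, 80)` = 120 reducers; the result would then be passed to `job.setNumReduceTasks()`. Rounding down rather than up is deliberate: it keeps each file at or above the target size instead of just below it.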