Mike S 2012-07-23, 20:09
Harsh J 2012-07-27, 22:06
syed kather 2012-07-27, 01:15
Bertrand Dechoux 2012-07-27, 05:54
Re: Reducer MapFileOutputFormat
Harsh J 2012-07-27, 22:07
I believe he is talking about MapFile's index files, explained here:
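With MapFileOutputFormat, each reducer writes its own MapFile directory (for example `part-r-00000/`) containing a sorted `data` file and a small `index` file. The toy model below shows what that index is for. It is a minimal sketch under stated assumptions: the class and field names are hypothetical, list positions stand in for the byte offsets a real MapFile stores, and `INTERVAL` is set to 4 for readability (real MapFiles index every 128th key by default, configurable via `io.map.index.interval`). A lookup binary-searches the sparse index and then scans at most `INTERVAL` data records.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical, simplified model of one reducer's MapFile: a sorted "data" list
 * plus a sparse "index" recording the position of every INTERVAL-th key.
 */
public class SparseIndexSketch {
    static final int INTERVAL = 4; // real MapFile default is 128 (io.map.index.interval)

    final List<String> dataKeys = new ArrayList<>();
    final List<String> dataValues = new ArrayList<>();
    final List<String> indexKeys = new ArrayList<>();       // sparse index: keys
    final List<Integer> indexPositions = new ArrayList<>(); // sparse index: positions in data

    /** Appends a record; in this sketch keys must arrive strictly increasing, like sorted reducer output. */
    void append(String key, String value) {
        if (!dataKeys.isEmpty() && dataKeys.get(dataKeys.size() - 1).compareTo(key) >= 0) {
            throw new IllegalArgumentException("key out of order: " + key);
        }
        if (dataKeys.size() % INTERVAL == 0) { // index positions 0, INTERVAL, 2*INTERVAL, ...
            indexKeys.add(key);
            indexPositions.add(dataKeys.size());
        }
        dataKeys.add(key);
        dataValues.add(value);
    }

    /** Finds the last indexed key <= key, then scans at most INTERVAL data records. */
    String get(String key) {
        int lo = 0, hi = indexKeys.size() - 1, slot = -1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (indexKeys.get(mid).compareTo(key) <= 0) { slot = mid; lo = mid + 1; }
            else { hi = mid - 1; }
        }
        if (slot < 0) return null; // key sorts before the first indexed key
        int start = indexPositions.get(slot);
        int end = Math.min(start + INTERVAL, dataKeys.size());
        for (int i = start; i < end; i++) {
            int cmp = dataKeys.get(i).compareTo(key);
            if (cmp == 0) return dataValues.get(i);
            if (cmp > 0) break; // passed where the key would be
        }
        return null;
    }

    public static void main(String[] args) {
        SparseIndexSketch file = new SparseIndexSketch();
        for (int i = 0; i < 10; i++) {
            file.append(String.format("key%02d", i), "value" + i);
        }
        System.out.println(file.indexKeys);      // [key00, key04, key08]
        System.out.println(file.get("key06"));   // value6
        System.out.println(file.get("missing")); // null
    }
}
```

Keeping the index sparse means it stays small enough to load into memory, while each lookup touches only a short run of the data file.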
On Fri, Jul 27, 2012 at 11:24 AM, Bertrand Dechoux <[EMAIL PROTECTED]> wrote:
> Your use of 'index' is indeed not clear. Are you talking about Hive or
> I can confirm that you will have one result file per reducer. Of course,
> for efficiency reasons, you need to limit the number of files. But if you
> are using multiple reducers, it should mean that one reducer isn't fast
> enough, so it can be assumed that the output of each reducer is big
> enough. If that is not the case, you can limit the number of reducers to one.
> In general, the 'fragmentation' of the results is dealt with by the next job.
> You should provide more information about your real problem and its context.
> On Fri, Jul 27, 2012 at 3:15 AM, syed kather <[EMAIL PROTECTED]> wrote:
>> Mike,
>> Can you please give more details? The context is not clear. Can you share your
>> use case, if possible?
>> On Jul 24, 2012 1:40 AM, "Mike S" <[EMAIL PROTECTED]> wrote:
>> > If I set my reducer output to MapFile output format and the job has,
>> > say, 100 reducers, will the output generate 100 different index
>> > files (one per reducer) or one index file for all the reducers
>> > (basically one index file per job)?
>> > If it is one index file per reducer, can I rely on HDFS append to change
>> > the index write behavior and build one index file from all the
>> > reducers, basically by making all the parallel reducers append to
>> > one index file? Data files do not matter.
> Bertrand Dechoux
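Bertrand's point that each reducer produces its own output, and the question of whether a single per-job index is needed, can be illustrated with a small model. This is a minimal sketch under stated assumptions: string keys, an in-memory `TreeMap` standing in for each reducer's MapFile, and hypothetical class and method names. Records are routed with the same formula as Hadoop's default HashPartitioner, `(key.hashCode() & Integer.MAX_VALUE) % numReduceTasks`, so a reader that knows the partitioner and the reducer count can go straight to the one per-reducer index that may contain a key. Hadoop's MapFileOutputFormat offers `getReaders` and `getEntry` helpers built on the same idea.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical sketch: R reducers each write their own sorted output (one index each).
 * Lookups need no merged global index, because the partitioner already determines
 * which reducer's output can hold a given key.
 */
public class PartitionedLookupSketch {
    /** Same formula as Hadoop's default HashPartitioner. */
    static int partitionFor(String key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    /** Simulates the shuffle: routes each record to exactly one reducer's sorted output. */
    static List<TreeMap<String, String>> runJob(Map<String, String> records, int numReducers) {
        List<TreeMap<String, String>> outputs = new ArrayList<>();
        for (int r = 0; r < numReducers; r++) {
            outputs.add(new TreeMap<>()); // stands in for one part-r-NNNNN MapFile
        }
        for (Map.Entry<String, String> e : records.entrySet()) {
            outputs.get(partitionFor(e.getKey(), numReducers)).put(e.getKey(), e.getValue());
        }
        return outputs;
    }

    /** Consults only the one reducer output chosen by the same partitioner. */
    static String lookup(List<TreeMap<String, String>> outputs, String key) {
        return outputs.get(partitionFor(key, outputs.size())).get(key);
    }

    public static void main(String[] args) {
        Map<String, String> records = new TreeMap<>();
        for (int i = 0; i < 1000; i++) {
            records.put("user" + i, "profile" + i);
        }
        List<TreeMap<String, String>> outputs = runJob(records, 100);
        System.out.println(outputs.size());            // 100
        System.out.println(lookup(outputs, "user42")); // profile42
        System.out.println(lookup(outputs, "nobody")); // null
    }
}
```

The design choice here is to keep the per-reducer indexes separate and route each lookup, rather than merge them. Merging through append would also run into HDFS's single-writer model: a file can have only one writer at a time, so 100 parallel reducers cannot all append to one shared index file concurrently.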