Re: Reducer MapFileOutputFormat
Bertrand Dechoux 2012-07-27, 05:54
Your use of 'index' is indeed not clear. Are you talking about Hive or
I can confirm that you will have one result file per reducer. Of course,
for efficiency reasons, you need to limit the number of files. But if you
are using multiple reducers it should mean that one reducer isn't fast
enough, so it could be assumed that the output for each reducer is big
enough. If that is not the case, you can limit the number of reducers to one.
In general, the 'fragmentation' of the results is dealt with by the next job.
You should provide more information about your real problem and its context.
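To illustrate why one file per reducer need not be a problem for lookups, here is a minimal, self-contained sketch. It does not use Hadoop at all: each "reducer" is a hypothetical in-memory TreeMap standing in for the sorted MapFile that reducer would write, and the class and method names are made up for illustration. The only piece taken from Hadoop is the partition formula, which I believe matches the default HashPartitioner. The point is that a reader can recompute the partition for a key and open only that one part, so no merged index is needed. As far as I recall, MapFileOutputFormat offers getReaders and getEntry helpers that work along these lines, but check the javadoc for your version.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical stand-in for a job output directory: one sorted map per reducer,
// mimicking the single MapFile (data + index) that each reducer writes.
class PartitionedIndexSketch {

    // Assumed to be the same formula as Hadoop's default HashPartitioner.
    static int partitionFor(String key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    // "Write" side: route each {key, value} record to its reducer's own sorted store.
    static List<TreeMap<String, String>> write(String[][] records, int numReducers) {
        List<TreeMap<String, String>> parts = new ArrayList<>();
        for (int i = 0; i < numReducers; i++) {
            parts.add(new TreeMap<>());
        }
        for (String[] record : records) {
            parts.get(partitionFor(record[0], numReducers)).put(record[0], record[1]);
        }
        return parts;
    }

    // "Read" side: recompute the partition and consult only that one part.
    static String lookup(List<TreeMap<String, String>> parts, String key) {
        return parts.get(partitionFor(key, parts.size())).get(key);
    }

    public static void main(String[] args) {
        String[][] records = {
            {"apple", "1"}, {"banana", "2"}, {"cherry", "3"}, {"date", "4"}
        };
        List<TreeMap<String, String>> parts = write(records, 3);
        System.out.println(lookup(parts, "cherry"));  // prints 3
        System.out.println(lookup(parts, "missing")); // prints null
    }
}
```

The lookup only works if the reader uses the same partitioner and the same number of partitions as the job that wrote the output.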
On Fri, Jul 27, 2012 at 3:15 AM, syed kather <[EMAIL PROTECTED]> wrote:
> Mike,
> Can you please give more details? The context is not clear. Can you share your
> use case if possible?
> On Jul 24, 2012 1:40 AM, "Mike S" <[EMAIL PROTECTED]> wrote:
> > If I set my reducer output to map file output format and the job has,
> > say, 100 reducers, will the output contain 100 different index
> > files (one for each reducer) or one index file for all the reducers
> > (basically one index file per job)?
> > If it is one index file per reducer, can I rely on HDFS append to change
> > the index write behavior and build one index file from all the
> > reducers by making all the parallel reducers append to
> > one index file? Data files do not matter.