Re: Reducer MapFileOutputFormat
Hi Bertrand,

I believe he is talking about MapFile's index files, explained here:

On Fri, Jul 27, 2012 at 11:24 AM, Bertrand Dechoux <[EMAIL PROTECTED]> wrote:
> Your use of 'index' is indeed not clear. Are you talking about Hive or
> HBase?
> I can confirm that you will have one result file per reducer. Of course,
> for efficiency reasons, you need to limit the number of files. But if you
> are using multiple reducers it should mean that one reducer isn't fast
> enough, so it could be assumed that the output for each reducer is big
> enough. If that is not the case, you can limit the number of reducers to one.
> In general, the 'fragmentation' of the results is dealt with by the next job.
> You should provide more information about your real problem and its context.
> Bertrand
> On Fri, Jul 27, 2012 at 3:15 AM, syed kather <[EMAIL PROTECTED]> wrote:
>> Mike,
>> Can you please give more details? The context is not clear. Can you share
>> your use case, if possible?
>> On Jul 24, 2012 1:40 AM, "Mike S" <[EMAIL PROTECTED]> wrote:
>> > If I set my reducer output to the map file output format and the job
>> > has, say, 100 reducers, will the output contain 100 different index
>> > files (one for each reducer) or one index file for all the reducers
>> > (basically one index file per job)?
>> >
>> > If it is one index file per reducer, can I rely on HDFS append to change
>> > the index write behavior and build one index file from all the
>> > reducers, basically by making all the parallel reducers append to
>> > one index file? Data files do not matter.
>> >
> --
> Bertrand Dechoux

Harsh J
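
To make the layout discussed in this thread concrete, here is a minimal sketch in plain Java (no Hadoop dependency) of what a job with 100 reducers would be expected to leave behind. It assumes the new-API part naming ("part-r-NNNNN"; the old mapred API uses "part-NNNNN"), a hypothetical output directory "/out", and the default HashPartitioner arithmetic. Each MapFile is a directory containing a "data" file and an "index" file, so there is one index per reducer. The class and method names below are made up for illustration, and String.hashCode() only stands in for the hash of a real Writable key such as Text, which hashes differently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class MapFileLayoutSketch {

    // Assumed part-directory name for reducer r (new mapreduce API style).
    static String partDir(int reducer) {
        return String.format(Locale.ROOT, "part-r-%05d", reducer);
    }

    // Every reducer writes its own MapFile, which is a directory holding
    // a "data" file and an "index" file.
    static List<String> expectedFiles(String outputDir, int numReducers) {
        List<String> files = new ArrayList<>();
        for (int r = 0; r < numReducers; r++) {
            files.add(outputDir + "/" + partDir(r) + "/data");
            files.add(outputDir + "/" + partDir(r) + "/index");
        }
        return files;
    }

    // Same arithmetic as Hadoop's default HashPartitioner: clear the sign
    // bit, then take the remainder by the number of reducers.
    static int partitionFor(Object key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    public static void main(String[] args) {
        int numReducers = 100;
        List<String> files = expectedFiles("/out", numReducers);

        long indexCount = files.stream()
                .filter(f -> f.endsWith("/index"))
                .count();
        System.out.println("index files: " + indexCount);

        // A lookup only needs the index of the partition that owns the key.
        String key = "some-key"; // hypothetical key
        int p = partitionFor(key, numReducers);
        System.out.println(key + " -> /out/" + partDir(p) + "/index");
    }
}
```

A reader that applies the same partitioner as the job can go straight to the one MapFile that holds a given key, which is the approach MapFileOutputFormat.getReaders and MapFileOutputFormat.getEntry take, so a lookup does not require a single merged index file.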