Hadoop, mail # user - Reducer MapFileOutputFormat


Re: Reducer MapFileOutputFormat
Harsh J 2012-07-27, 22:07
Hi Bertrand,

I believe he is talking about MapFile's index files, explained here:
http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/io/MapFile.html
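For context, a MapFile is a directory containing a sorted "data" SequenceFile plus a small "index" SequenceFile that records every Nth key (128 by default, controlled by io.map.index.interval); a lookup binary-searches the sparse index and then scans at most N entries of the data file. A minimal sketch of that lookup scheme (a plain-Java simulation, not the Hadoop API; the class name and the small interval are illustrative):

```java
import java.util.ArrayList;
import java.util.List;

// Simulation of a MapFile lookup: a sorted "data" list plus a sparse
// "index" that records the position of every Nth key. Hadoop's default
// interval is 128; a smaller value is used here for brevity.
public class SparseIndexSketch {
    static final int INDEX_INTERVAL = 4; // io.map.index.interval analogue

    final List<String> data = new ArrayList<>();   // sorted keys ("data" file)
    final List<Integer> index = new ArrayList<>(); // every Nth position ("index" file)

    void append(String key) {
        if (data.size() % INDEX_INTERVAL == 0) {
            index.add(data.size()); // remember where this block starts
        }
        data.add(key);
    }

    // Binary-search the sparse index for the last entry <= key,
    // then scan forward through at most one block of the data.
    int get(String key) {
        int lo = 0, hi = index.size() - 1, start = 0;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (data.get(index.get(mid)).compareTo(key) <= 0) {
                start = index.get(mid);
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        for (int i = start; i < data.size() && i < start + INDEX_INTERVAL; i++) {
            if (data.get(i).equals(key)) return i;
        }
        return -1; // not present
    }

    public static void main(String[] args) {
        SparseIndexSketch m = new SparseIndexSketch();
        for (int i = 0; i < 20; i++) m.append(String.format("key%02d", i));
        System.out.println(m.get("key07")); // found at position 7
        System.out.println(m.get("nope"));  // -1: not present
    }
}
```

Because each reducer builds exactly one such data/index pair, a job with multiple reducers produces one index per output MapFile.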

On Fri, Jul 27, 2012 at 11:24 AM, Bertrand Dechoux <[EMAIL PROTECTED]> wrote:
> Your use of 'index' is indeed not clear. Are you talking about Hive or
> HBase?
>
> I can confirm that you will have one result file per reducer. Of course,
> for efficiency reasons, you need to limit the number of files. But if you
> are using multiple reducers, it should mean that one reducer isn't fast
> enough, so it can be assumed that the output of each reducer is big
> enough. If that is not the case, you can limit the number of reducers to one.
>
> In general, the 'fragmentation' of the results is dealt with by the next job.
> You should provide more information about your real problem and its context.
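The "one result file per reducer" behavior follows from partitioning: Hadoop's default HashPartitioner routes each key to exactly one reduce task, and each task writes its own part-NNNNN output (a MapFile directory, index included, when MapFileOutputFormat is used). A sketch of that routing, assuming the default partitioner formula (simulation only, not the Hadoop API; the key set and file-name printing are illustrative):

```java
import java.util.ArrayList;
import java.util.List;

// Simulation of why a job with N reduce tasks writes N output files:
// every key is assigned to exactly one reducer, so the key space is
// split into N disjoint, independently written outputs.
public class PartitionSketch {
    // Same formula as Hadoop's default HashPartitioner.getPartition().
    static int getPartition(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        int reducers = 4;
        List<List<String>> buckets = new ArrayList<>();
        for (int r = 0; r < reducers; r++) buckets.add(new ArrayList<>());

        for (String k : new String[]{"alpha", "beta", "gamma", "delta", "epsilon"}) {
            buckets.get(getPartition(k, reducers)).add(k);
        }

        for (int r = 0; r < reducers; r++) {
            // Each bucket corresponds to one reducer's own output file.
            System.out.println("part-r-0000" + r + " -> " + buckets.get(r));
        }
    }
}
```

Since each bucket is written by a different task, possibly on a different node, there is no single writer that could maintain one global index file.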
>
> Bertrand
>
> On Fri, Jul 27, 2012 at 3:15 AM, syed kather <[EMAIL PROTECTED]> wrote:
>
>> Mike,
>> Can you please give more details? The context is not clear. Can you share your
>> use case if possible?
>> On Jul 24, 2012 1:40 AM, "Mike S" <[EMAIL PROTECTED]> wrote:
>>
>> > If I set my reducer output format to MapFileOutputFormat and the job
>> > has, say, 100 reducers, will the output generate 100 different index
>> > files (one for each reducer) or one index file for all the reducers
>> > (basically one index file per job)?
>> >
>> > If it is one index file per reducer, can I rely on HDFS append to change
>> > the index write behavior and build one index file from all the
>> > reducers, basically by making all the parallel reducers append to
>> > one index file? The data files do not matter.
>> >
>>
>
>
>
> --
> Bertrand Dechoux

--
Harsh J