MapReduce dev mailing list: Significance of file.out.index during Shuffle Phase?


Re: Significance of file.out.index during Shuffle Phase?
You'll need to make significant changes to MapTask.java, which won't make it back into mainline.

Why? We had this before and quickly ran out of inodes on the local disks. Think of large jobs with 10,000 maps * 1,000 reduces: one file per map per reduce is 10M files.
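Arun's arithmetic can be checked with a few lines of Python. The helper `local_files` is hypothetical and only counts files; it assumes the current layout leaves two files per map (file.out and file.out.index) and that the proposed layout leaves one data file per map per reducer (file.out.1, file.out.2, ...), ignoring any per-reducer index files.

```python
def local_files(num_maps: int, num_reduces: int, per_reducer_files: bool) -> int:
    """Count map-output files left on local disks (hypothetical helper).

    Current layout (assumed): each map writes one file.out plus one file.out.index.
    Proposed layout (assumed): each map writes one data file per reducer.
    """
    if per_reducer_files:
        return num_maps * num_reduces
    return num_maps * 2


print(local_files(10_000, 1_000, per_reducer_files=True))   # 10000000
print(local_files(10_000, 1_000, per_reducer_files=False))  # 20000
```

The per-reducer layout grows with maps * reduces, while the single-file-plus-index layout grows only with the number of maps, which is why the index approach scales to large jobs.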

Arun

On Aug 19, 2012, at 8:57 AM, Pavan Kulkarni wrote:

> Oh, thanks a lot, Harsh. That's exactly what I was looking for.
> I wanted to create a separate file.out for each reducer, something like
> file.out.1 for reducer 1, file.out.2 for reducer 2, etc. Is it possible to
> do this in the MapReduce program, or do I need to tweak some Hadoop source
> files for that? Thanks.
>
> On Sun, Aug 19, 2012 at 7:02 AM, Harsh J <[EMAIL PROTECTED]> wrote:
>
>> Hey Pavan,
>>
>> Yes, you've got it almost right about how file.out is served to each
>> reducer. See the code at
>>
>> http://svn.apache.org/viewvc/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle/src/main/java/org/apache/hadoop/mapred/ShuffleHandler.java?view=markup
>> (the method at lines 502-565 sends the data for a specific
>> reduce/partition ID, which is an integer).
>>
>> On Sun, Aug 19, 2012 at 9:05 AM, Pavan Kulkarni <[EMAIL PROTECTED]> wrote:
>>> Hi,
>>>
>>> I was trying to understand how exactly each reducer finds out how to
>>> fetch the data of its own partition from the map nodes.
>>> While a MapReduce job runs, I see that *file.out* is created on the map
>>> nodes, so my question is: how does a reducer know which part of file.out
>>> to fetch? Does *file.out.index* play any role?
>>> Any help is appreciated. Thanks.
>>>
>>>
>>>
>>> --With Regards
>>> Pavan Kulkarni
>>
>>
>>
>> --
>> Harsh J
>>
>
>
>
> --
>
> --With Regards
> Pavan Kulkarni

--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/
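To illustrate how file.out.index lets one file.out serve every reducer: based on our reading of Hadoop's SpillRecord/IndexRecord classes (treat the exact byte layout as an assumption), the index holds, for each reduce partition, three big-endian 64-bit values (the segment's start offset in file.out, its raw length and its on-disk length), followed by a CRC32 checksum. The Python sketch below mimics the lookup the shuffle side performs for a given partition ID. `write_index`, `read_index_record` and `serve_partition` are hypothetical names, and this is not the ShuffleHandler code.

```python
import struct
import zlib
from typing import NamedTuple

RECORD_LEN = 24  # assumed: three signed 64-bit longs per partition


class IndexRecord(NamedTuple):
    start_offset: int  # where this partition's segment begins in file.out
    raw_length: int    # uncompressed length of the segment
    part_length: int   # bytes actually stored on disk for the segment


def write_index(records):
    """Serialize records, then append a CRC32 of the record bytes as an 8-byte long."""
    body = b"".join(struct.pack(">qqq", *r) for r in records)
    return body + struct.pack(">q", zlib.crc32(body))


def read_index_record(index_bytes, partition):
    """Return the IndexRecord for one reduce partition by fixed-size offset."""
    start = partition * RECORD_LEN
    return IndexRecord(*struct.unpack(">qqq", index_bytes[start:start + RECORD_LEN]))


def serve_partition(file_out, index_bytes, partition):
    """Return the slice of file.out that belongs to `partition`."""
    rec = read_index_record(index_bytes, partition)
    return file_out[rec.start_offset:rec.start_offset + rec.part_length]


# Toy map output: segments for reducers 0, 1 and 2 concatenated into one file.out.
file_out = b"aaa" + b"bbbbb" + b"cc"
index_bytes = write_index([IndexRecord(0, 3, 3), IndexRecord(3, 5, 5), IndexRecord(8, 2, 2)])
print(serve_partition(file_out, index_bytes, 1))  # b'bbbbb'
```

Because every index record has a fixed size, finding a reducer's segment is a constant-time seek into the index followed by one contiguous read from file.out, so a map needs only these two files no matter how many reducers the job has.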