MapReduce, mail # user - Understanding MapReduce source code : Flush operations


Re: Understanding MapReduce source code : Flush operations
nagarjuna kanamarlapudi 2014-01-07, 03:11
This is not in DFSClient.

Before the output is written to HDFS, a lot of operations take place.

For example, the reducer output in memory reaches 90% of the HDFS block size
and then the data starts to be flushed, and so on.

So my requirement is to look at the code where that happens, because I want to
change the logic a bit to suit my needs.
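
As a rough illustration of the behaviour described above (this is not Hadoop's actual code, and I am not claiming Hadoop implements it this way), here is a minimal Java sketch of a buffer that flushes to an underlying stream once it is 90% full. The class name ThresholdFlushBuffer, its methods, and the 100-byte capacity are all hypothetical and chosen only for the example.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical illustration only: NOT a Hadoop class.
// Buffers bytes in memory and pushes them to the sink once the buffer
// reaches a configured percentage of its capacity.
class ThresholdFlushBuffer extends OutputStream {
    private final OutputStream sink;   // stands in for the real output (e.g. a file stream)
    private final byte[] buffer;
    private final int flushThreshold;  // number of buffered bytes that triggers a flush
    private int count = 0;             // bytes currently buffered
    private int flushCount = 0;        // how many times buffered data was pushed to the sink

    ThresholdFlushBuffer(OutputStream sink, int capacity, int flushPercent) {
        if (capacity <= 0 || flushPercent <= 0 || flushPercent > 100) {
            throw new IllegalArgumentException("capacity must be > 0 and flushPercent in 1..100");
        }
        this.sink = sink;
        this.buffer = new byte[capacity];
        // Integer arithmetic avoids floating-point rounding surprises.
        this.flushThreshold = Math.max(1, (int) ((long) capacity * flushPercent / 100));
    }

    @Override
    public void write(int b) throws IOException {
        buffer[count++] = (byte) b;
        if (count >= flushThreshold) {
            flush(); // threshold reached: push buffered bytes downstream
        }
    }

    @Override
    public void flush() throws IOException {
        if (count > 0) {
            sink.write(buffer, 0, count);
            count = 0;
            flushCount++;
        }
        sink.flush();
    }

    @Override
    public void close() throws IOException {
        flush(); // write out whatever is still buffered
        sink.close();
    }

    int getFlushCount() { return flushCount; }

    int buffered() { return count; }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        ThresholdFlushBuffer out = new ThresholdFlushBuffer(sink, 100, 90);
        for (int i = 0; i < 100; i++) {
            out.write(i);
        }
        System.out.println("bytes in sink before close: " + sink.size());
        out.close();
        System.out.println("bytes in sink after close: " + sink.size());
    }
}
```

With a capacity of 100 and a threshold of 90%, the first 90 bytes are pushed to the sink in one flush, and the remaining 10 stay in memory until close() is called. The code I am looking for in Hadoop is whatever plays the role of write() and flush() here.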
On Tue, Jan 7, 2014 at 12:41 AM, Vinod Kumar Vavilapalli <
[EMAIL PROTECTED]> wrote:

> Assuming your output is going to HDFS, you want to look at DFSClient.
>
> The reducer uses FileSystem to write the output. You need to start looking at
> how DFSClient chunks the output and sends it across to the remote
> data-nodes.
>
> Thanks
> +Vinod
>
> On Jan 6, 2014, at 11:07 AM, nagarjuna kanamarlapudi <
> [EMAIL PROTECTED]> wrote:
>
> I want to look at the code for the flush operations that happen
> after the reduce phase.
>
> The reducer writes its output to the OutputFormat, which in turn pushes it to
> memory; once that reaches 90% of the chunk size, it starts to flush the reducer
> output.
>
> I essentially want to look at the code of that flushing operation.
>
>
> Which class(es) do I need to look into?
>
>
> On Mon, Jan 6, 2014 at 11:23 PM, Hardik Pandya <[EMAIL PROTECTED]> wrote:
>
>> Please do not tell me that in the last 2.5 years you have not used a virtual
>> Hadoop environment to debug your MapReduce application before deploying it to
>> a production environment.
>>
>> No one can stop you from looking at the code; Hadoop and its ecosystem are
>> open source.
>>
>>
>> On Mon, Jan 6, 2014 at 9:35 AM, nagarjuna kanamarlapudi <
>> [EMAIL PROTECTED]> wrote:
>>
>>>
>>>
>>> ---------- Forwarded message ----------
>>> From: nagarjuna kanamarlapudi <[EMAIL PROTECTED]>
>>>  Date: Mon, Jan 6, 2014 at 6:39 PM
>>> Subject: Understanding MapReduce source code : Flush operations
>>> To: [EMAIL PROTECTED]
>>>
>>>
>>> Hi,
>>>
>>> I have been using Hadoop/MapReduce for about 2.5 years. I want to understand
>>> the internals of the Hadoop source code.
>>>
>>> Let me state my requirement clearly.
>>>
>>> I want to look at the code for the flush operations that happen
>>> after the reduce phase.
>>>
>>> The reducer writes its output to the OutputFormat, which in turn pushes it to
>>> memory; once that reaches 90% of the chunk size, it starts to flush the reducer
>>> output.
>>>
>>> I essentially want to look at the code of that flushing operation.
>>>
>>>
>>>
>>>
>>> Regards,
>>> Nagarjuna K
>>>
>>>
>>
>
>