Re: Understanding MapReduce source code : Flush operations
I am using TextOutputFormat

OK, the idea here is that this output format writes to a record writer,
which in turn has to pass the data on to *some other object*, where it is
held in memory and flushed once the block size is reached.

I want to look at that *some other object* and the other helper classes
that write/flush the output to disk.
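
To make the hand-off concrete, here is a minimal, self-contained sketch of the
pattern being asked about. It is not Hadoop code: LineWriter and RecordingSink
are hypothetical stand-ins, and the 16-byte buffer is an arbitrary value chosen
so the flushes are easy to see. In Hadoop itself, as far as I know,
TextOutputFormat's LineRecordWriter writes each key/value pair to a
DataOutputStream obtained from FileSystem.create(), so any buffering and
flushing happens in the stream underneath the RecordWriter (for HDFS, the
DFSClient side) rather than in the RecordWriter itself.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class RecordWriterSketch {

    /** Hypothetical sink standing in for "disk"; records the size of every write that reaches it. */
    static class RecordingSink extends OutputStream {
        final List<Integer> flushedSizes = new ArrayList<>();
        final ByteArrayOutputStream data = new ByteArrayOutputStream();

        @Override
        public void write(int b) {
            data.write(b);
            flushedSizes.add(1);
        }

        @Override
        public void write(byte[] b, int off, int len) {
            data.write(b, off, len);
            flushedSizes.add(len);
        }
    }

    /** Hypothetical, simplified line writer: key, tab, value, newline. It does no buffering of its own. */
    static class LineWriter implements AutoCloseable {
        private final DataOutputStream out;

        LineWriter(DataOutputStream out) {
            this.out = out;
        }

        void write(String key, String value) throws IOException {
            out.write(key.getBytes(StandardCharsets.UTF_8));
            out.write('\t');
            out.write(value.getBytes(StandardCharsets.UTF_8));
            out.write('\n');
        }

        @Override
        public void close() throws IOException {
            out.close(); // closing flushes whatever is still buffered
        }
    }

    /** Writes `records` small records through a buffer of `bufferSize` bytes and returns the sink. */
    public static RecordingSink run(int bufferSize, int records) throws IOException {
        RecordingSink sink = new RecordingSink();
        // The "some other object": a buffered stream sitting between the writer and the sink.
        OutputStream buffered = new BufferedOutputStream(sink, bufferSize);
        try (LineWriter writer = new LineWriter(new DataOutputStream(buffered))) {
            for (int i = 0; i < records; i++) {
                writer.write("k" + i, "v" + i);
            }
        }
        return sink;
    }

    public static void main(String[] args) throws IOException {
        RecordingSink sink = run(16, 5);
        System.out.println("writes that reached the sink: " + sink.flushedSizes);
        System.out.print(sink.data.toString("UTF-8"));
    }
}
```

With these values the writer issues 20 small writes (30 bytes in total), but the
sink only sees two writes of 15 bytes each: one when the buffer cannot take the
next piece, and one on close. The point of the sketch is only that the flushing
decision lives in the stream layer, which is where I would expect to find the
equivalent logic in Hadoop.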

Regards,
Nagarjuna

On Tue, Jan 7, 2014 at 10:51 AM, Vinod Kumar Vavilapalli <
[EMAIL PROTECTED]> wrote:

>
> What OutputFormat are you using?
>
> Once it reaches OutputFormat (specifically RecordWriter) it all depends on
> what the RecordWriter does. Are you using some OutputFormat with a
> RecordWriter that buffers like this?
>
> Thanks,
> +Vinod
>
> On Jan 6, 2014, at 7:11 PM, nagarjuna kanamarlapudi <
> [EMAIL PROTECTED]> wrote:
>
> This is not in DFSClient.
>
> Before the output is written to HDFS, a lot of operations take place.
>
> For example, the reducer output held in memory reaches 90% of the HDFS
> block size, and then the data starts to be flushed, etc.
>
> So my requirement is to look at that code, because I want to change the
> logic a bit to suit my needs.
>
>
> On Tue, Jan 7, 2014 at 12:41 AM, Vinod Kumar Vavilapalli <
> [EMAIL PROTECTED]> wrote:
>
>> Assuming your output is going to HDFS, you want to look at DFSClient.
>>
>> The Reducer uses FileSystem to write the output. You need to start looking
>> at how DFSClient chunks the output and sends the chunks across to the
>> remote data-nodes.
>>
>> Thanks
>> +Vinod
>>
>> On Jan 6, 2014, at 11:07 AM, nagarjuna kanamarlapudi <
>> [EMAIL PROTECTED]> wrote:
>>
>> I want to have a look at the code for the flush operations that happen
>> after the reduce phase.
>>
>> The Reducer writes its output to the OutputFormat, which in turn pushes it
>> to memory, and once it reaches 90% of the chunk size it starts to flush the
>> reducer output.
>>
>> I essentially want to look at the code of that flushing operation.
>>
>>
>> Which class(es) do I need to look into?
>>
>>
>> On Mon, Jan 6, 2014 at 11:23 PM, Hardik Pandya <[EMAIL PROTECTED]> wrote:
>>
>>> Please do not tell me that in the last 2.5 years you have not used a
>>> virtual Hadoop environment to debug your MapReduce application before
>>> deploying to a production environment.
>>>
>>> No one can stop you from looking at the code; Hadoop and its ecosystem
>>> are open source.
>>>
>>>
>>> On Mon, Jan 6, 2014 at 9:35 AM, nagarjuna kanamarlapudi <
>>> [EMAIL PROTECTED]> wrote:
>>>
>>>>
>>>>
>>>> ---------- Forwarded message ----------
>>>> From: nagarjuna kanamarlapudi <[EMAIL PROTECTED]>
>>>> Date: Mon, Jan 6, 2014 at 6:39 PM
>>>> Subject: Understanding MapReduce source code : Flush operations
>>>> To: [EMAIL PROTECTED]
>>>>
>>>>
>>>> Hi,
>>>>
>>>> I have been using Hadoop/MapReduce for about 2.5 years. I want to
>>>> understand the internals of the Hadoop source code.
>>>>
>>>> Let me state my requirement very clearly.
>>>>
>>>> I want to have a look at the code for the flush operations that
>>>> happen after the reduce phase.
>>>>
>>>> The Reducer writes its output to the OutputFormat, which in turn pushes
>>>> it to memory, and once it reaches 90% of the chunk size it starts to
>>>> flush the reducer output.
>>>>
>>>> I essentially want to look at the code of that flushing operation.
>>>>
>>>>
>>>>
>>>>
>>>> Regards,
>>>> Nagarjuna K
>>>>
>>>>
>>>
>>