NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, cassandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB

MapReduce >> mail # user >> Understanding MapReduce source code : Flush operations


nagarjuna kanamarlapudi 2014-01-06, 13:09
nagarjuna kanamarlapudi 2014-01-07, 03:11
Re: Understanding MapReduce source code : Flush operations

What OutputFormat are you using?

Once it reaches OutputFormat (specifically RecordWriter) it all depends on what the RecordWriter does. Are you using some OutputFormat with a RecordWriter that buffers like this?
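To make the point about RecordWriters concrete, here is a minimal plain-Java sketch of a writer that does no buffering of its own. The class `PassThroughLineWriter` is hypothetical and is not Hadoop's `RecordWriter` API; it only mimics the tab-separated "key, value, newline" layout of Hadoop's text output, so any buffering or flushing happens in whatever `OutputStream` it wraps.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical pass-through writer (not Hadoop code). It keeps no buffer of its
// own: every record goes straight to the wrapped stream, so when bytes actually
// get flushed is decided entirely by that stream.
class PassThroughLineWriter {
    private static final byte[] SEPARATOR = "\t".getBytes(StandardCharsets.UTF_8);
    private static final byte[] NEWLINE = "\n".getBytes(StandardCharsets.UTF_8);
    private final OutputStream out;

    PassThroughLineWriter(OutputStream out) {
        this.out = out;
    }

    // Writes one record as key TAB value NEWLINE.
    void write(String key, String value) throws IOException {
        out.write(key.getBytes(StandardCharsets.UTF_8));
        out.write(SEPARATOR);
        out.write(value.getBytes(StandardCharsets.UTF_8));
        out.write(NEWLINE);
    }

    // Closing delegates to the stream, which flushes any bytes it is still holding.
    void close() throws IOException {
        out.close();
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        PassThroughLineWriter writer = new PassThroughLineWriter(sink);
        writer.write("hadoop", "1");
        writer.write("hdfs", "2");
        writer.close();
        System.out.print(sink.toString("UTF-8"));
    }
}
```

With a writer shaped like this, a "flush at N%" rule would not live in the writer at all; it would have to be in the stream underneath it.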

Thanks,
+Vinod

On Jan 6, 2014, at 7:11 PM, nagarjuna kanamarlapudi <[EMAIL PROTECTED]> wrote:

> This is not in DFSClient.
>
> Before the output is written to HDFS, a lot of operations take place.
>
> For example, the reducer output in memory reaches 90% of the HDFS block size, then it starts to flush the data, etc.
>
> So, my requirement is to look at that code, where I want to change the logic a bit to suit my needs.
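The behavior described above (buffer reducer output in memory, then flush once a threshold is crossed) can be sketched as follows. This models the poster's description, not code taken from Hadoop: `ThresholdFlushingWriter`, its capacity, and the 0.9 fraction are all hypothetical. For comparison, Hadoop 2 does have a percentage-based threshold on the map side, `mapreduce.map.sort.spill.percent` (default 0.80), but it governs spilling of the map output sort buffer, not reducer output.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical sketch (not Hadoop code): records are buffered in memory and the
// buffer is written downstream once it reaches a fraction of a nominal capacity.
class ThresholdFlushingWriter {
    private final OutputStream downstream;
    private final long flushThresholdBytes;
    private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    private int flushCount = 0;

    ThresholdFlushingWriter(OutputStream downstream, long capacityBytes, double flushFraction) {
        if (capacityBytes <= 0) {
            throw new IllegalArgumentException("capacityBytes must be positive");
        }
        if (flushFraction <= 0.0 || flushFraction > 1.0) {
            throw new IllegalArgumentException("flushFraction must be in (0, 1]");
        }
        this.downstream = downstream;
        // e.g. capacity 100 bytes with fraction 0.9 -> flush once 90 bytes are buffered
        this.flushThresholdBytes = (long) (capacityBytes * flushFraction);
    }

    // Buffers one serialized record and flushes if the threshold is reached.
    void write(byte[] record) throws IOException {
        buffer.write(record, 0, record.length);
        if (buffer.size() >= flushThresholdBytes) {
            flush();
        }
    }

    // Pushes everything buffered so far to the downstream stream.
    void flush() throws IOException {
        if (buffer.size() > 0) {
            buffer.writeTo(downstream);
            downstream.flush();
            buffer.reset();
            flushCount++;
        }
    }

    // Flushes any remainder, then closes the downstream stream.
    void close() throws IOException {
        flush();
        downstream.close();
    }

    int getFlushCount() {
        return flushCount;
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        ThresholdFlushingWriter w = new ThresholdFlushingWriter(sink, 100, 0.9);
        for (int i = 0; i < 3; i++) {
            w.write(new byte[40]); // 40, 80, then 120 bytes buffered
        }
        w.close();
        System.out.println("flushes=" + w.getFlushCount() + ", bytesWritten=" + sink.size());
    }
}
```

Running `main` prints `flushes=1, bytesWritten=120`: the third 40-byte record pushes the buffer past the 90-byte threshold, and `close()` finds nothing left to flush.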
>
>
> On Tue, Jan 7, 2014 at 12:41 AM, Vinod Kumar Vavilapalli <[EMAIL PROTECTED]> wrote:
> Assuming your output is going to HDFS, you want to look at DFSClient.
>
> The reducer uses FileSystem to write the output. You need to start by looking at how DFSClient chunks the output and sends it across to the remote DataNodes.
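A rough sketch of the chunking step mentioned above: the output byte stream is cut into fixed-size packets before being sent to the DataNodes. `PacketChunker` is hypothetical, and the 64 KB packet size is an assumption based on the usual `dfs.client-write-packet-size` default; the real `DFSOutputStream` also adds packet headers and checksums and pipelines packets to the DataNodes, none of which is modeled here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch (not Hadoop code): split a byte array into fixed-size
// packets; the last packet holds whatever is left over.
class PacketChunker {
    static List<byte[]> toPackets(byte[] data, int packetSize) {
        if (packetSize <= 0) {
            throw new IllegalArgumentException("packetSize must be positive");
        }
        List<byte[]> packets = new ArrayList<>();
        for (int offset = 0; offset < data.length; offset += packetSize) {
            int end = Math.min(offset + packetSize, data.length);
            packets.add(Arrays.copyOfRange(data, offset, end));
        }
        return packets;
    }

    public static void main(String[] args) {
        int packetSize = 64 * 1024;           // assumed packet size (64 KB)
        byte[] output = new byte[150 * 1024]; // 150 KB of example reducer output
        List<byte[]> packets = toPackets(output, packetSize);
        // prints "3 packets, last one 22528 bytes"
        System.out.println(packets.size() + " packets, last one "
                + packets.get(packets.size() - 1).length + " bytes");
    }
}
```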
>
> Thanks
> +Vinod
>
> On Jan 6, 2014, at 11:07 AM, nagarjuna kanamarlapudi <[EMAIL PROTECTED]> wrote:
>
>> I want to look at the code of the flush operations that happen after the reduce phase.
>>
>> The reducer writes its output to the OutputFormat, which in turn buffers it in memory, and once it reaches 90% of the chunk size it starts to flush the reducer output.
>>
>> I essentially want to look at the code of that flushing operation.
>>
>>
>> Which class(es) do I need to look into?
>>
>>
>> On Mon, Jan 6, 2014 at 11:23 PM, Hardik Pandya <[EMAIL PROTECTED]> wrote:
>> Please do not tell me that in the last 2.5 years you have not used a virtual Hadoop environment to debug your MapReduce application before deploying it to production.
>>
>> No one can stop you from looking at the code; Hadoop and its ecosystem are open source.
>>
>>
>> On Mon, Jan 6, 2014 at 9:35 AM, nagarjuna kanamarlapudi <[EMAIL PROTECTED]> wrote:
>>
>>
>> ---------- Forwarded message ----------
>> From: nagarjuna kanamarlapudi <[EMAIL PROTECTED]>
>> Date: Mon, Jan 6, 2014 at 6:39 PM
>> Subject: Understanding MapReduce source code : Flush operations
>> To: [EMAIL PROTECTED]
>>
>>
>> Hi,
>>
>> I have been using Hadoop/MapReduce for about 2.5 years. I want to understand the internals of the Hadoop source code.
>>
>> Let me state my requirement clearly.
>>
>> I want to look at the code of the flush operations that happen after the reduce phase.
>>
>> The reducer writes its output to the OutputFormat, which in turn buffers it in memory, and once it reaches 90% of the chunk size it starts to flush the reducer output.
>>
>> I essentially want to look at the code of that flushing operation.
>>
>>
>>
>>
>> Regards,
>> Nagarjuna K
>>
>>
>>
>
>
> CONFIDENTIALITY NOTICE
> NOTICE: This message is intended for the use of the individual or entity to which it is addressed and may contain information that is confidential, privileged and exempt from disclosure under applicable law. If the reader of this message is not the intended recipient, you are hereby notified that any printing, copying, dissemination, distribution, disclosure or forwarding of this communication is strictly prohibited. If you have received this communication in error, please contact the sender immediately and delete it from your system. Thank You.
>
nagarjuna kanamarlapudi 2014-01-07, 06:15