Hadoop >> mail # user >> Re: Understanding MapReduce source code : Flush operations


nagarjuna kanamarlapudi 2014-01-06, 19:07
Re: Understanding MapReduce source code : Flush operations
Assuming your output is going to HDFS, you want to look at DFSClient.

The reducer uses FileSystem to write its output. Start by looking at how DFSClient chunks that output and sends the chunks across to the remote data-nodes.
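
To make the chunk-and-send idea concrete before reading the real source, here is a minimal toy sketch in plain Java. It is not Hadoop code and does not reproduce DFSClient's implementation: the class name `ChunkingWriterSketch`, the `chunkSize` parameter, and the in-memory `sent` list (standing in for the network pipeline to the data-nodes) are all hypothetical, chosen only for illustration. It shows the general pattern of buffering bytes, shipping a chunk whenever the buffer fills, and shipping the partial remainder on an explicit flush.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;

/**
 * Toy model of "buffer, chunk, send". NOT Hadoop code; every name here
 * is hypothetical and exists only to illustrate the pattern.
 */
class ChunkingWriterSketch {
    private final int chunkSize;                          // assumed fixed chunk size in bytes
    private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    private final List<byte[]> sent = new ArrayList<>();  // stand-in for the remote data-nodes

    ChunkingWriterSketch(int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        this.chunkSize = chunkSize;
    }

    /** Buffers the data and sends a chunk each time the buffer fills up. */
    void write(byte[] data) {
        for (byte b : data) {
            buffer.write(b);
            if (buffer.size() == chunkSize) {
                sendChunk();
            }
        }
    }

    /** Sends whatever is buffered, even if it is smaller than a full chunk. */
    void flush() {
        if (buffer.size() > 0) {
            sendChunk();
        }
    }

    /** Chunks "sent" so far, in order. */
    List<byte[]> sent() {
        return sent;
    }

    private void sendChunk() {
        sent.add(buffer.toByteArray());  // copy the buffered bytes out as one chunk
        buffer.reset();                  // start filling the next chunk
    }
}
```

With a chunk size of 4, writing 10 bytes sends two full chunks immediately and leaves 2 bytes buffered; calling `flush()` then sends those 2 bytes as a third, shorter chunk. The real client adds much more on top of this (checksums, acknowledgements, pipeline recovery), so treat the sketch only as a reading aid.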

Thanks
+Vinod

On Jan 6, 2014, at 11:07 AM, nagarjuna kanamarlapudi <[EMAIL PROTECTED]> wrote:

> I want to look at the code for the flush operations that happen after the reduce phase.
>
> The reducer writes its output to the OutputFormat, which in turn buffers it in memory; once the buffer reaches 90% of the chunk size, it starts to flush the reducer output.
>
> I essentially want to look at the code of that flushing operation.
>
>
> Which class(es) do I need to look into?
>
>
> On Mon, Jan 6, 2014 at 11:23 PM, Hardik Pandya <[EMAIL PROTECTED]> wrote:
> Please do not tell me that in the last 2.5 years you have not used a virtual Hadoop environment to debug your MapReduce application before deploying it to a production environment.
>
> No one can stop you from looking at the code; Hadoop and its ecosystem are open source.
>
>
> On Mon, Jan 6, 2014 at 9:35 AM, nagarjuna kanamarlapudi <[EMAIL PROTECTED]> wrote:
>
>
> ---------- Forwarded message ----------
> From: nagarjuna kanamarlapudi <[EMAIL PROTECTED]>
> Date: Mon, Jan 6, 2014 at 6:39 PM
> Subject: Understanding MapReduce source code : Flush operations
> To: [EMAIL PROTECTED]
>
>
> Hi,
>
> I have been using Hadoop/MapReduce for about 2.5 years. I want to understand the internals of the Hadoop source code.
>
> Let me state my requirement clearly.
>
> I want to look at the code for the flush operations that happen after the reduce phase.
>
> The reducer writes its output to the OutputFormat, which in turn buffers it in memory; once the buffer reaches 90% of the chunk size, it starts to flush the reducer output.
>
> I essentially want to look at the code of that flushing operation.
>
>
>
>
> Regards,
> Nagarjuna K
>
>
>