MapReduce user mailing list: Shuffle phase replication factor


John Lilley 2013-05-21, 18:57
Kai Voigt 2013-05-21, 18:58
John Lilley 2013-05-22, 14:33
Shahab Yunus 2013-05-22, 14:37
Re: Shuffle phase replication factor
There are configuration properties that control the number of threads used
for the copy, for example:
tasktracker.http.threads=40
Thanks,
Rahul
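
As a rough illustration of what a thread limit like the one above does, here is a minimal sketch (not Hadoop code). It assumes the property sizes a fixed pool of worker threads on the side that serves map output, so that however many fetches are requested, only that many transfers run at once and the rest wait in a queue. The class name `BoundedCopierSketch`, the method `runFetches`, and the sleep that stands in for a transfer are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

public class BoundedCopierSketch {

    /**
     * Runs `fetches` simulated transfers on a pool of at most `maxThreads`
     * threads and returns the highest number of transfers seen running at once.
     */
    static int runFetches(int fetches, int maxThreads) {
        ExecutorService pool = Executors.newFixedThreadPool(maxThreads);
        AtomicInteger active = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        List<Future<?>> pending = new ArrayList<>();
        for (int i = 0; i < fetches; i++) {
            pending.add(pool.submit(() -> {
                int now = active.incrementAndGet();
                peak.accumulateAndGet(now, Math::max); // remember the high-water mark
                try {
                    Thread.sleep(5); // stand-in for one map-output transfer
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    active.decrementAndGet();
                }
            }));
        }
        try {
            for (Future<?> f : pending) {
                f.get(); // wait for every queued transfer to finish
            }
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
        return peak.get();
    }

    public static void main(String[] args) {
        // 200 requested fetches, but never more than 40 in flight.
        System.out.println("peak concurrent fetches: " + runFetches(200, 40));
    }
}
```

The point of the sketch is only that the pool size, not the number of requesters, caps the simultaneous work; excess requests queue rather than fail.
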
On Wed, May 22, 2013 at 8:16 PM, John Lilley <[EMAIL PROTECTED]> wrote:

> This brings up another nagging question I’ve had for some time. Between
> HDFS and shuffle, there seems to be the potential for “every node
> connecting to every other node” via TCP. Are there explicit mechanisms in
> place to manage or limit simultaneous connections? Or is the protocol simply
> robust enough that the server side can disconnect at any time to free up
> slots, and the client side will retry the request?
>
> Thanks
>
> john
>
> From: Shahab Yunus [mailto:[EMAIL PROTECTED]]
> Sent: Wednesday, May 22, 2013 8:38 AM
> To: [EMAIL PROTECTED]
> Subject: Re: Shuffle phase replication factor
>
> As mentioned by Bertrand, Hadoop: The Definitive Guide is a well... really
> definitive :) place to start. It is pretty thorough for a start, and once you
> have gone through it, the code will start making more sense too.
>
> Regards,
>
> Shahab
>
> On Wed, May 22, 2013 at 10:33 AM, John Lilley <[EMAIL PROTECTED]> wrote:
>
> Oh I see. Does this mean there is another service and TCP listen port for
> this purpose?
>
> Thanks for your indulgence… I would really like to read more about this
> without bothering the group, but I am not sure where to start learning these
> internals other than the code.
>
> john
>
> From: Kai Voigt [mailto:[EMAIL PROTECTED]]
> Sent: Tuesday, May 21, 2013 12:59 PM
> To: [EMAIL PROTECTED]
> Subject: Re: Shuffle phase replication factor
>
> The map output doesn't get written to HDFS. The map task writes its output
> to its local disk, and the reduce tasks pull the data over HTTP for
> further processing.
>
> On 21.05.2013 at 19:57, John Lilley <[EMAIL PROTECTED]> wrote:
>
> When MapReduce enters “shuffle” to partition the tuples, I am assuming
> that it writes intermediate data to HDFS. What replication factor is used
> for those temporary files?
>
> john
>
> --
> Kai Voigt
> [EMAIL PROTECTED]
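
To make Kai's point in the quoted thread concrete, here is a minimal sketch (not Hadoop code) of a map task bucketing its output by reducer and writing it to the local filesystem only, so no HDFS replication factor is involved. It assumes hash partitioning of the key and, as a simplification, one local file per reducer; real Hadoop's on-disk layout differs. The class name `LocalMapOutputSketch`, the `part-N` file names, and the temp-directory location are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;
import java.util.List;

public class LocalMapOutputSketch {

    /** Picks the reducer for a key: non-negative hash modulo the reducer count. */
    static int partitionFor(String key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    /**
     * Appends each key/value record to a per-reducer file in a local temp
     * directory and returns that directory. Nothing is written to HDFS.
     */
    static Path writeMapOutput(List<String[]> records, int numReducers) {
        try {
            Path dir = Files.createTempDirectory("map-output-");
            for (String[] kv : records) {
                Path file = dir.resolve("part-" + partitionFor(kv[0], numReducers));
                byte[] line = (kv[0] + "\t" + kv[1] + "\n").getBytes(StandardCharsets.UTF_8);
                Files.write(file, line, StandardOpenOption.CREATE, StandardOpenOption.APPEND);
            }
            return dir;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        List<String[]> records = Arrays.asList(
                new String[] {"apple", "1"},
                new String[] {"banana", "1"},
                new String[] {"apple", "1"});
        Path dir = writeMapOutput(records, 4);
        // A reducer would later fetch only its own partition from this node.
        System.out.println("map output written under " + dir);
    }
}
```

Because every record with the same key lands in the same partition, each reducer only needs to pull its own bucket from each map node, which is the transfer the thread describes as happening over HTTP.
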
John Lilley 2013-05-22, 14:57
Kun Ling 2013-05-23, 01:50