MapReduce user mailing list: Shuffle phase replication factor


John Lilley 2013-05-21, 18:57
Kai Voigt 2013-05-21, 18:58
John Lilley 2013-05-22, 14:33
Shahab Yunus 2013-05-22, 14:37
Re: Shuffle phase replication factor
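There are configuration properties that control the number of threads used
for copying map output. For example, this sets the number of HTTP worker
threads each TaskTracker uses to serve map output to reducers:
tasktracker.http.threads=40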
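As a rough illustration of why a cap like tasktracker.http.threads bounds the fetch load on each node, here is a minimal Python sketch (not Hadoop code). It models the map-output server as a fixed-size worker pool. SERVER_THREADS, serve_map_output and run_fetches are hypothetical names. However many reducers ask for segments at once, at most SERVER_THREADS requests are served concurrently, and the rest wait in the pool's queue.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for tasktracker.http.threads: the number of
# server-side worker threads available to serve map output segments.
SERVER_THREADS = 4

_lock = threading.Lock()
_active = 0  # requests currently being served
_peak = 0    # highest number of requests served at the same time


def serve_map_output(request_id: int) -> str:
    """Simulate streaming one map-output segment to a fetching reducer."""
    global _active, _peak
    with _lock:
        _active += 1
        _peak = max(_peak, _active)
    time.sleep(0.01)  # pretend the HTTP transfer takes a moment
    with _lock:
        _active -= 1
    return f"segment-{request_id}"


def run_fetches(num_requests: int) -> tuple[list[str], int]:
    """Submit many fetches at once; return the results and peak concurrency.

    The fixed-size pool means requests beyond SERVER_THREADS wait in the
    executor's queue instead of getting more concurrent handlers.
    """
    global _peak
    _peak = 0
    with ThreadPoolExecutor(max_workers=SERVER_THREADS) as pool:
        results = list(pool.map(serve_map_output, range(num_requests)))
    return results, _peak


if __name__ == "__main__":
    served, peak = run_fetches(50)
    print(f"{len(served)} segments served, peak concurrency {peak}")
```

In this model, excess requests simply queue. The sketch does not settle whether the real shuffle server queues, rejects, or drops excess connections, or whether reducers retry.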
Thanks,
Rahul
On Wed, May 22, 2013 at 8:16 PM, John Lilley <[EMAIL PROTECTED]> wrote:

> This brings up another nagging question I’ve had for some time. Between
> HDFS and shuffle, there seems to be the potential for “every node
> connecting to every other node” via TCP. Are there explicit mechanisms in
> place to manage or limit simultaneous connections? Or is the protocol
> simply robust enough that the server side can disconnect at any time to
> free up slots, and the client side will retry the request?
>
> Thanks
>
> john
>
> *From:* Shahab Yunus [mailto:[EMAIL PROTECTED]]
> *Sent:* Wednesday, May 22, 2013 8:38 AM
> *To:* [EMAIL PROTECTED]
> *Subject:* Re: Shuffle phase replication factor
>
> As mentioned by Bertrand, "Hadoop: The Definitive Guide" is, well... really
> the definitive :) place to start. It is pretty thorough for starters, and
> once you have gone through it, the code will start making more sense too.
>
> Regards,
> Shahab
>
> On Wed, May 22, 2013 at 10:33 AM, John Lilley <[EMAIL PROTECTED]> wrote:
>
> Oh I see. Does this mean there is another service and TCP listen port for
> this purpose?
>
> Thanks for your indulgence. I would really like to read more about this
> without bothering the group, but I'm not sure where to start learning these
> internals other than the code.
>
> john
>
> *From:* Kai Voigt [mailto:[EMAIL PROTECTED]]
> *Sent:* Tuesday, May 21, 2013 12:59 PM
> *To:* [EMAIL PROTECTED]
> *Subject:* Re: Shuffle phase replication factor
>
> The map output doesn't get written to HDFS. The map task writes its output
> to its local disk, and the reduce tasks then pull the data over HTTP for
> further processing.
>
> On 21.05.2013 at 19:57, John Lilley <[EMAIL PROTECTED]> wrote:
>
> When MapReduce enters the “shuffle” phase to partition the tuples, I am
> assuming that it writes intermediate data to HDFS. What replication factor
> is used for those temporary files?
>
> john
>
> --
> Kai Voigt
> [EMAIL PROTECTED]
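The following is a minimal Python sketch of the flow Kai describes, built on simplifying assumptions. Each map task writes its output to local disk (here a temporary directory, not HDFS), split into one file per reduce partition. Each reduce task then pulls its own partition from every map task. The HTTP transfer is replaced by a plain file read. The partition function follows the idea behind Hadoop's default HashPartitioner, which computes (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks. Here zlib.crc32 stands in for Java's hashCode. All function names are hypothetical.

```python
import json
import os
import tempfile
import zlib
from collections import defaultdict


def partition(key: str, num_reducers: int) -> int:
    """Pick a reducer for a key: stable hash modulo the number of reducers."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def run_map_task(map_id, records, num_reducers, local_dir):
    """Write map output to local disk (not HDFS), one file per partition."""
    buckets = defaultdict(list)
    for key, value in records:
        buckets[partition(key, num_reducers)].append([key, value])
    for p in range(num_reducers):
        path = os.path.join(local_dir, f"map-{map_id}-part-{p}.json")
        with open(path, "w") as f:
            json.dump(buckets[p], f)


def fetch_partition(local_dir, map_id, p):
    """Stand-in for the reducer's HTTP pull of one map output segment."""
    path = os.path.join(local_dir, f"map-{map_id}-part-{p}.json")
    with open(path) as f:
        return [tuple(kv) for kv in json.load(f)]


def run_reduce_task(p, num_maps, local_dir):
    """Pull partition p from every map task, group by key, and sum values."""
    grouped = defaultdict(list)
    for map_id in range(num_maps):
        for key, value in fetch_partition(local_dir, map_id, p):
            grouped[key].append(value)
    return {key: sum(values) for key, values in grouped.items()}


def word_count(splits, num_reducers=2):
    """Run a toy word count: map to local files, then shuffle and reduce."""
    with tempfile.TemporaryDirectory() as local_dir:
        for map_id, words in enumerate(splits):
            run_map_task(map_id, [(w, 1) for w in words], num_reducers, local_dir)
        result = {}
        for p in range(num_reducers):
            result.update(run_reduce_task(p, len(splits), local_dir))
    return result


if __name__ == "__main__":
    print(word_count([["a", "b", "a"], ["b", "c"]]))
```

Because the intermediate files live only on the map task's local disk, no HDFS replication factor applies to them in this model.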
John Lilley 2013-05-22, 14:57
Kun Ling 2013-05-23, 01:50