
Hadoop, mail # dev - Hadoop + MPI


Re: Hadoop + MPI
Ralph Castain 2011-11-21, 23:54
Hi Milind

Glad to hear of the progress - I recall our earlier conversation. I gather you have completed step 1 (wireup) - have you given any thought to the other two steps? Anything I can do to help?

Ralph
On Nov 21, 2011, at 4:47 PM, <[EMAIL PROTECTED]> wrote:

> Hi Ralph,
>
> I spoke with Jeff Squyres at SC11, and updated him on the status of my
> OpenMPI port to Hadoop YARN.
>
> To update everyone, I have OpenMPI examples running on YARN, although the
> code still needs cleanup and refactoring; that can be done as a later
> step.
>
> Currently, the MPI processes come up, get the submitting client's IP and
> port via environment variables, connect to it, and perform a barrier. The
> result of this barrier is that everyone in MPI_COMM_WORLD gets each
> other's endpoints.
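A minimal sketch of the wireup pattern described above, assuming a coordinator-style barrier. The environment variable names (`MPI_CLIENT_HOST`, `MPI_CLIENT_PORT`) and the helper names are illustrative only; the actual names used by the YARN port are not given in this thread:

```python
import os

def client_endpoint(env=os.environ):
    # Hypothetical variable names for illustration; each MPI process
    # would read the submitting client's address this way and connect.
    return env["MPI_CLIENT_HOST"], int(env["MPI_CLIENT_PORT"])

def modex_barrier(reports):
    """Coordinator side of the barrier: 'reports' maps rank -> (host, port).
    The barrier completes only when every rank has checked in; its result
    is the full endpoint table, delivered back to every rank so that all
    of MPI_COMM_WORLD knows each other's endpoints."""
    world = sorted(reports)
    if world != list(range(len(world))):
        raise RuntimeError("barrier incomplete: missing ranks")
    return dict(reports)
```

The real implementation would carry the reports over the TCP connections to the client; this just captures the completion rule.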
>
> I am aiming to submit the patch to Hadoop by the end of this month.
>
> I will publish the OpenMPI patch to GitHub.
>
> (As I mentioned to Jeff, OpenMPI requires a CCLA for accepting
> submissions. That will take some time.)
>
> - Milind
>
> ---
> Milind Bhandarkar
> Greenplum Labs, EMC
> (Disclaimer: Opinions expressed in this email are those of the author, and
> do not necessarily represent the views of any organization, past or
> present, the author might be affiliated with.)
>
>
>
>>
>> I'm willing to do the integration work, but wanted to check first to see
>> if (a) someone in the Hadoop community is already doing so, and (b) if
>> you would be interested in seeing such a capability and willing to accept
>> the code contribution?
>>
>> Establishing MPI support requires the following steps:
>>
>> 1. wireup support. MPI processes need to exchange endpoint info (e.g.,
>> for TCP connections, IP address and port) so that each process knows how
>> to connect to any other process in the application. This is typically
>> done in a collective "modex" operation. There are several ways of doing
>> it - if we proceed, I will outline those in a separate email to solicit
>> your input on the most desirable approach to use.
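One way to picture the collective "modex" above is as an allgather over endpoints: each rank contributes its own, and every rank receives the full rank-indexed table. A simulated sketch (the collective is modeled in-process; a real implementation would run it over the wireup transport):

```python
def allgather(contributions):
    """Simulated collective: each rank contributes one value; every rank
    receives the list of all contributions in rank order."""
    return [list(contributions) for _ in contributions]

def modex(endpoints):
    """Modex over simulated ranks: endpoints[i] is rank i's (host, port).
    Returns one endpoint table per rank, so any process can look up how
    to connect to any other process in the application."""
    gathered = allgather(endpoints)
    return [{rank: ep for rank, ep in enumerate(view)} for view in gathered]
```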
>>
>> 2. binding support. One can achieve significant performance improvements
>> by binding processes to specific cores, sockets, and/or NUMA regions
>> (regardless of using MPI or not, but certainly important for MPI
>> applications). This requires not only the binding code, but some logic to
>> ensure that one doesn't "overload" specific resources.
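The "don't overload" logic in step 2 can be sketched as a small allocator that tracks which cores are free and refuses to oversubscribe them; the class and method names are hypothetical, and the actual bind call on Linux would be something like `os.sched_setaffinity`:

```python
class CoreAllocator:
    """Track core assignments so no core is oversubscribed, per the
    'overload' concern above. The core count is illustrative."""

    def __init__(self, num_cores):
        self.free = set(range(num_cores))

    def bind(self, pid, want):
        """Reserve 'want' cores for process 'pid', failing rather than
        doubling up processes on a core."""
        if want > len(self.free):
            raise RuntimeError("would overload: not enough free cores")
        cores = {self.free.pop() for _ in range(want)}
        # On Linux the actual binding step would be:
        #   os.sched_setaffinity(pid, cores)
        return cores
```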
>>
>> 3. process mapping. I haven't verified it yet, but I suspect that Hadoop
>> provides each executing instance with an identifier that is unique within
>> that job - e.g., we typically assign an integer "rank" that ranges from 0
>> to the number of instances being executed. This identifier is critical
>> for MPI applications, and the relative placement of processes within a
>> job often dictates overall performance. Thus, we would provide a mapping
>> capability that allows users to specify patterns of process placement for
>> their job - e.g., "place one process on each socket on every node".
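The mapping pattern given as an example in step 3 ("place one process on each socket on every node") can be sketched as rank assignment that walks nodes and sockets in order; node names and socket counts here are illustrative:

```python
def map_by_socket(nodes, sockets_per_node):
    """Assign ranks by the pattern 'one process on each socket on every
    node': ranks 0..N-1 follow node order, then socket order, so relative
    placement is deterministic and known to every process."""
    placements = []
    rank = 0
    for node in nodes:
        for socket in range(sockets_per_node):
            placements.append((rank, node, socket))
            rank += 1
    return placements
```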
>>
>> I have written the code to implement the above support on a number of
>> systems, and don't foresee major problems doing it for Hadoop (though I
>> would welcome a chance to get a brief walk-through of the code from
>> someone). Please let me know if this would be of interest to the Hadoop
>> community.
>>
>> Thanks
>> Ralph Castain
>>
>>
>>
>