Re: Hadoop + MPI
FWIW: I can commit the OMPI part of your patch for you. The CCLA is intended to ensure that people realize the need to protect OMPI from "infection" by code based on other licenses such as the GPL. For people offering only a single patch, it is often too big a burden to get corporate approval of the legal document.

So as long as someone (e.g., me) who already is operating under the CCLA is willing to review and commit the patch, and the patch isn't too huge, we can absorb it that way. I expect your patch is just a new ess component, and I'm happy to do the review and commit it on your behalf, if that is acceptable to you.

On Nov 21, 2011, at 5:04 PM, <[EMAIL PROTECTED]> <[EMAIL PROTECTED]> wrote:

> Ralph,
>
> Yes, I have completed the first step, although I would really like that
> code to be part of the MPI Application Master (Chris Douglas suggested a
> way to do this at ApacheCon).
>
> Regarding the remaining steps, I have been following discussions on the
> open mpi mailing lists, and reading code for hwloc.
>
> If you are making a trip to Cisco HQ sometime soon, I would like to have a
> face-to-face about hwloc. I have so far avoided using a native task
> controller for spawning MPI jobs, but given the lack of support for
> binding in Java, it looks like I will have to bite the bullet.
>
> - milind
>
> ---
> Milind Bhandarkar
> Greenplum Labs, EMC
> (Disclaimer: Opinions expressed in this email are those of the author, and
> do not necessarily represent the views of any organization, past or
> present, the author might be affiliated with.)
>
>
>
> On 11/21/11 3:54 PM, "Ralph Castain" <[EMAIL PROTECTED]> wrote:
>
>> Hi Milind
>>
>> Glad to hear of the progress - I recall our earlier conversation. I
>> gather you have completed step 1 (wireup) - have you given any thought to
>> the other two steps? Anything I can do to help?
>>
>> Ralph
>>
>>
>> On Nov 21, 2011, at 4:47 PM, <[EMAIL PROTECTED]> wrote:
>>
>>> Hi Ralph,
>>>
>>> I spoke with Jeff Squyres at SC11, and updated him on the status of my
>>> OpenMPI port on Hadoop Yarn.
>>>
>>> To update everyone, I have OpenMPI examples running on #Yarn. The code
>>> still needs some cleanup and refactoring, but that can be done as a
>>> later step.
>>>
>>> Currently, the MPI processes come up, get the submitting client's IP
>>> and port via environment variables, connect to it, and do a barrier.
>>> The result of this barrier is that everyone in MPI_COMM_WORLD gets each
>>> other's endpoints.
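
The wireup flow described above can be sketched roughly as follows. This is
a minimal illustration in Python, not the Open MPI or Hadoop code: the
environment variable names WIREUP_HOST and WIREUP_PORT, the JSON line
protocol, and the functions run_coordinator, wire_up and demo are all
assumptions made for the example. Each "rank" reads the coordinator's
address from the environment, reports its own endpoint, and blocks until
the coordinator has heard from every rank, at which point everyone receives
the full endpoint table.

```python
import json
import os
import socket
import threading


def run_coordinator(server, world_size):
    """Stand-in for the submitting client: gather endpoints, then release."""
    conns, table = [], {}
    for _ in range(world_size):
        conn, _addr = server.accept()
        msg = json.loads(conn.makefile("r").readline())
        table[msg["rank"]] = msg["endpoint"]
        conns.append(conn)
    # Every rank has reported, so the barrier completes here: send the
    # whole table back to each waiting rank.
    payload = (json.dumps(table) + "\n").encode()
    for conn in conns:
        conn.sendall(payload)
        conn.close()


def wire_up(rank, endpoint):
    """Run by each rank: report own endpoint, return {rank: endpoint}."""
    # Hypothetical variable names; the real ones are not given in the thread.
    host = os.environ["WIREUP_HOST"]
    port = int(os.environ["WIREUP_PORT"])
    with socket.create_connection((host, port)) as sock:
        hello = json.dumps({"rank": rank, "endpoint": endpoint}) + "\n"
        sock.sendall(hello.encode())
        # Blocks until the coordinator has collected all ranks.
        reply = sock.makefile("r").readline()
    return {int(r): tuple(ep) for r, ep in json.loads(reply).items()}


def demo(world_size=4):
    """Simulate the ranks as threads on one machine."""
    server = socket.create_server(("127.0.0.1", 0))
    os.environ["WIREUP_HOST"] = "127.0.0.1"
    os.environ["WIREUP_PORT"] = str(server.getsockname()[1])
    coordinator = threading.Thread(
        target=run_coordinator, args=(server, world_size))
    coordinator.start()

    results = {}

    def rank_main(rank):
        # Made-up endpoints, one per rank.
        results[rank] = wire_up(rank, ("10.0.0.%d" % rank, 5000 + rank))

    ranks = [threading.Thread(target=rank_main, args=(r,))
             for r in range(world_size)]
    for t in ranks:
        t.start()
    for t in ranks:
        t.join()
    coordinator.join()
    server.close()
    return results
```

Calling demo() returns, for each rank, the table that rank received; all of
the tables should be identical. A real implementation would also need
timeouts and failure handling, which are left out here.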
>>>
>>> I am aiming to submit the patch to Hadoop by the end of this month.
>>>
>>> I will publish the OpenMPI patch to GitHub.
>>>
>>> (As I mentioned to Jeff, OpenMPI requires a CCLA for accepting
>>> submissions. That will take some time.)
>>>
>>> - Milind
>>>
>>> ---
>>> Milind Bhandarkar
>>> Greenplum Labs, EMC
>>> (Disclaimer: Opinions expressed in this email are those of the author,
>>> and do not necessarily represent the views of any organization, past or
>>> present, the author might be affiliated with.)
>>>
>>>
>>>
>>>>
>>>> I'm willing to do the integration work, but wanted to check first to
>>>> see if (a) someone in the Hadoop community is already doing so, and
>>>> (b) if you would be interested in seeing such a capability and willing
>>>> to accept the code contribution?
>>>>
>>>> Establishing MPI support requires the following steps:
>>>>
>>>> 1. wireup support. MPI processes need to exchange endpoint info (e.g.,
>>>> for TCP connections, IP address and port) so that each process knows
>>>> how to connect to any other process in the application. This is
>>>> typically done in a collective "modex" operation. There are several
>>>> ways of doing it - if we proceed, I will outline those in a separate
>>>> email to solicit your input on the most desirable approach to use.
>>>>
>>>> 2. binding support. One can achieve significant performance
>>>> improvements by binding processes to specific cores, sockets, and/or
>>>> NUMA regions