Re: Hadoop-on-demand and torque
Ralph Castain 2012-05-21, 14:36
Not quite yet, though we are working on it (some descriptive stuff is around, but needs to be consolidated). Several of us started working together a couple of months ago to support the MapReduce programming model on HPC clusters using Open MPI as the platform. In working with our customers and OMPI's wide community of users, we found that people were interested in this capability, wanted to integrate MPI support into their MapReduce jobs, and didn't want to migrate their clusters to YARN for various reasons.
We have released initial versions of two new tools in the OMPI developer's trunk, scheduled for inclusion in the upcoming 1.7.0 release:
1. "mr+" - executes the MapReduce programming paradigm. Currently, we only support streaming data, though we will extend that support shortly. All HPC environments (rsh, SLURM, Torque, Alps, LSF, Windows, etc.) are supported. Both mappers and reducers can utilize MPI (independently or in combination) if they so choose. Mappers and reducers can be written in any of the typical HPC languages (C, C++, and Fortran) as well as Java (note: OMPI now comes with Java MPI bindings).
2. "hdfsalloc" - takes a list of files and obtains a resource allocation for the nodes upon which those files reside. SLURM and Moab/Maui are currently supported, with Gridengine coming soon.
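To make the streaming contract concrete, here is a minimal word-count mapper and reducer in Python. This is not OMPI code, just a sketch of the usual streaming convention: a mapper reads lines on stdin and emits tab-separated key/value pairs, and a reducer reads key-sorted pairs and aggregates them. Under mr+ the two would run as separate executables with a shuffle/sort in between; they are combined here only to keep the sketch self-contained.

```python
import sys
from itertools import groupby

def map_stream(lines):
    # Mapper: emit one "word\t1" pair per word, streaming-style.
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"

def reduce_stream(pairs):
    # Reducer: input must be sorted by key; sum the counts per key.
    split = (p.split("\t") for p in pairs)
    for word, group in groupby(split, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"

if __name__ == "__main__":
    # In a real streaming job the framework sorts between the two phases;
    # here we sort in-process for illustration.
    mapped = sorted(map_stream(sys.stdin))
    for out in reduce_stream(mapped):
        print(out)
```

Because the contract is just "lines in, tab-separated pairs out", the same mapper or reducer could equally be a C, C++, Fortran, or Java executable, which is what makes the model a natural fit for existing HPC codes.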
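The idea behind hdfsalloc can be sketched as follows: given the hosts that hold each requested file's blocks, collect the node set covering those replicas and hand it to the resource manager so the allocation lands where the data lives. The replica map and the allocation command below are purely illustrative assumptions, not hdfsalloc's actual interface; in a real deployment the block locations would come from the HDFS namenode.

```python
def nodes_for_files(files, block_locations):
    """Return the sorted set of hosts holding at least one replica
    of each requested file.

    block_locations: dict mapping file path -> list of hosts holding
    its blocks (hypothetical data; a real tool queries the namenode).
    """
    hosts = set()
    for f in files:
        hosts.update(block_locations[f])
    return sorted(hosts)

# Hypothetical replica map, as a namenode might report it.
locations = {
    "/data/part-00000": ["node01", "node02"],
    "/data/part-00001": ["node02", "node03"],
}
nodes = nodes_for_files(["/data/part-00000", "/data/part-00001"], locations)
# The node list would then be passed to the scheduler, e.g. a SLURM
# allocation pinned to those nodes (illustrative only):
#   salloc --nodelist=node01,node02,node03
```

This is the data-locality step that a dedicated Hadoop cluster gets from its own scheduler; doing it at allocation time is what lets an ordinary HPC batch system place the job next to the data.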
There will be a public announcement of this in the near future, and we expect to integrate the Hadoop 1.0 and Hadoop 2.0 MR classes over the next couple of months. By the end of this summer, we should have a full-featured public release.
On May 20, 2012, at 2:10 PM, Brian Bockelman wrote:
> Hi Ralph,
> I admit - I've only been half-following the OpenMPI progress. Do you have a technical write-up of what has been done?
> On May 20, 2012, at 9:31 AM, Ralph Castain wrote:
>> FWIW: Open MPI now has an initial cut at "MR+" that runs map-reduce under any HPC environment. We don't have the Java integration yet to support the Hadoop MR class, but you can write a mapper/reducer and execute that programming paradigm. We plan to integrate the Hadoop MR class soon.
>> If you already have that integration, we'd love to help port it over. We already have the MPI support completed, so any mapper/reducer could use it.
>> On May 20, 2012, at 7:12 AM, Pierre Antoine DuBoDeNa wrote:
>>> We run similar infrastructure in a university project. we plan to install
>>> Hadoop, and are looking for "alternatives" based on Hadoop in case pure
>>> Hadoop does not work as expected.
>>> Keep us updated on the code release.
>>> 2012/5/20 Stijn De Weirdt <[EMAIL PROTECTED]>
>>>> hi all,
>>>> i'm part of an HPC group at a university. we have some users who are
>>>> interested in Hadoop and want to see if it can be useful in their
>>>> research, and we also have researchers already using hadoop on their own
>>>> infrastructure, but that is not enough reason for us to start with
>>>> dedicated Hadoop infrastructure (we are now only running torque-based
>>>> clusters, with and without shared storage, and setting up and properly
>>>> maintaining Hadoop infrastructure requires quite some understanding of
>>>> new technology).
>>>> to be able to support these needs we wanted to do just this: use current
>>>> HPC infrastructure to make private hadoop clusters so people can do some
>>>> work. if we attract enough interest, we will probably setup dedicated
>>>> infrastructure, but by that time we (the admins) will also have a better
>>>> understanding of what is required.
>>>> so we looked at HOD for testing/running hadoop on existing
>>>> infrastructure (we never really looked at myhadoop, though).
>>>> but (imho) the current HOD code base is not in a good state. we did
>>>> some work to get it running and added some features, only to conclude
>>>> that it was not sufficient (and not maintainable).
>>>> so we wrote something from scratch with same functionality as HOD, and