Hadoop >> mail # dev >> Project announcement: Sailfish (also, looking for collaborators)


Re: Project announcement: Sailfish (also, looking for collaborators)
Sriram et al.,

Do you intend this to be a joint project with the Hadoop community or
a technology competitor?

Regrettably, KFS is not a "drop in replacement" for HDFS.
Hypothetically: I have several petabytes of data in an existing HDFS
deployment, which is the norm, and a continuous MapReduce workflow.
How do you propose I, practically, migrate to something like Sailfish
without a major capital expenditure and/or downtime and/or data loss?

However, can the Sailfish I-files implementation be plugged in as an
alternate Shuffle implementation in MRv2 (see MAPREDUCE-3060 and
MAPREDUCE-4049), with necessary additional plumbing for dynamic
adjustment of reduce task population? And could the workbuilder be
part of an alternate MapReduce ApplicationMaster? The I-file concept
could possibly be implemented here in a fairly self-contained way. One
could even colocate/embed a KFS filesystem with such an alternate
shuffle, like how MR task temporary space is usually colocated with
HDFS storage.
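
To make the question concrete, here is a rough and purely illustrative
Java sketch of what "pluggable" could mean. None of these names
(ShufflePlugin, AppendLogShuffle, ReducePlanner) exist in Hadoop or
Sailfish; they are hypothetical. The in-memory log only stands in for an
I-file, and the planner reflects my assumption that a workbuilder-like
component would pick the reduce task count from the observed shuffle
volume after the maps have run.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical plug-in contract; NOT an existing Hadoop interface. */
interface ShufflePlugin {
    /** Called by a map task for every output record. */
    void append(int partition, String key, String value);

    /** Called by a reduce task to read all records of one partition. */
    List<String> read(int partition);

    /** Total bytes appended so far, across all partitions. */
    long bytesWritten();
}

/**
 * Toy stand-in for an I-file-style shuffle: output from all maps for the
 * same partition is appended to one shared log (in memory here; assumed
 * to live in a distributed filesystem in a real system).
 */
class AppendLogShuffle implements ShufflePlugin {
    private final Map<Integer, List<String>> logs = new TreeMap<>();
    private long bytes = 0;

    @Override
    public synchronized void append(int partition, String key, String value) {
        String record = key + "\t" + value;
        logs.computeIfAbsent(partition, p -> new ArrayList<>()).add(record);
        bytes += record.getBytes(StandardCharsets.UTF_8).length;
    }

    @Override
    public synchronized List<String> read(int partition) {
        // Return a copy so callers cannot mutate the log.
        return new ArrayList<>(logs.getOrDefault(partition, new ArrayList<>()));
    }

    @Override
    public synchronized long bytesWritten() {
        return bytes;
    }
}

/** Toy stand-in for a workbuilder: size the reduce phase from data volume. */
class ReducePlanner {
    static int reducersFor(long shuffleBytes, long bytesPerReducer) {
        if (bytesPerReducer <= 0) {
            throw new IllegalArgumentException("bytesPerReducer must be positive");
        }
        // Ceiling division, with at least one reducer even for empty output.
        long n = (shuffleBytes + bytesPerReducer - 1) / bytesPerReducer;
        return (int) Math.max(1, n);
    }
}
```

An alternate application master could then call something like
ReducePlanner.reducersFor(shuffle.bytesWritten(), targetBytes) once the
map phase finishes, rather than fixing the reduce count at job
submission time.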

Does this seem reasonable in any way?

Best regards,

   - Andy

>>  From: Sriram Rao <[EMAIL PROTECTED]>
>> To: [EMAIL PROTECTED]
>> Sent: Tuesday, May 8, 2012 10:32 AM
>> Subject: Project announcement: Sailfish (also, looking for collaborators)
>>
>> Hi,
>>
>> I'd like to announce the release of a new open source project, Sailfish.
>>
>> http://code.google.com/p/sailfish/
>>
>> Sailfish tries to improve Hadoop performance, particularly for large jobs
>> that process terabytes of data and run for hours.  In building Sailfish, we
>> modify how map output is handled and transported from map to reduce.
>>
>> The project pages provide more information about the project.
>>
>> We are looking for collaborators who can help get some of the ideas into
>> Apache Hadoop. A possible step forward could be to make the "shuffle" phase of
>> Hadoop pluggable.
>>
>> If you are interested in working with us, please get in touch with me.
>>
>> Sriram
>

--
Problems worthy of attack prove their worth by hitting back. - Piet
Hein (via Tom White)