Re: Project announcement: Sailfish (also, looking for collaborators)
Sriram et al.,
Do you intend this to be a joint project with the Hadoop community or
a technology competitor?
Regrettably, KFS is not a "drop-in replacement" for HDFS.
Hypothetically: I have several petabytes of data in an existing HDFS
deployment (which is the norm) and a continuous MapReduce workflow.
How do you propose I practically migrate to something like Sailfish
without a major capital expenditure, downtime, and/or data loss?
That said, could the Sailfish I-files implementation be plugged in as an
alternate shuffle implementation in MRv2 (see MAPREDUCE-3060 and
MAPREDUCE-4049), with the additional plumbing needed for dynamic
adjustment of the reduce task population? And could the workbuilder
become part of an alternate MapReduce ApplicationMaster? The I-file
concept could probably be implemented there in a fairly self-contained
way. One could even colocate/embed a KFS filesystem with such an
alternate shuffle, similar to how MR task temporary space is usually
colocated with HDFS block storage.
Does this seem reasonable in any way?
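To make the suggestion concrete, here is a minimal sketch of what a pluggable shuffle contract could look like. The interface and class names (`ShuffleConsumer`, `InMemoryIFileShuffle`) are illustrative assumptions, not the actual Hadoop API from MAPREDUCE-3060/4049; the point is only that reducers fetch their partition through an interface, so an I-file-backed implementation could be swapped in behind the same contract:

```java
import java.util.*;

// Hypothetical plugin contract (illustrative names, not the real MRv2 API):
// a reduce task asks the shuffle implementation for its partition's records.
interface ShuffleConsumer<K, V> {
    List<Map.Entry<K, V>> fetch(int partition);
}

// Toy in-memory stand-in for an I-file-backed shuffle: map outputs are
// appended per reduce partition, analogous to how Sailfish aggregates
// map output into per-partition I-files in KFS.
class InMemoryIFileShuffle implements ShuffleConsumer<String, Integer> {
    private final Map<Integer, List<Map.Entry<String, Integer>>> partitions =
            new HashMap<>();

    void append(int partition, String key, Integer value) {
        partitions.computeIfAbsent(partition, p -> new ArrayList<>())
                  .add(new AbstractMap.SimpleEntry<>(key, value));
    }

    @Override
    public List<Map.Entry<String, Integer>> fetch(int partition) {
        return partitions.getOrDefault(partition, Collections.emptyList());
    }
}

public class ShuffleSketch {
    public static void main(String[] args) {
        InMemoryIFileShuffle shuffle = new InMemoryIFileShuffle();
        // Two "map tasks" emit records; partition = the key's reduce bucket.
        shuffle.append(0, "apple", 1);
        shuffle.append(1, "banana", 1);
        shuffle.append(0, "apple", 2);
        // A "reduce task" for partition 0 pulls only its own records.
        System.out.println(shuffle.fetch(0).size()); // prints 2
        System.out.println(shuffle.fetch(1).size()); // prints 1
    }
}
```

Behind such an interface, swapping the default HTTP fetch for I-file reads would be invisible to reduce tasks, which is what would make the shuffle phase genuinely pluggable.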
>> From: Sriram Rao <[EMAIL PROTECTED]>
>> To: [EMAIL PROTECTED]
>> Sent: Tuesday, May 8, 2012 10:32 AM
>> Subject: Project announcement: Sailfish (also, looking for collaborators)
>> I'd like to announce the release of a new open source project, Sailfish.
>> Sailfish tries to improve Hadoop performance, particularly for large jobs
>> which process TBs of data and run for hours. In building Sailfish, we
>> modify how map output is handled and transported from map to reduce.
>> The project pages provide more information about the project.
>> We are looking for collaborators who can help get some of the ideas into
>> Apache Hadoop. A possible step forward could be to make the "shuffle"
>> phase of Hadoop pluggable.
>> If you are interested in working with us, please get in touch with me.
Problems worthy of attack prove their worth by hitting back. - Piet
Hein (via Tom White)