Hadoop >> mail # dev >> Re: Sailfish


Hi Sriram,

>> The I-file concept could possibly be implemented here in a fairly self contained way. One
>> could even colocate/embed a KFS filesystem with such an alternate
>> shuffle, like how MR task temporary space is usually colocated with
>> HDFS storage.

>  Exactly.

>> Does this seem reasonable in any way?

> Great. Where do we go from here?  How do we get a collaborative effort going?

Sounds like a JIRA issue should be opened, the approach briefly described, and the first implementation attempt made.  Then iterate.

I look forward to seeing this! :)

Otis
--

Performance Monitoring for Solr / ElasticSearch / HBase - http://sematext.com/spm 

>________________________________
> From: Sriram Rao <[EMAIL PROTECTED]>
>To: [EMAIL PROTECTED]
>Sent: Tuesday, May 8, 2012 6:48 PM
>Subject: Re: Sailfish
>
>Dear Andy,
>
>> From: Andrew Purtell <[EMAIL PROTECTED]>
>> ...
>
>> Do you intend this to be a joint project with the Hadoop community or
>> a technology competitor?
>
>As I had said in my email, we are looking for folks to collaborate
>with us to help get us integrated with Hadoop.  So, to be explicitly
>clear, we are intending for this to be a joint project with the
>community.
>
>> Regrettably, KFS is not a "drop in replacement" for HDFS.
>> Hypothetically: I have several petabytes of data in an existing HDFS
>> deployment, which is the norm, and a continuous MapReduce workflow.
>> How do you propose I, practically, migrate to something like Sailfish
>> without a major capital expenditure and/or downtime and/or data loss?
>
>Well, we are not asking for KFS to replace HDFS.  One path you could
>take is to experiment with Sailfish---use KFS just for the
>intermediate data and HDFS for everything else.  There is no major
>capex :).  While you get comfy with pushing intermediate data into a
>DFS, we get the ideas added to HDFS.  This simplifies deployment
>considerations.
>
>> However, can the Sailfish I-files implementation be plugged in as an
>> alternate Shuffle implementation in MRv2 (see MAPREDUCE-3060 and
>> MAPREDUCE-4049),
>
>This'd be great!
>
>> with necessary additional plumbing for dynamic
>> adjustment of reduce task population? And the workbuilder could be
>> part of an alternate MapReduce Application Manager?
>
>It should be part of the AM.  (Currently, with our implementation in
>Hadoop-0.20.2, the workbuilder serves the role of an AM).
>
>> The I-file concept could possibly be implemented here in a fairly self contained way. One
>> could even colocate/embed a KFS filesystem with such an alternate
>> shuffle, like how MR task temporary space is usually colocated with
>> HDFS storage.
>
>Exactly.
>
>> Does this seem reasonable in any way?
>
>Great. Where do we go from here?  How do we get a collaborative effort going?
>
>Best,
>
>Sriram
>
>>>  From: Sriram Rao <[EMAIL PROTECTED]>
>>> To: [EMAIL PROTECTED]
>>> Sent: Tuesday, May 8, 2012 10:32 AM
>>> Subject: Project announcement: Sailfish (also, looking for collaborators)
>>>
>>> Hi,
>>>
>>> I'd like to announce the release of a new open source project, Sailfish.
>>>
>>> http://code.google.com/p/sailfish/
>>>
>>> Sailfish tries to improve Hadoop performance, particularly for large jobs
>>> that process terabytes of data and run for hours.  In building Sailfish, we
>>> modify how map output is handled and transported from map to reduce.
>>>
>>> The project pages provide more information about the project.
>>>
>>> We are looking for collaborators who can help get some of the ideas into
>>> Apache Hadoop. A possible step forward could be to make the "shuffle" phase of
>>> Hadoop pluggable.
>>>
>>> If you are interested in working with us, please get in touch with me.
>>>
>>> Sriram
>>
>
>
>
>--
>Best regards,
>
>   - Andy
>
>Problems worthy of attack prove their worth by hitting back. - Piet
>Hein (via Tom White)
>
>
>