Home | About | Sematext search-lucene.com search-hadoop.com
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
 Search Hadoop and all its subprojects:

Switch to Threaded View
HDFS >> mail # dev >> VOTE: HDFS-347 merge


Copy link to this message
-
Re: VOTE: HDFS-347 merge
The throughput on this thread is too high.

Nicholas/Suresh: do either of you disagree with the general approach
in HDFS-347? Holding an inevitable branch open causes a lot of pain
(Suresh, you endured much of that personally with security, which had
a lot of follow-on work). While it makes sense to block HDFS-347 from
2.x before all concerns are addressed, are most objections so
fundamental that the merge to trunk should be prevented (except for
HDFS-2246)? Is there related development that will be impacted? Colin
won't disappear after this is committed, so if the answers to these
questions are "no", then let's move forward.

Todd: Yes, the idea that the project only allows optimizations that
work on all platforms is idiotic, which is why nobody has suggested
it. Given that HDFS-347 is a strictly better approach, once committed,
there will be ample motivation to add support for other OSes and
remove HDFS-2246 entirely. Nobody is confused about this. There's
ample precedent for retaining obscure, clumsy features as a temporary
stop-gap (e.g., service plugins, opaque blobs of bytes in Tasks,
configurable combiner semantics). What's the virtue of insisting on
removing this? Unless there was a lot of follow-on work, HDFS-2246
doesn't look like a lot of code... -C

On Wed, Feb 20, 2013 at 3:40 PM, Todd Lipcon <[EMAIL PROTECTED]> wrote:
> On Wed, Feb 20, 2013 at 3:31 PM, Suresh Srinivas <[EMAIL PROTECTED]> wrote:
>>> Given that this is an optimization, and we have a ton of optimizations
>>> which don't yet run on Windows, I don't think that should be
>>> considered. Additionally, the Windows support has not yet been merged,
>>> nor is it in any release, so this isn't a regression.
>>>
>>
>> This is a critical functionality for HBase peformance and an optimization
>> we consider
>> very important to have.
>
> Too bad it doesn't work in any of the normal installations... none of
> the packages for Hadoop would allow it to work, given that the data
> directories will be owned by HDFS and not world readable, and
> tasks/HBase would run as an "hbase" user, which wouldn't have direct
> access to the block files.
>
>>>
>>> I would be happy to review an addition to the HDFS-347 branch which
>>> addresses this issue. But I don't think we should be maintaining two
>>> codepaths for the sake of an optimization on a platform which is not
>>> yet officially supported on trunk, especially when the old code path
>>> is _insecure_ and thus unusable in most environments.
>>
>>
>> I have to disagree. No where in the jira or the design it is explicitly
>> stated that
>> the old short circuit functionality is being removed. My assumption has been
>> that it will not be removed.
>
> I've tried this avenue in the past on other insecurities which were
> fixed. Sorry if you were depending on insecure behavior. The project
> should move on and not have 3+ ways of implementing the same thing.
>
>> As regards "officially supported", we have been doing
>> windows development for
>> more than a year. In fact branch-1-win is being used by a lot users. Given
>> merging it to branch-1 requires first making it available in trunk, we have
>> been doing
>> a lot of work in branch-trunk-win. It is almost ready to be merged as well.
>>
>> I am -1 on removing existing short circuit until an alternative short
>> circuit similar
>> to HDFS-347 on all the platforms.
>
> Great -- are you committed to building this equivalent feature for
> Windows, then? On what timeline? From my viewpoint, Windows isn't a
> supported platform *right now*, so vetoing based on it seems
> meritless.
>
> BTW, the posix_fadvise based readahead is an important optimization
> for many workloads, too, but from what I can tell looking at the
> Windows branch, it doesn't support it. There are other places in the
> Windows branch where performance is going to be worse - eg disabling
> the pipelined crc32c implementation will be a 2-3x hit on that code
> path.
>
> No one has voted to merge Windows support, and if merging Windows

On Wed, Feb 20, 2013 at 3:40 PM, Todd Lipcon <[EMAIL PROTECTED]> wrote:
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB