Hadoop, mail # general - [DISCUSS] Hadoop Security Release off Yahoo! patchset


Re: [DISCUSS] Hadoop Security Release off Yahoo! patchset
Eric Baldeschwieler 2011-01-17, 23:29
Hi Stack,

I feel your pain.  We're running a 700-node HBase cluster containing a HUGE collection of web pages.  Both versions of append were started by engineers working at Yahoo, and we've put A LOT of investment into both.  I really, really want to see the append issue solved for HBase!!

My point is simply that we need to separate our concerns.  I would 300% support a community of folks building a 0.20-derived version of Hadoop with append, and we know that any new release post-0.20 will contain an append solution.  This branch is more backwards-facing.  We are simply trying to share our last two years of 0.20 experience with the community, so that a) folks can use it if they find value in it, and b) this work can be merged into future Hadoop releases (which will have append).

We want to share what we have tested, since we believe that the testing is a good chunk of our contribution.

Thanks,

E14

On Jan 16, 2011, at 2:57 PM, Stack wrote:

> On Fri, Jan 14, 2011 at 10:25 AM, Eric Baldeschwieler
> <[EMAIL PROTECTED]> wrote:
>> 2) append is hard. It is so hard we rewrote the entire write pipeline (5 person-years of work) in trunk after giving up on the codeline you are suggesting we merge in. That work is what distinguishes all post-20 releases from 20 releases in my mind. I don't trust the 20 append code line. We've been hurt badly by it.  We did the rewrite only after losing a bunch of production data a bunch of times with the previous code line.  I think the various 20 append patch lines may be fine for specialized HBase clusters, but they don't have the rigor behind them to bet your business on them.
>>
>
> Eric:
>
> A few comments on the above:
>
> + Append has had a bunch of work done on it since the Y! dataloss of a
> few years ago on an ancestor of the branch-0.20-append codebase (IIRC
> the issue you refer to in particular -- the 'dataloss' because
> partially written blocks were done up in tmp dirs, and on cluster
> restart, tmp data was cleared -- has been fixed in
> branch-0.20-append).
> + You may not trust 0.20-append (or its close cousin over in CDH) but
> a bunch of HBasers do. On the one hand, we have little choice.  Until
> the *new* append becomes available in a stable Hadoop the HBase
> project has had to sustain itself (What do you think, 3-6 months before
> we see 0.22?  The HBase project can't hold its breath that long).  On
> the other hand, the branch-0.20-append work has been carried out by lads
> (and lasses!) who know their HDFS.  It's true that it will not have
> been tested with Y! rigor but near-derivatives -- CDH or the FB
> branches -- already do HDFS-200-based append in production.
>
> St.Ack
> P.S. Don't get me wrong.  HBase is looking forward to *new* append.
> We just need something to suck on in the meantime.