HBase, mail # dev - Pull instant schema updating out?


Re: Pull instant schema updating out?
Nicolas Spiegelberg 2012-04-03, 17:34
We're using a variant of Online Schema Update in our 0.89 production
deployment.  There are significant differences because of the master rewrite,
so I can't speak to its stability on trunk.  We also don't run with online
splitting, which is another large difference from the use cases some of you
are trying to support.  Overall, I'll echo what I said when
HBASE-1730 was submitted, which I think has played out: Online Schema
Changes have looser requirements that allow us to create a solid solution
with less effort.  Ideally, production schema changes shouldn't happen
very often and will be actively monitored when they do.  Master failure
just means that you need to retry the operation, not that data loss or
region unavailability occurred.  Unless this is a critical long-term
feature for your production environment, the persistence is overly
elaborate and will cause more problems with downtime/analysis/support than
it will solve, when all it saves you is the minor inconvenience of checking
back in 5 min to make sure your change completely rolled out.  I think there are a million
other features of higher benefit and lower support effort.

I think we should stay at 1 version, Online Schema Update.  It's the
simplest and most tested.  If we go with the more complicated "Instant
Schema Alter", it should be owned by a group of full-time devs who think
it's one of the flagship features of their production system, because
it's a major undertaking and deserves that amount of effort.

On 4/3/12 12:49 PM, "Stack" <[EMAIL PROTECTED]> wrote:

>On Tue, Apr 3, 2012 at 9:23 AM, Ted Yu <[EMAIL PROTECTED]> wrote:
>> Has any change been made in the past half year on "Online schema update"?
>>
>
>Minor if any.
>
>It does have the advantage that it is at least being used.
>
>
>> When we make such an important decision, we should evaluate various
>> factors. "Instant schema alter" has better design: it endures master
>> failover.
>>
>> If the following tests can be written for "Online schema update" and pass,
>> I would vote for it:
>>
>> src/test/java/org/apache/hadoop/hbase/client/TestInstantSchemaChange.java
>>
>> src/test/java/org/apache/hadoop/hbase/client/TestInstantSchemaChangeFailover.java
>>
>> src/test/java/org/apache/hadoop/hbase/client/TestInstantSchemaChangeSplit.java
>>
>> Otherwise I am -1 on pulling "Instant schema alter"
>>
>> There is more work to be done: reduce the noise posed by MonitoredTask,
>> throttling, etc. But these are not difficult tasks.
>>
>
>It has a 'better' design, but it is still prone to failure and far
>from perfect, in that it persists by writing feature-particular znodes
>w/ feature-particular custom handlers sprinkled about the code base.
>We should be able to do better.
>
>Your whole argument above is premised on more work being done, whether
>tests for one implementation or completion of a feature no one has
>used in almost a year.  I would suggest that you don't have much of a
>case if it is predicated on the work of others.  Sign up to fix it or
>let's pull it; even then I'd say pull it till it's fixed rather than let
>a broken implementation go out in 0.94.
>
>St.Ack