HBase, mail # dev - Re: HBase Types: Explicit Null Support

Nick Dimiduk 2013-04-05, 01:49
On Mon, Apr 1, 2013 at 11:33 PM, James Taylor <[EMAIL PROTECTED]> wrote:

> Maybe if we can keep nullability separate from the
> serialization/deserialization, we can come up with a solution that works?

I think implied null could work, but let's build out the matrix. I see two
kinds of types: fixed- and variable-width. These types are used in two
scenarios: on their own or as part of a compound type.

A fixed-width type used standalone can infer null from absence of a value.
When used in a compound type, absence isn't enough to indicate null unless
it's the last value in the sequence. To support a null field in the middle
of the compound type, the compound type is forced to mark the field as null
explicitly.
The only solution I can think of (without sacrificing the full value range,
per my original question) is to write the full type width bytes, followed
by an isNull byte. Thus, for example, the INT type consumes 4 bytes when
serialized stand-alone, but 5 bytes when composed.

James, how does Phoenix handle a null fixed-width rowkey component? I don't
see that implemented in the PDataType enum.

Variable-width types used standalone are simple enough because HBase handles
arbitrary-length byte[]s everywhere. Variable-width in a composite is a
problem. Phoenix forces these values to appear only in the last position of
the composite, as I understand it. Orderly provides explicit null and
termination bytes by taking advantage of a feature of UTF-8 encoding.
Support for bytes is equally ugly (but clever) in that byte digits are
encoded in BCD. Both of these approaches slightly bloat the serialized
representation over the natural representation, but they allow the
variable-length types to be used anywhere within the compound type. As an
added bonus regarding code maintainability, their serialization is entirely
self-contained within the type. That's in contrast to the fixed-width type
implementation described above, where null is explicitly encoded by the
compound type.
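
As a rough illustration of the self-contained, variable-width approach, here is a sketch of a string field that carries its own null and termination bytes. It is not Orderly's code. It assumes the UTF-8 feature in question is that valid UTF-8 never contains the bytes 0xFE or 0xFF, so every byte can be shifted up by 2, which frees 0x00 and 0x01 for use as markers. The class name, the marker values, and the shift are all assumptions made for the example.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; marker values and shift are assumptions.
class TerminatedString {
    static final int NULL_MARKER = 0x00; // a lone 0x00 byte means "null field"
    static final int TERMINATOR = 0x01;  // ends every non-null field
    static final int SHIFT = 2;          // moves UTF-8 bytes out of the marker range

    /** Appends one self-delimiting field to out. */
    static void encode(ByteArrayOutputStream out, String s) {
        if (s == null) {
            out.write(NULL_MARKER);
            return;
        }
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            // Valid UTF-8 bytes are at most 0xF4, so adding 2 cannot overflow a byte.
            out.write((b & 0xFF) + SHIFT);
        }
        out.write(TERMINATOR);
    }

    /** Decodes the field starting at pos[0] and advances pos[0] past it. */
    static String decode(byte[] buf, int[] pos) {
        int i = pos[0];
        if ((buf[i] & 0xFF) == NULL_MARKER) {
            pos[0] = i + 1;
            return null;
        }
        int start = i;
        while ((buf[i] & 0xFF) != TERMINATOR) {
            i++;
        }
        byte[] raw = new byte[i - start];
        for (int j = 0; j < raw.length; j++) {
            raw[j] = (byte) ((buf[start + j] & 0xFF) - SHIFT);
        }
        pos[0] = i + 1; // skip the terminator
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // A variable-width field, a null, and another field, in that order.
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        encode(out, "row");
        encode(out, null);
        encode(out, "key");
        byte[] buf = out.toByteArray();
        int[] pos = {0};
        while (pos[0] < buf.length) {
            System.out.println(decode(buf, pos));
        }
    }
}
```

The cost is one extra byte per field, and the field can sit in any position of a compound because the decoder never needs outside length information. Shifting every byte by the same amount also keeps the relative byte order of non-null values unchanged.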

My opinion is that the computational and storage overhead imposed by
Orderly's implementation is worth the flexibility it gives users.
Correct me if I'm wrong, James, but you're saying, from your experience with
Phoenix, users are willing to work within that constraint?


