Ted Yu 2013-04-01, 21:38
Ted Yu 2013-04-02, 00:10
James Taylor 2013-04-02, 00:39
Jesse Yates 2013-04-02, 06:29
James Taylor 2013-04-02, 06:33
Nick Dimiduk 2013-04-05, 01:49
James Taylor 2013-04-05, 02:18
Re: HBase Types: Explicit Null Support
Nick Dimiduk 2013-04-05, 02:54
On Thu, Apr 4, 2013 at 7:18 PM, James Taylor <[EMAIL PROTECTED]> wrote:
> Would it make sense to clean up the APIs a bit and post just the type
> system code somewhere to give us something to poke holes at?
That could be useful. I've been experimenting with implementations as I
update the spec doc and pushing as I go to
https://github.com/ndimiduk/serialization-play. I can make you a
collaborator or you can host your own repository, as you prefer.
On 04/04/2013 06:49 PM, Nick Dimiduk wrote:
>> On Mon, Apr 1, 2013 at 11:33 PM, James Taylor <[EMAIL PROTECTED]> wrote:
>> Maybe if we can keep nullability separate from the
>>> serialization/deserialization, we can come up with a solution that works?
>> I think implied null could work, but let's build out the matrix. I see two
>> kinds of types: fixed- and variable-width. These types are used in two
>> scenarios: on their own or as part of a compound type.
>> A fixed-width type used standalone can infer null from absence of a value.
>> When used in a compound type, absence isn't enough to indicate null unless
>> it's the last value in the sequence. To support a null field in the middle
>> of the compound type, the field must be explicitly marked as null.
>> The only solution I can think of (without sacrificing the full value range,
>> per my original question) is to write the full type width bytes, followed
>> by an isNull byte. Thus, for example, the INT type consumes 4 bytes when
>> serialized stand-alone, but 5 bytes when composed.
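
To make the 4-byte versus 5-byte arithmetic above concrete, here is a minimal sketch. It is an illustration only, not code from the spec doc, Phoenix, or Orderly. The function names, the flag values (0x00 for present, 0x01 for null), and the zero padding written for a null field are all assumptions. It also ignores sort order for negative values.

```python
import struct

INT_WIDTH = 4  # serialized width of INT in bytes


def encode_int(value):
    """Standalone INT: 4 bytes. Null is implied by writing nothing at all."""
    if value is None:
        return b""
    return struct.pack(">i", value)


def encode_int_field(value):
    """INT inside a compound: 4 value bytes followed by one isNull byte.

    The flag values and the zero padding for null are assumptions
    made for this sketch.
    """
    if value is None:
        return b"\x00" * INT_WIDTH + b"\x01"
    return struct.pack(">i", value) + b"\x00"


def encode_compound(values):
    """Concatenate the 5-byte encodings of each INT field."""
    return b"".join(encode_int_field(v) for v in values)


def decode_compound(buf):
    """Walk the buffer in 5-byte steps, honoring the trailing isNull byte."""
    step = INT_WIDTH + 1
    fields = []
    for offset in range(0, len(buf), step):
        (value,) = struct.unpack_from(">i", buf, offset)
        is_null = buf[offset + INT_WIDTH] == 1
        fields.append(None if is_null else value)
    return fields
```

Because the null marker lives in its own byte, every 32-bit value (including 0) remains usable, and a null can sit in the middle of the compound: `decode_compound(encode_compound([1, None, -3]))` gives back `[1, None, -3]`.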
>> James, how does Phoenix handle a null fixed-width rowkey component? I
>> see that implemented in PDataType enum.
>> Variable-width types used standalone are simple enough because HBase handles
>> arbitrary-length byte arrays everywhere. Variable-width in a composite is a
>> problem. Phoenix forces these values to appear only in the last position in
>> the composite, as I understand it. Orderly provides explicit null and
>> termination bytes by taking advantage of a feature of UTF-8 encoding.
>> Support for bytes is equally ugly (but clever) in that byte digits are
>> encoded in BCD. Both of these approaches slightly bloat the serialized
>> representation over the natural representation, but they allow the
>> variable-length types to be used anywhere within the compound type. As an
>> added bonus regarding code maintainability, their serialization is entirely
>> self-contained within the type. That's in contrast to the fixed-width type
>> implementation described above, where null is explicitly encoded by the
>> compound type.
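
The explicit null and termination bytes described above can be sketched roughly as follows. This is written in the spirit of the approach, not taken from Orderly's source. It relies on the fact that valid UTF-8 never contains the bytes 0xFE or 0xFF, so every byte can be shifted up by 2 to free 0x00 and 0x01. The choice of 0x00 for null and 0x01 for the terminator, and the function names, are assumptions for this sketch.

```python
NULL_BYTE = 0x00   # assumed marker for a null string
TERMINATOR = 0x01  # assumed end-of-string marker


def encode_str(value):
    """Encode a string so it can sit anywhere inside a compound key."""
    if value is None:
        return bytes([NULL_BYTE])
    # Shifting by 2 is safe: UTF-8 bytes never reach 0xFE, so nothing overflows,
    # and no shifted byte can collide with the two reserved values.
    shifted = bytes(b + 2 for b in value.encode("utf-8"))
    return shifted + bytes([TERMINATOR])


def decode_str(buf, offset=0):
    """Return (value, next_offset) for the string starting at offset."""
    if buf[offset] == NULL_BYTE:
        return None, offset + 1
    end = buf.index(TERMINATOR, offset)
    raw = bytes(b - 2 for b in buf[offset:end])
    return raw.decode("utf-8"), end + 1
```

The cost is one extra byte per value, which is the slight bloat mentioned above. In exchange, the type carries its own null and end markers, so the compound type needs no knowledge of them, and byte-wise comparison still sorts null first, then the empty string, then the remaining strings in UTF-8 byte order.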
>> My opinion is that the computational and storage overhead imposed by Orderly's
>> implementation is worth the trade-off for flexibility in user consumption.
>> Correct me if I'm wrong, James, but you're saying that, from your experience
>> with Phoenix, users are willing to work within that constraint?
>> On 04/01/2013 11:29 PM, Jesse Yates wrote:
>> Actually, that isn't all that far-fetched of a format Matt - pretty
>>> anytime anyone wants to do sortable lat/long (*cough* three letter
>>>> Wouldn't we get the same by providing a simple set of libraries (ala
>>>> orderly + other HBase useful things) and then still give access to the
>>>> underlying byte array? Perhaps a nullable key type in that lib makes sense
>>>> if lots of people need it and it would be nice to have standard
>>>> so tools could interop much more easily.
>>>> Jesse Yates
>>>> On Mon, Apr 1, 2013 at 11:17 PM, Matt Corgan <[EMAIL PROTECTED]>
>>>> Ah, I didn't even realize sql allowed null key parts. Maybe a goal of
>>>>> interfaces should be to provide first-class support for custom user
>>>>> in addition to the standard ones included. Part of the power of
>>>>> plain byte keys is that users can concoct the perfect key for their