HBase, mail # dev - Re: HBase Types: Explicit Null Support


Nick Dimiduk 2013-04-05, 02:54
On Thu, Apr 4, 2013 at 7:18 PM, James Taylor <[EMAIL PROTECTED]> wrote:

> Would it make sense to clean up the APIs a bit and post just the type
> system code somewhere to give us something to poke holes at?
>

That could be useful. I've been experimenting with implementations as I
update the spec doc and pushing as I go to
https://github.com/ndimiduk/serialization-play. I can make you a
collaborator or you can host your own repository, as you prefer.

> On 04/04/2013 06:49 PM, Nick Dimiduk wrote:
>
>> On Mon, Apr 1, 2013 at 11:33 PM, James Taylor <[EMAIL PROTECTED]> wrote:
>>
>>> Maybe if we can keep nullability separate from the
>>> serialization/deserialization, we can come up with a solution that works?
>>>
>>
>> I think implied null could work, but let's build out the matrix. I see two
>> kinds of types: fixed- and variable-width. These types are used in two
>> scenarios: on their own or as part of a compound type.
>>
>> A fixed-width type used standalone can infer null from the absence of a
>> value. When used in a compound type, absence isn't enough to indicate null
>> unless it's the last value in the sequence. To support a null field in the
>> middle of the compound type, the field must be explicitly marked as null.
>> The only solution I can think of (without sacrificing the full value range,
>> per my original question) is to write the full type width in bytes, followed
>> by an isNull byte. Thus, for example, the INT type consumes 4 bytes when
>> serialized standalone, but 5 bytes when composed.
>>
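The fixed-width scheme described above (full type width, then an isNull byte) can be sketched as follows. This is a minimal illustration, not Phoenix's or HBase's code: the marker values 0x00/0x01, the sign-bit flip for ordering, and all function names are assumptions made for the example. Choosing a null marker smaller than the "present" marker, and zero-filling the value bytes of a null, makes null sort before Integer.MIN_VALUE while keeping every field exactly 5 bytes wide.

```python
import struct
from typing import Optional, Tuple

# Hypothetical marker values: the thread only says "an isNull byte".
# Choosing NULL < PRESENT makes a null field sort before every non-null value.
NULL_MARKER = 0x00
PRESENT_MARKER = 0x01
INT_MIN, INT_MAX = -(1 << 31), (1 << 31) - 1


def encode_int_standalone(value: Optional[int]) -> bytes:
    """Standalone INT: 4 bytes, or no bytes at all when the value is null."""
    if value is None:
        return b""  # null is inferred from the absence of a value
    if not INT_MIN <= value <= INT_MAX:
        raise ValueError("value does not fit in a 32-bit signed INT")
    # Big-endian two's complement with the sign bit flipped, so that
    # unsigned byte-wise comparison matches signed numeric order.
    return struct.pack(">I", (value & 0xFFFFFFFF) ^ 0x80000000)


def encode_int_composed(value: Optional[int]) -> bytes:
    """INT inside a compound key: full 4-byte width plus a trailing isNull byte."""
    if value is None:
        # Zero-fill keeps the field at a fixed 5-byte width.
        return bytes(4) + bytes([NULL_MARKER])
    return encode_int_standalone(value) + bytes([PRESENT_MARKER])


def decode_int_composed(buf: bytes, offset: int = 0) -> Tuple[Optional[int], int]:
    """Decode one composed INT; returns (value or None, offset of the next field)."""
    field = buf[offset:offset + 5]
    if len(field) != 5:
        raise ValueError("truncated INT field")
    if field[4] == NULL_MARKER:
        return None, offset + 5
    raw = struct.unpack(">I", field[:4])[0] ^ 0x80000000
    value = raw - (1 << 32) if raw & 0x80000000 else raw
    return value, offset + 5
```

For example, the two-part key (NULL, 7) encodes to 10 bytes, and a reader can walk it field by field because every composed INT has the same width. The extra fifth byte is the price paid for keeping the full 32-bit value range.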
>> James, how does Phoenix handle a null fixed-width rowkey component? I
>> don't see that implemented in the PDataType enum.
>>
>> Variable-width types used standalone are simple enough because HBase
>> handles arbitrary-length byte[]s everywhere. Variable-width types in a
>> composite are a problem. Phoenix forces these values to appear only in the
>> last position of the composite, as I understand it. Orderly provides
>> explicit null and termination bytes by taking advantage of a feature of
>> UTF-8 encoding. Support for bytes is equally ugly (but clever) in that byte
>> digits are encoded in BCD. Both of these approaches slightly bloat the
>> serialized representation over the natural representation, but they allow
>> the variable-length types to be used anywhere within the compound type. As
>> an added bonus regarding code maintainability, their serialization is
>> entirely self-contained within the type. That's in contrast to the
>> fixed-width type implementation described above, where null is explicitly
>> encoded by the compound type.
>>
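The idea of explicit null and termination bytes for variable-width strings can be sketched as below. This is a hypothetical encoding loosely modeled on the approach attributed to Orderly above, not Orderly's actual wire format; the byte assignments (0x00 for null, 0x01 as terminator, every UTF-8 byte shifted up by 2) and the function names are assumptions. The shift is safe because well-formed UTF-8 never contains the bytes 0xFE or 0xFF, and UTF-8 byte order already matches code-point order, so the encoding stays sortable.

```python
from typing import Optional, Tuple

# Hypothetical byte assignments, loosely modeled on the idea attributed to
# Orderly above; this is not Orderly's exact wire format.
NULL_BYTE = 0x00   # the whole field is null
TERMINATOR = 0x01  # end of a non-null string
SHIFT = 2          # well-formed UTF-8 never contains 0xFE or 0xFF, so +2 never overflows


def encode_str(value: Optional[str]) -> bytes:
    """Self-delimiting, order-preserving encoding usable anywhere in a compound key."""
    if value is None:
        return bytes([NULL_BYTE])
    # Shift every UTF-8 byte above the two reserved marker values.
    return bytes(b + SHIFT for b in value.encode("utf-8")) + bytes([TERMINATOR])


def decode_str(buf: bytes, offset: int = 0) -> Tuple[Optional[str], int]:
    """Decode one field; returns (value or None, offset of the next field)."""
    if buf[offset] == NULL_BYTE:
        return None, offset + 1
    # Shifted content bytes are always >= 0x02, so the first 0x01 ends the field.
    end = buf.index(TERMINATOR, offset)
    return bytes(b - SHIFT for b in buf[offset:end]).decode("utf-8"), end + 1
```

Because the terminator sorts below every content byte, a string sorts before any longer string it prefixes, so the field can sit in the middle of a compound key. The cost is one extra byte per field, which is the slight bloat mentioned above; restricting variable-width fields to the last position, as described for Phoenix, avoids that byte at the price of flexibility.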
>> My opinion is that the computational and storage overhead imposed by
>> Orderly's implementation is worth the trade-off for flexibility in user
>> consumption. Correct me if I'm wrong, James, but you're saying that, from
>> your experience with Phoenix, users are willing to work within that
>> constraint?
>>
>> Thanks,
>> Nick
>>
>>> On 04/01/2013 11:29 PM, Jesse Yates wrote:
>>>
>>>> Actually, that isn't all that far-fetched of a format, Matt - pretty
>>>> common anytime anyone wants to do sortable lat/long (*cough* three-letter
>>>> agencies *cough*).
>>>>
>>>> Wouldn't we get the same by providing a simple set of libraries (à la
>>>> Orderly + other useful HBase things) and then still giving access to the
>>>> underlying byte array? Perhaps a nullable key type in that lib makes
>>>> sense if lots of people need it, and it would be nice to have standard
>>>> libraries so tools could interoperate much more easily.
>>>> -------------------
>>>> Jesse Yates
>>>> @jesse_yates
>>>> jyates.github.com
>>>>
>>>>
>>>> On Mon, Apr 1, 2013 at 11:17 PM, Matt Corgan <[EMAIL PROTECTED]> wrote:
>>>>
>>>>> Ah, I didn't even realize SQL allowed null key parts. Maybe a goal of
>>>>> the interfaces should be to provide first-class support for custom user
>>>>> types in addition to the standard ones included. Part of the power of
>>>>> HBase's plain byte[] keys is that users can concoct the perfect key for their