HBase, mail # user - HBase Types: Explicit Null Support


Re: HBase Types: Explicit Null Support
Nick Dimiduk 2013-04-01, 23:41
On Mon, Apr 1, 2013 at 4:31 PM, James Taylor <[EMAIL PROTECTED]> wrote:

> From the SQL perspective, handling null is important.

From your perspective, it is critical to support NULLs, even at the expense
of having fixed-width encodings at all, or of representing the full range
of values. That is, you'd rather be able to represent NULL than -2^31?
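
To make the trade-off in that question concrete, here is a minimal sketch of a fixed-width 4-byte encoding that keeps NULL by giving up -2^31. The class NullableIntCodec and its methods are hypothetical names chosen for illustration, not an HBase API, and the sign-bit flip is just one common way to make byte order match numeric order.

```java
// Hypothetical helper, not part of HBase: a fixed-width, order-preserving
// 4-byte encoding of a nullable int that sacrifices Integer.MIN_VALUE.
final class NullableIntCodec {
    static final int WIDTH = 4;

    static byte[] encode(Integer value) {
        if (value != null && value == Integer.MIN_VALUE) {
            // The slot that would hold -2^31 is the one reserved for NULL.
            throw new IllegalArgumentException("-2^31 is reserved for NULL");
        }
        int v = (value == null) ? Integer.MIN_VALUE : value;
        // Flip the sign bit so unsigned byte-wise order equals signed order.
        int flipped = v ^ 0x80000000;
        return new byte[] {
            (byte) (flipped >>> 24),
            (byte) (flipped >>> 16),
            (byte) (flipped >>> 8),
            (byte) flipped
        };
    }

    static Integer decode(byte[] b) {
        if (b.length != WIDTH) {
            throw new IllegalArgumentException("expected " + WIDTH + " bytes");
        }
        int flipped = ((b[0] & 0xff) << 24) | ((b[1] & 0xff) << 16)
                    | ((b[2] & 0xff) << 8) | (b[3] & 0xff);
        int v = flipped ^ 0x80000000;
        return (v == Integer.MIN_VALUE) ? null : v;
    }
}
```

Under these assumptions NULL encodes as four zero bytes and therefore sorts before every real value, while the encoding stays exactly 4 bytes wide.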

> On 04/01/2013 01:32 PM, Nick Dimiduk wrote:
>
>> Thanks for the thoughtful response (and code!).
>>
>> I'm thinking I will press forward with a base implementation that does not
>> support nulls. The idea is to provide an extensible set of interfaces, so I
>> think this will not box us into a corner later. That is, a mirroring
>> package could be implemented that supports null values and accepts
>> the relevant trade-offs.
>>
>> Thanks,
>> Nick
>>
>> On Mon, Apr 1, 2013 at 12:26 PM, Matt Corgan <[EMAIL PROTECTED]> wrote:
>>
>>> I spent some time this weekend extracting bits of our serialization code
>>> to a public GitHub repo at http://github.com/hotpads/data-tools.
>>> Contributions are welcome - I'm sure we all have this stuff lying
>>> around.
>>>
>>> You can see I've bumped into the NULL problem in a few places:
>>>
>>> * https://github.com/hotpads/data-tools/blob/master/src/main/java/com/hotpads/data/primitive/lists/LongArrayList.java
>>> * https://github.com/hotpads/data-tools/blob/master/src/main/java/com/hotpads/data/types/floats/DoubleByteTool.java
>>>
>>> Looking back, I think my latest opinion on the topic is to reject
>>> nullability as the rule since it can cause unexpected behavior and
>>> confusion.  It's cleaner to provide a wrapper class (so both LongArrayList
>>> plus NullableLongArrayList) that explicitly defines the behavior, and
>>> costs a little more in performance.  If the user can't find a pre-made
>>> wrapper class, it's not very difficult for each user to provide their own
>>> interpretation of null and check for it themselves.
>>>
>>> If you reject nullability, the question becomes what to do in situations
>>> where you're implementing existing interfaces that accept nullable params.
>>> The LongArrayList above implements List<Long> which requires an add(Long)
>>> method.  In the above implementation I chose to swap nulls with
>>> Long.MIN_VALUE, however I'm now thinking it best to force the user to make
>>> that swap and then throw IllegalArgumentException if they pass null.
>>>
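
The stricter behavior described above could be sketched roughly as follows. StrictLongList is a hypothetical class written for illustration; it is not the LongArrayList from the hotpads repository. It extends java.util.AbstractList as an assumption to keep the example short.

```java
// Hypothetical sketch, not the hotpads LongArrayList: a primitive-backed
// List<Long> that rejects null instead of silently swapping in a sentinel.
final class StrictLongList extends java.util.AbstractList<Long> {
    private long[] values = new long[8];
    private int size;

    @Override
    public Long get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        return values[index];
    }

    @Override
    public int size() {
        return size;
    }

    @Override
    public boolean add(Long value) {
        if (value == null) {
            // The caller must pick their own interpretation of null.
            throw new IllegalArgumentException(
                "null is not supported; substitute a sentinel explicitly");
        }
        if (size == values.length) {
            values = java.util.Arrays.copyOf(values, size * 2);
        }
        values[size++] = value; // unboxes into the primitive array
        modCount++;             // keeps AbstractList iterators fail-fast
        return true;
    }
}
```

A caller that wants the old behavior then makes the swap visibly at the call site, for example `list.add(maybeNull == null ? Long.MIN_VALUE : maybeNull)`.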
>>>
>>> On Mon, Apr 1, 2013 at 11:41 AM, Doug Meil <[EMAIL PROTECTED]> wrote:
>>>
>>>> Hmmm... good question.
>>>>
>>>> I think that fixed-width support is important for a great many rowkey
>>>> construction cases, so I'd rather see something like losing MIN_VALUE and
>>>> keeping fixed width.
>>>>
>>>>
>>>>
>>>>
>>>> On 4/1/13 2:00 PM, "Nick Dimiduk" <[EMAIL PROTECTED]> wrote:
>>>>
>>>>> Heya,
>>>>>
>>>>> Thinking about data types and serialization. I think null support is an
>>>>> important characteristic for the serialized representations, especially
>>>>> when considering the compound type. However, doing so is directly
>>>>> incompatible with fixed-width representations for numerics. For instance,
>>>>> if we want to have a fixed-width signed long stored in 8 bytes, where do
>>>>> you put null? float and double types can cheat a little by folding
>>>>> negative and positive NaNs into a single representation (this isn't
>>>>> strictly correct!), leaving a place to represent null. In the long example
>>>>> case, the obvious choice is to reduce MAX_VALUE or increase MIN_VALUE by
>>>>> one. This will allocate an additional encoding which can be used for null.
>>>>> My experience working with scientific data, however, makes me wince at the
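
The NaN-folding idea mentioned in the quoted message can be illustrated with a small sketch. NullableDoubleCodec and the choice of NULL_BITS are assumptions made for this example only. The sketch shows just the folding step and does not attempt an order-preserving byte layout.

```java
// Hypothetical sketch: fold all NaNs into one bit pattern and reuse a
// different NaN pattern to mean null. Not order-preserving.
final class NullableDoubleCodec {
    // Assumption: any NaN bit pattern other than the canonical one would do.
    static final long NULL_BITS = 0x7ff8000000000001L;

    static long encode(Double value) {
        if (value == null) {
            return NULL_BITS;
        }
        // doubleToLongBits collapses every NaN to 0x7ff8000000000000L,
        // so no real value can collide with NULL_BITS.
        return Double.doubleToLongBits(value);
    }

    static Double decode(long bits) {
        return (bits == NULL_BITS) ? null : Double.longBitsToDouble(bits);
    }
}
```

The cost, as the message notes, is that distinct NaN payloads and signs are no longer distinguishable after a round trip.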