HBase >> mail # user >> HBase Types: Explicit Null Support


Re: HBase Types: Explicit Null Support
Thanks for the thoughtful response (and code!).

I'm thinking I will press forward with a base implementation that does not
support nulls. The idea is to provide an extensible set of interfaces, so I
think this will not box us into a corner later. That is, a mirroring
package could be implemented that supports null values and accepts
the relevant trade-offs.

Thanks,
Nick

On Mon, Apr 1, 2013 at 12:26 PM, Matt Corgan <[EMAIL PROTECTED]> wrote:

> I spent some time this weekend extracting bits of our serialization code to
> a public github repo at http://github.com/hotpads/data-tools.
>  Contributions are welcome - I'm sure we all have this stuff lying around.
>
> You can see I've bumped into the NULL problem in a few places:
> * https://github.com/hotpads/data-tools/blob/master/src/main/java/com/hotpads/data/primitive/lists/LongArrayList.java
> * https://github.com/hotpads/data-tools/blob/master/src/main/java/com/hotpads/data/types/floats/DoubleByteTool.java
>
> Looking back, I think my latest opinion on the topic is to reject
> nullability as the rule since it can cause unexpected behavior and
> confusion.  It's cleaner to provide a wrapper class (so both LongArrayList
> plus NullableLongArrayList) that explicitly defines the behavior, and costs
> a little more in performance.  If the user can't find a pre-made wrapper
> class, it's not very difficult for each user to provide their own
> interpretation of null and check for it themselves.
>
> If you reject nullability, the question becomes what to do in situations
> where you're implementing existing interfaces that accept nullable params.
>  The LongArrayList above implements List<Long> which requires an add(Long)
> method.  In the above implementation I chose to swap nulls with
> Long.MIN_VALUE, however I'm now thinking it best to force the user to make
> that swap and then throw IllegalArgumentException if they pass null.
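
The "reject null and throw" option described above could look roughly like the following. This is a minimal sketch, not the LongArrayList from the linked repository: the class name StrictLongList and its growth policy are made up for illustration.

```java
import java.util.AbstractList;
import java.util.Arrays;

// Hypothetical illustration only; not the hotpads/data-tools implementation.
final class StrictLongList extends AbstractList<Long> {
    private long[] values = new long[8]; // primitive backing array, no boxing
    private int size = 0;

    @Override
    public boolean add(Long value) {
        if (value == null) {
            // Force the caller to pick their own interpretation of null.
            throw new IllegalArgumentException(
                "null is not supported; substitute a sentinel explicitly");
        }
        if (size == values.length) {
            values = Arrays.copyOf(values, size * 2); // grow by doubling
        }
        values[size++] = value; // auto-unboxed to long
        modCount++;             // keep iterators fail-fast
        return true;
    }

    @Override
    public Long get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index + ", size " + size);
        }
        return values[index];
    }

    @Override
    public int size() {
        return size;
    }
}
```

A caller who wants null semantics would then do the swap themselves, for example `list.add(v == null ? Long.MIN_VALUE : v)`, which keeps the sentinel choice visible at the call site.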
>
>
> On Mon, Apr 1, 2013 at 11:41 AM, Doug Meil <[EMAIL PROTECTED]> wrote:
>
> >
> > Hmmm... good question.
> >
> > I think that fixed-width support is important for a great many rowkey
> > constructs, so I'd rather see something like losing MIN_VALUE and
> > keeping fixed width.
> >
> >
> >
> >
> > On 4/1/13 2:00 PM, "Nick Dimiduk" <[EMAIL PROTECTED]> wrote:
> >
> > >Heya,
> > >
> > >Thinking about data types and serialization. I think null support is an
> > >important characteristic for the serialized representations, especially
> > >when considering the compound type. However, doing so is directly
> > >incompatible with fixed-width representations for numerics. For instance,
> > >if we want to have a fixed-width signed long stored in 8 bytes, where do
> > >you put null? float and double types can cheat a little by folding
> > >negative and positive NaNs into a single representation (this isn't
> > >strictly correct!), leaving a place to represent null. In the long
> > >example case, the obvious choice is to reduce MAX_VALUE or increase
> > >MIN_VALUE by one. This frees up one encoding, which can be used for
> > >null. My experience working with scientific data, however, makes me wince
> > >at the idea.
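
To make the long case concrete, here is a rough sketch of a fixed-width, order-preserving 8-byte encoding that gives up Long.MIN_VALUE to make room for null. The class and method names are hypothetical and not taken from HBase; the sign-bit flip is a common way to make signed order match unsigned byte order, assumed here rather than prescribed by the thread.

```java
import java.nio.ByteBuffer;

// Hypothetical helper for illustration; not part of HBase or data-tools.
final class NullableOrderedLong {
    static final int WIDTH = 8;

    /** Always returns 8 bytes; null becomes eight 0x00 bytes. */
    static byte[] encode(Long value) {
        if (value == null) {
            return new byte[WIDTH];
        }
        if (value == Long.MIN_VALUE) {
            // MIN_VALUE would also encode to all zeros, so it is given up.
            throw new IllegalArgumentException("Long.MIN_VALUE is reserved for null");
        }
        // Flipping the sign bit maps signed order onto unsigned order;
        // ByteBuffer writes big-endian by default.
        return ByteBuffer.allocate(WIDTH).putLong(value ^ Long.MIN_VALUE).array();
    }

    static Long decode(byte[] bytes) {
        long raw = ByteBuffer.wrap(bytes).getLong();
        if (raw == 0L) {
            return null;
        }
        return raw ^ Long.MIN_VALUE;
    }

    /** Lexicographic comparison treating each byte as unsigned. */
    static int compareUnsigned(byte[] a, byte[] b) {
        for (int i = 0; i < WIDTH; i++) {
            int diff = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (diff != 0) {
                return diff;
            }
        }
        return 0;
    }
}
```

The cost is exactly the one raised above: one legitimate value can no longer be stored, and the encoder has to reject it rather than silently turn it into null.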
> > >
> > >The variable-width encodings have it a little easier. There's already
> > >enough going on that it's simpler to make room.
> > >
> > >Remember, the final goal is to support order-preserving serialization.
> > >This imposes some limitations on our encoding strategies. For instance,
> > >it's not enough to simply encode null; it really needs to be encoded as
> > >0x00 so as to sort lexicographically earlier than any other value.
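
The NaN trick mentioned earlier fits this constraint for doubles. The sketch below is one possible approach, with hypothetical names: it relies on Double.doubleToLongBits collapsing every NaN to a single canonical positive bit pattern, so no negative NaN is ever encoded and the all-zero byte string is free to mean null. As noted, this is not strictly correct, since NaN sign and payload are lost.

```java
import java.nio.ByteBuffer;

// Hypothetical helper for illustration; not part of HBase or data-tools.
final class NullableOrderedDouble {
    static final int WIDTH = 8;

    /** Always returns 8 bytes; null becomes eight 0x00 bytes and sorts first. */
    static byte[] encode(Double value) {
        if (value == null) {
            return new byte[WIDTH];
        }
        long bits = Double.doubleToLongBits(value); // all NaNs -> 0x7ff8000000000000
        // Negative numbers: invert every bit so larger magnitudes sort earlier.
        // Non-negative numbers: set the top bit so they sort after negatives.
        long ordered = (bits < 0) ? ~bits : (bits ^ Long.MIN_VALUE);
        return ByteBuffer.allocate(WIDTH).putLong(ordered).array();
    }

    static Double decode(byte[] bytes) {
        long ordered = ByteBuffer.wrap(bytes).getLong();
        if (ordered == 0L) {
            return null; // only an (unused) negative NaN would land here
        }
        long bits = (ordered < 0) ? (ordered ^ Long.MIN_VALUE) : ~ordered;
        return Double.longBitsToDouble(bits);
    }

    /** Lexicographic comparison treating each byte as unsigned. */
    static int compareUnsigned(byte[] a, byte[] b) {
        for (int i = 0; i < WIDTH; i++) {
            int diff = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (diff != 0) {
                return diff;
            }
        }
        return 0;
    }
}
```

Under this scheme the byte order is null, then negative infinity, the negative numbers, -0.0, 0.0, the positive numbers, positive infinity, and finally NaN.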
> > >
> > >What do you think? Any ideas, experiences, etc?
> > >
> > >Thanks,
> > >Nick
> >
> >
> >
> >
>