Re: HBase Types: Explicit Null Support
bq. with a base implementation that does not support nulls

+1
On Mon, Apr 1, 2013 at 1:32 PM, Nick Dimiduk <[EMAIL PROTECTED]> wrote:

> Thanks for the thoughtful response (and code!).
>
> I'm thinking I will press forward with a base implementation that does not
> support nulls. The idea is to provide an extensible set of interfaces, so I
> think this will not box us into a corner later. That is, a mirroring
> package could be implemented that supports null values and accepts
> the relevant trade-offs.
>
> Thanks,
> Nick
>
> On Mon, Apr 1, 2013 at 12:26 PM, Matt Corgan <[EMAIL PROTECTED]> wrote:
>
> > I spent some time this weekend extracting bits of our serialization code
> > to a public GitHub repo at http://github.com/hotpads/data-tools.
> > Contributions are welcome - I'm sure we all have this stuff lying around.
> >
> > You can see I've bumped into the NULL problem in a few places:
> > * https://github.com/hotpads/data-tools/blob/master/src/main/java/com/hotpads/data/primitive/lists/LongArrayList.java
> > * https://github.com/hotpads/data-tools/blob/master/src/main/java/com/hotpads/data/types/floats/DoubleByteTool.java
> >
> > Looking back, I think my latest opinion on the topic is to reject
> > nullability as the rule since it can cause unexpected behavior and
> > confusion.  It's cleaner to provide a wrapper class (so both LongArrayList
> > plus NullableLongArrayList) that explicitly defines the behavior, and
> > costs a little more in performance.  If the user can't find a pre-made
> > wrapper class, it's not very difficult for each user to provide their own
> > interpretation of null and check for it themselves.
> >
> > If you reject nullability, the question becomes what to do in situations
> > where you're implementing existing interfaces that accept nullable params.
> > The LongArrayList above implements List<Long> which requires an add(Long)
> > method.  In the above implementation I chose to swap nulls with
> > Long.MIN_VALUE, however I'm now thinking it best to force the user to make
> > that swap and then throw IllegalArgumentException if they pass null.
> >
> >
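The stricter behaviour described at the end of the message above (throw on null instead of silently swapping in Long.MIN_VALUE) could be sketched roughly as follows. `StrictLongList` is a hypothetical name used only for illustration; it is not the hotpads `LongArrayList`, and it implements only the minimum of `List<Long>` needed to show the idea.

```java
import java.util.AbstractList;
import java.util.Arrays;

// Hypothetical illustration only; not the hotpads LongArrayList.
final class StrictLongList extends AbstractList<Long> {
    private long[] values = new long[8]; // primitive backing array, no boxing
    private int size = 0;

    @Override
    public boolean add(Long value) {
        if (value == null) {
            // No silent swap to Long.MIN_VALUE: the caller picks a sentinel.
            throw new IllegalArgumentException(
                "null is not supported; map it to a sentinel value first");
        }
        if (size == values.length) {
            values = Arrays.copyOf(values, size * 2); // grow by doubling
        }
        values[size++] = value; // auto-unboxed into the primitive array
        modCount++;             // keep AbstractList iterators fail-fast
        return true;
    }

    @Override
    public Long get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index + ", size " + size);
        }
        return values[index];
    }

    @Override
    public int size() {
        return size;
    }
}
```

With this shape, a caller who wants null semantics has to decide explicitly what null means (for example by wrapping the list or choosing their own sentinel), which is the trade-off the message argues for.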
> > On Mon, Apr 1, 2013 at 11:41 AM, Doug Meil <[EMAIL PROTECTED]> wrote:
> >
> > >
> > > Hmmm... good question.
> > >
> > > I think that fixed-width support is important for a great many rowkey
> > > constructs, so I'd rather see something like losing MIN_VALUE and
> > > keeping fixed width.
> > >
> > >
> > >
> > >
> > > On 4/1/13 2:00 PM, "Nick Dimiduk" <[EMAIL PROTECTED]> wrote:
> > >
> > > >Heya,
> > > >
> > > >Thinking about data types and serialization. I think null support is an
> > > >important characteristic for the serialized representations, especially
> > > >when considering the compound type. However, doing so is directly
> > > >incompatible with fixed-width representations for numerics. For instance,
> > > >if we want to have a fixed-width signed long stored in 8 bytes, where do
> > > >you put null? float and double types can cheat a little by folding
> > > >negative and positive NaNs into a single representation (this isn't
> > > >strictly correct!), leaving a place to represent null. In the long
> > > >example, the obvious choice is to reduce MAX_VALUE or increase MIN_VALUE
> > > >by one. This will free up an additional encoding which can be used for
> > > >null. My experience working with scientific data, however, makes me wince
> > > >at the idea.
> > > >
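To make the NaN-folding idea in the paragraph above concrete, here is a minimal Java sketch (assuming Java 8+ for `Long.compareUnsigned`). `NullableDoubleCodec`, `NULL_CODE`, `encode` and `decode` are hypothetical names, not part of any HBase API. It relies on `Double.doubleToLongBits` collapsing every NaN to one positive bit pattern, which leaves the codes that negative NaNs would have occupied unused; the lowest of them is taken for null. As the message notes, this is not strictly correct: NaN sign and payload are lost.

```java
// Hypothetical illustration only; not an HBase API.
final class NullableDoubleCodec {
    // Free only because every NaN is folded into one positive bit pattern.
    static final long NULL_CODE = 0L;

    // Returns a code whose UNSIGNED order matches the order of the inputs,
    // with null sorting before every double (including -Infinity).
    static long encode(Double value) {
        if (value == null) {
            return NULL_CODE;
        }
        // doubleToLongBits canonicalizes all NaNs to 0x7ff8000000000000L.
        long bits = Double.doubleToLongBits(value);
        // Negative values: flip every bit. Non-negative: flip the sign bit.
        return bits < 0 ? ~bits : bits ^ Long.MIN_VALUE;
    }

    static Double decode(long code) {
        if (code == NULL_CODE) {
            return null;
        }
        // A set top bit means the original value was non-negative.
        long bits = code < 0 ? code ^ Long.MIN_VALUE : ~code;
        return Double.longBitsToDouble(bits);
    }
}
```

Writing the resulting long out big-endian gives 8 bytes whose lexicographic order matches the unsigned order of the codes, so the representation stays fixed-width.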
> > > >The variable-width encodings have it a little easier. There's already
> > > >enough going on that it's simpler to make room.
> > > >
> > > >Remember, the final goal is to support order-preserving serialization.
> > > >This imposes some limitations on our encoding strategies. For instance,
> > > >it's not enough to simply encode null; it really needs to be encoded as
> > > >0x00 so as to sort lexicographically earlier than any other value.
> > > >
> > > >What do you think? Any ideas, experiences, etc?
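
One way the fixed-width trade-off discussed in this thread could look in Java is sketched below: give up Long.MIN_VALUE, keep 8 bytes, and encode null as all 0x00 bytes so it sorts first. This is only an illustration under those assumptions; `NullableLongCodec` and its methods are hypothetical names, not the interfaces that were eventually implemented.

```java
import java.nio.ByteBuffer;

// Hypothetical illustration only; not an HBase API.
final class NullableLongCodec {
    static final int WIDTH = 8; // fixed width in bytes

    static byte[] encode(Long value) {
        if (value == null) {
            return new byte[WIDTH]; // all 0x00: sorts before every other value
        }
        if (value == Long.MIN_VALUE) {
            // This value's encoding is the one sacrificed for null.
            throw new IllegalArgumentException("Long.MIN_VALUE is reserved for null");
        }
        // Flipping the sign bit makes big-endian unsigned byte order match
        // signed numeric order. ByteBuffer is big-endian by default.
        return ByteBuffer.allocate(WIDTH).putLong(value ^ Long.MIN_VALUE).array();
    }

    static Long decode(byte[] encoded) {
        long raw = ByteBuffer.wrap(encoded).getLong();
        if (raw == 0L) {
            return null;
        }
        return raw ^ Long.MIN_VALUE;
    }

    // Unsigned lexicographic comparison, the way row keys are ordered.
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }
}
```

The cost is the one raised earlier in the thread: a caller holding a genuine Long.MIN_VALUE gets an exception rather than a silently altered value.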