HBase >> mail # dev >> Re: HBase Types: Explicit Null Support


Re: HBase Types: Explicit Null Support
Actually, that isn't all that far-fetched a format, Matt. It's pretty common
any time someone wants sortable lat/long (*cough* three-letter agencies
*cough*).
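
For anyone following along, the interleaving Matt describes in the quoted message can be sketched roughly as follows. This is only a sketch under assumptions: the class and method names (GeoKeySketch, quantize, spread, interleave, toKey) are hypothetical, each coordinate is quantized to 32 bits, and none of it is taken from Matt's code.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch: interleave latitude/longitude bits into one sortable
// 64-bit value (a Z-order style key). All names here are illustrative.
final class GeoKeySketch {

    // Map a coordinate in [min, max] onto an unsigned 32-bit integer held in a long.
    static long quantize(double value, double min, double max) {
        double clamped = Math.max(min, Math.min(max, value));
        double scaled = (clamped - min) / (max - min); // 0.0 .. 1.0
        return (long) (scaled * 0xFFFFFFFFL);
    }

    // Spread the low 32 bits of x so they occupy the even bit positions of a long.
    static long spread(long x) {
        x &= 0xFFFFFFFFL;
        x = (x | (x << 16)) & 0x0000FFFF0000FFFFL;
        x = (x | (x << 8)) & 0x00FF00FF00FF00FFL;
        x = (x | (x << 4)) & 0x0F0F0F0F0F0F0F0FL;
        x = (x | (x << 2)) & 0x3333333333333333L;
        x = (x | (x << 1)) & 0x5555555555555555L;
        return x;
    }

    // Latitude bits land on odd positions, longitude bits on even positions.
    static long interleave(double lat, double lon) {
        long latBits = quantize(lat, -90.0, 90.0);
        long lonBits = quantize(lon, -180.0, 180.0);
        return (spread(latBits) << 1) | spread(lonBits);
    }

    // Big-endian bytes, so unsigned byte-wise comparison follows the unsigned value.
    static byte[] toKey(double lat, double lon) {
        return ByteBuffer.allocate(Long.BYTES).putLong(interleave(lat, lon)).array();
    }
}
```

The key is written big-endian so that a lexicographic comparison of unsigned bytes, which is how HBase orders row keys, agrees with the unsigned order of the 64-bit value.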

Wouldn't we get the same by providing a simple set of libraries (à la
orderly + other useful HBase things) and still giving access to the
underlying byte array? Perhaps a nullable key type in that lib makes sense
if lots of people need it, and it would be nice to have standard libraries
so tools could interoperate much more easily.
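
To make the trade-off in the quoted thread concrete, here is a rough sketch of what a fixed-width key type next to a nullable one could look like. Everything here is an assumption for illustration: the class name NullableInt32Sketch, the one-byte marker layout, and the choice to sort null first are hypothetical and are not the API of orderly or of any HBase library.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch of an order-preserving int32 key and a nullable variant.
final class NullableInt32Sketch {
    static final byte NULL_MARKER = 0x00;  // assumed: null sorts before every value
    static final byte VALUE_MARKER = 0x01;

    // Fixed 4 bytes, full int range. Flipping the sign bit makes negative
    // values sort before positive ones under unsigned byte comparison.
    static byte[] encodeInt32(int v) {
        return ByteBuffer.allocate(4).putInt(v ^ Integer.MIN_VALUE).array();
    }

    static int decodeInt32(byte[] b) {
        return ByteBuffer.wrap(b).getInt() ^ Integer.MIN_VALUE;
    }

    // Nullable variant: 1 byte for null, otherwise a marker byte plus 4 value bytes.
    static byte[] encodeNullable(Integer v) {
        if (v == null) {
            return new byte[] { NULL_MARKER };
        }
        return ByteBuffer.allocate(5).put(VALUE_MARKER).putInt(v ^ Integer.MIN_VALUE).array();
    }

    static Integer decodeNullable(byte[] b) {
        if (b[0] == NULL_MARKER) {
            return null;
        }
        return ByteBuffer.wrap(b, 1, 4).getInt() ^ Integer.MIN_VALUE;
    }

    // Lexicographic comparison of unsigned bytes; a strict prefix sorts first.
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int cmp = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (cmp != 0) {
                return cmp;
            }
        }
        return a.length - b.length;
    }
}
```

In this layout the non-null type stays at 4 bytes and keeps the whole range, including -2^31, while the nullable type pays one extra byte per value in exchange for representing NULL.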
-------------------
Jesse Yates
@jesse_yates
jyates.github.com
On Mon, Apr 1, 2013 at 11:17 PM, Matt Corgan <[EMAIL PROTECTED]> wrote:

> Ah, I didn't even realize SQL allowed null key parts.  Maybe a goal of the
> interfaces should be to provide first-class support for custom user types
> in addition to the standard ones included.  Part of the power of HBase's
> plain byte[] keys is that users can concoct the perfect key for their data
> type.  For example, I have a lot of geographic data where I interleave
> latitude/longitude bits into a sortable 64-bit value that would probably
> never be included in a standard library.
>
>
> On Mon, Apr 1, 2013 at 8:38 PM, Enis Söztutar <[EMAIL PROTECTED]> wrote:
>
> > I think having Int32 and NullableInt32 would support minimum overhead,
> > as well as allowing SQL semantics.
> >
> >
> > On Mon, Apr 1, 2013 at 7:26 PM, Nick Dimiduk <[EMAIL PROTECTED]> wrote:
> >
> > > Furthermore, is it more important to support null values than to squeeze
> > > all representations into minimum size (4 bytes for int32, etc.)?
> > > On Apr 1, 2013 4:41 PM, "Nick Dimiduk" <[EMAIL PROTECTED]> wrote:
> > >
> > > > On Mon, Apr 1, 2013 at 4:31 PM, James Taylor <[EMAIL PROTECTED]> wrote:
> > > >
> > > >> From the SQL perspective, handling null is important.
> > > >
> > > >
> > > > From your perspective, it is critical to support NULLs, even at the
> > > > expense of fixed-width encodings or of representing the full range of
> > > > values. That is, you'd rather be able to represent NULL than -2^31?
> > > >
> > > > On 04/01/2013 01:32 PM, Nick Dimiduk wrote:
> > > >>
> > > >>> Thanks for the thoughtful response (and code!).
> > > >>>
> > > > >>> I'm thinking I will press forward with a base implementation that
> > > > >>> does not support nulls. The idea is to provide an extensible set of
> > > > >>> interfaces, so I think this will not box us into a corner later.
> > > > >>> That is, a mirroring package could be implemented that supports
> > > > >>> null values and accepts the relevant trade-offs.
> > > >>>
> > > >>> Thanks,
> > > >>> Nick
> > > >>>
> > > > >>> On Mon, Apr 1, 2013 at 12:26 PM, Matt Corgan <[EMAIL PROTECTED]> wrote:
> > > >>>
> > > > >>>> I spent some time this weekend extracting bits of our serialization
> > > > >>>> code to a public GitHub repo at http://github.com/hotpads/data-tools.
> > > > >>>> Contributions are welcome - I'm sure we all have this stuff lying
> > > > >>>> around.
> > > >>>>
> > > >>>> You can see I've bumped into the NULL problem in a few places:
> > > >>>> *
> > > >>>>
> > > >>>> https://github.com/hotpads/**data-tools/blob/master/src/**
> > > >>>> main/java/com/hotpads/data/**primitive/lists/LongArrayList.**java<
> > >
> >
> https://github.com/hotpads/data-tools/blob/master/src/main/java/com/hotpads/data/primitive/lists/LongArrayList.java
> > > >
> > > >>>> *
> > > >>>>
> > > >>>> https://github.com/hotpads/**data-tools/blob/master/src/**
> > > >>>> main/java/com/hotpads/data/**types/floats/DoubleByteTool.**java<
> > >
> >
> https://github.com/hotpads/data-tools/blob/master/src/main/java/com/hotpads/data/types/floats/DoubleByteTool.java
> > > >
> > > >>>>
> > > >>>> Looking back, I think my latest opinion on the topic is to reject
> > > >>>> nullability as the rule since it can cause unexpected behavior and
> > > >>>> confusion.  It's cleaner to provide a wrapper class (so both