HBase >> mail # user >> HBase Types: Explicit Null Support


Nick Dimiduk 2013-04-01, 18:00
Doug Meil 2013-04-01, 18:41
Matt Corgan 2013-04-01, 19:26
Re: HBase Types: Explicit Null Support
Thanks for the thoughtful response (and code!).

I'm thinking I will press forward with a base implementation that does not
support nulls. The idea is to provide an extensible set of interfaces, so I
think this will not box us into a corner later. That is, a mirroring
package could be implemented that supports null values and accepts
the relevant trade-offs.

Thanks,
Nick
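
A minimal sketch of the layering discussed in this thread: a base codec that rejects null, and a mirror codec that accepts it at the cost of an extra byte. Every name here (`NullHandlingSketch`, `Codec`, `LongCodec`, `NullableLongCodec`) is hypothetical and is not taken from HBase or from the data-tools repository. The sketch only illustrates the null policy; it makes no attempt at order-preserving encoding.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

/** Hypothetical names throughout; this is not the HBase or data-tools API. */
final class NullHandlingSketch {

  /** Hypothetical codec interface; each implementation picks its own null policy. */
  interface Codec<T> {
    byte[] encode(T value);
    T decode(byte[] bytes);
  }

  /** Base codec: fixed 8-byte width, rejects null outright. */
  static final class LongCodec implements Codec<Long> {
    @Override
    public byte[] encode(Long value) {
      if (value == null) {
        throw new IllegalArgumentException("null is not supported by LongCodec");
      }
      return ByteBuffer.allocate(Long.BYTES).putLong(value).array();
    }

    @Override
    public Long decode(byte[] bytes) {
      return ByteBuffer.wrap(bytes).getLong();
    }
  }

  /** Mirror codec: a leading marker byte distinguishes null (0x00) from a value (0x01). */
  static final class NullableLongCodec implements Codec<Long> {
    private final LongCodec delegate = new LongCodec();

    @Override
    public byte[] encode(Long value) {
      if (value == null) {
        return new byte[] {0x00};
      }
      byte[] body = delegate.encode(value);
      byte[] out = new byte[body.length + 1];
      out[0] = 0x01;
      System.arraycopy(body, 0, out, 1, body.length);
      return out;
    }

    @Override
    public Long decode(byte[] bytes) {
      if (bytes[0] == 0x00) {
        return null;
      }
      return delegate.decode(Arrays.copyOfRange(bytes, 1, bytes.length));
    }
  }
}
```

The trade-off is visible in the lengths: the base codec always emits 8 bytes, while the nullable mirror emits 1 byte for null and 9 bytes otherwise, so it gives up the fixed width.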

On Mon, Apr 1, 2013 at 12:26 PM, Matt Corgan <[EMAIL PROTECTED]> wrote:

> I spent some time this weekend extracting bits of our serialization code to
> a public github repo at http://github.com/hotpads/data-tools.
>  Contributions are welcome - I'm sure we all have this stuff lying around.
>
> You can see I've bumped into the NULL problem in a few places:
> *
>
> https://github.com/hotpads/data-tools/blob/master/src/main/java/com/hotpads/data/primitive/lists/LongArrayList.java
> *
>
> https://github.com/hotpads/data-tools/blob/master/src/main/java/com/hotpads/data/types/floats/DoubleByteTool.java
>
> Looking back, I think my latest opinion on the topic is to reject
> nullability as the rule since it can cause unexpected behavior and
> confusion.  It's cleaner to provide a wrapper class (so both LongArrayList
> and NullableLongArrayList) that explicitly defines the behavior and costs
> a little more in performance.  If the user can't find a pre-made wrapper
> class, it's not very difficult for each user to provide their own
> interpretation of null and check for it themselves.
>
> If you reject nullability, the question becomes what to do in situations
> where you're implementing existing interfaces that accept nullable params.
>  The LongArrayList above implements List<Long> which requires an add(Long)
> method.  In the above implementation I chose to swap nulls with
> Long.MIN_VALUE; however, I'm now thinking it is best to force the user to
> make that swap and then throw IllegalArgumentException if they pass null.
>
>
> On Mon, Apr 1, 2013 at 11:41 AM, Doug Meil <[EMAIL PROTECTED]> wrote:
>
> >
> > Hmmm... good question.
> >
> > I think that fixed-width support is important for a great many rowkey
> > construction cases, so I'd rather see something like losing MIN_VALUE and
> > keeping fixed width.
> >
> >
> >
> >
> > On 4/1/13 2:00 PM, "Nick Dimiduk" <[EMAIL PROTECTED]> wrote:
> >
> > >Heya,
> > >
> > >Thinking about data types and serialization. I think null support is an
> > >important characteristic for the serialized representations, especially
> > >when considering the compound type. However, doing so is directly
> > >incompatible with fixed-width representations for numerics. For instance,
> > >if we want to have a fixed-width signed long stored in 8 bytes, where do
> > >you put null? float and double types can cheat a little by folding
> > >negative and positive NaNs into a single representation (this isn't
> > >strictly correct!), leaving a place to represent null. In the long
> > >example case, the obvious choice is to reduce MAX_VALUE or increase
> > >MIN_VALUE by one. This frees up one encoding, which can be used for
> > >null. My experience working with scientific data, however, makes me
> > >wince at the idea.
> > >
> > >The variable-width encodings have it a little easier. There's already
> > >enough going on that it's simpler to make room.
> > >
> > >Remember, the final goal is to support order-preserving serialization.
> > >This imposes some limitations on our encoding strategies. For instance,
> > >it's not enough to simply encode null; it really needs to be encoded as
> > >0x00 so as to sort lexicographically earlier than any other value.
> > >
> > >What do you think? Any ideas, experiences, etc?
> > >
> > >Thanks,
> > >Nick
> >
> >
> >
> >
>
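
The fixed-width option raised in the quoted messages (give up MIN_VALUE, keep 8 bytes, make null sort first) can be sketched as follows. This is an illustration under stated assumptions, not the encoding HBase adopted: the class and method names are hypothetical, and the sign-bit flip is a common order-preserving technique chosen here for the example. Flipping the sign bit of a big-endian long makes unsigned byte order match numeric order, and it maps Long.MIN_VALUE to eight 0x00 bytes, which is exactly the slot that sorts before everything else.

```java
/** Hypothetical sketch: fixed-width, order-preserving long with null in the MIN_VALUE slot. */
final class OrderedLongSketch {

  /** Encodes to 8 big-endian bytes with the sign bit flipped; null becomes all zeros. */
  static byte[] encode(Long value) {
    long bits;
    if (value == null) {
      bits = 0L; // eight 0x00 bytes, the slot Long.MIN_VALUE would otherwise occupy
    } else if (value == Long.MIN_VALUE) {
      throw new IllegalArgumentException("Long.MIN_VALUE is reserved for null in this sketch");
    } else {
      bits = value ^ Long.MIN_VALUE; // flip the sign bit
    }
    byte[] out = new byte[8];
    for (int i = 7; i >= 0; i--) {
      out[i] = (byte) bits;
      bits >>>= 8;
    }
    return out;
  }

  /** Inverse of encode; the all-zero pattern decodes to null. */
  static Long decode(byte[] bytes) {
    long bits = 0L;
    for (byte b : bytes) {
      bits = (bits << 8) | (b & 0xFF);
    }
    return bits == 0L ? null : Long.valueOf(bits ^ Long.MIN_VALUE);
  }

  /** Unsigned lexicographic comparison, the order in which HBase sorts row keys. */
  static int compareBytes(byte[] a, byte[] b) {
    int n = Math.min(a.length, b.length);
    for (int i = 0; i < n; i++) {
      int diff = (a[i] & 0xFF) - (b[i] & 0xFF);
      if (diff != 0) {
        return diff;
      }
    }
    return a.length - b.length;
  }
}
```

With this layout, `encode(null)` compares lower than `encode(Long.MIN_VALUE + 1)`, and negative values compare lower than positive ones. The cost is the one described in the thread: one legitimate value of the domain is no longer representable.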
James Taylor 2013-04-01, 23:31
Nick Dimiduk 2013-04-01, 23:41
Nick Dimiduk 2013-04-02, 02:26
Enis Söztutar 2013-04-02, 03:38
Matt Corgan 2013-04-02, 06:17
Michel Segel 2013-04-02, 02:40
James Taylor 2013-04-01, 23:49
Matt Corgan 2013-04-02, 00:07
Nick Dimiduk 2013-04-05, 00:34