Pig >> mail # dev >> Re: HBase Types: Explicit Null Support


Re: HBase Types: Explicit Null Support
Hiya Nick,
Pig converts data for HBase storage using this class:
https://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/hbase/HBaseBinaryConverter.java
(which is mostly just calling into HBase's Bytes class). As long as Bytes
handles the null stuff, we'll just inherit the behavior.
On Tue, Apr 2, 2013 at 9:40 AM, Nick Dimiduk <[EMAIL PROTECTED]> wrote:

> I agree that a user-extensible interface is a required feature here.
> Personally, I'd love to ship a set of standard GIS tools on HBase. Let's
> keep in mind, though, that SQL and user applications are not the only
> consumers of this interface. A big motivation is allowing interop with the
> other higher MR languages. *cough* Where are my Pig and Hive peeps in this
> thread?
>
> On Mon, Apr 1, 2013 at 11:33 PM, James Taylor <[EMAIL PROTECTED]> wrote:
>
> > Maybe if we can keep nullability separate from the
> > serialization/deserialization, we can come up with a solution that works?
> > We're able to essentially infer that a column is null based on its value
> > being missing or empty. So if an iterator through the row key bytes could
> > detect/indicate that, then an application could "infer" the value is null.
> >
> > We're definitely planning on keeping byte[] accessors for use cases that
> > need them. I'm curious about the geographic data case, though: could you
> > use a fixed-length long with a couple of new SQL built-ins to encode/decode
> > the latitude/longitude?
> >
> >
> > On 04/01/2013 11:29 PM, Jesse Yates wrote:
> >
> >> Actually, that isn't all that far-fetched of a format Matt - pretty common
> >> anytime anyone wants to do sortable lat/long (*cough* three letter agencies
> >> *cough*).
> >>
> >> Wouldn't we get the same by providing a simple set of libraries (à la
> >> orderly + other HBase useful things) and then still give access to the
> >> underlying byte array? Perhaps a nullable key type in that lib makes sense
> >> if lots of people need it, and it would be nice to have standard libraries
> >> so tools could interop much more easily.
> >> -------------------
> >> Jesse Yates
> >> @jesse_yates
> >> jyates.github.com
> >>
> >>
> >> On Mon, Apr 1, 2013 at 11:17 PM, Matt Corgan <[EMAIL PROTECTED]> wrote:
> >>
> >>> Ah, I didn't even realize SQL allowed null key parts.  Maybe a goal of the
> >>> interfaces should be to provide first-class support for custom user types
> >>> in addition to the standard ones included.  Part of the power of HBase's
> >>> plain byte[] keys is that users can concoct the perfect key for their data
> >>> type.  For example, I have a lot of geographic data where I interleave
> >>> latitude/longitude bits into a sortable 64-bit value that would probably
> >>> never be included in a standard library.
> >>>
> >>>
> >>> On Mon, Apr 1, 2013 at 8:38 PM, Enis Söztutar <[EMAIL PROTECTED]> wrote:
> >>>
> >>>> I think having Int32 and NullableInt32 would support minimum overhead, as
> >>>> well as allowing SQL semantics.
> >>>>
> >>>>
> >>>> On Mon, Apr 1, 2013 at 7:26 PM, Nick Dimiduk <[EMAIL PROTECTED]> wrote:
> >>>>
> >>>>> Furthermore, is it more important to support null values than squeeze all
> >>>>> representations into minimum size (4-bytes for int32, &c.)?
> >>>>> On Apr 1, 2013 4:41 PM, "Nick Dimiduk" <[EMAIL PROTECTED]> wrote:
> >>>>>
> >>>>>> On Mon, Apr 1, 2013 at 4:31 PM, James Taylor <[EMAIL PROTECTED]> wrote:
> >>>>>>
> >>>>>>> From the SQL perspective, handling null is important.
> >>>>>>>
> >>>>>>
> >>>>>> From your perspective, it is critical to support NULLs, even at the
> >>>>>> expense of fixed-width encodings at all or supporting representation of a
> >>>>>> full range of values. That is, you'd rather be able to represent NULL than
> >>>>>> -2^31?
> >>>>>>
> >>>>>> On 04/01/2013 01:32 PM, Nick Dimiduk wrote:
> >>>>>>
> >>>>>>> Thanks for the thoughtful response (and code!).
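
For readers following the Int32 vs. NullableInt32 trade-off discussed in this thread, here is a minimal sketch of one way it could look. It is not HBase's or Pig's actual encoding: the function names and the one-byte marker values are hypothetical, chosen only for illustration. It is written in Python because Python `bytes` compare as unsigned, lexicographic byte strings, which is the ordering the thread assumes for byte[] keys. The fixed-width form covers the full range down to -2^31 in 4 bytes but has no spare bit pattern for NULL; the nullable form spends one extra byte on a marker so NULL can be represented and sorts before every value.

```python
import struct
from typing import Optional

# Hypothetical marker bytes (assumptions for this sketch, not an HBase format).
NULL_MARKER = b"\x00"     # chosen so NULL sorts before every non-null value
PRESENT_MARKER = b"\x01"


def encode_int32(value: int) -> bytes:
    """Fixed-width, order-preserving int32: always 4 bytes, no NULL possible."""
    if not -2**31 <= value < 2**31:
        raise ValueError("value out of int32 range")
    # Adding 2^31 maps [-2^31, 2^31 - 1] onto [0, 2^32 - 1], so big-endian
    # unsigned byte order matches numeric order.
    return struct.pack(">I", value + 2**31)


def decode_int32(data: bytes) -> int:
    (raw,) = struct.unpack(">I", data)
    return raw - 2**31


def encode_nullable_int32(value: Optional[int]) -> bytes:
    """Nullable variant: 1 byte for NULL, 5 bytes for a present value."""
    if value is None:
        return NULL_MARKER
    return PRESENT_MARKER + encode_int32(value)


def decode_nullable_int32(data: bytes) -> Optional[int]:
    if data == NULL_MARKER:
        return None
    if len(data) != 5 or data[:1] != PRESENT_MARKER:
        raise ValueError("not a nullable int32 encoding")
    return decode_int32(data[1:])
```

Sorting the encoded byte strings gives NULL first, then the integers in numeric order, so both variants keep keys scannable by range. The cost of nullability in this sketch is one extra byte per present value.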