Pig, mail # dev - Re: HBase Types: Explicit Null Support


Nick Dimiduk 2013-04-02, 16:40
Re: HBase Types: Explicit Null Support
Dmitriy Ryaboy 2013-04-03, 18:29
Hiya Nick,
Pig converts data for HBase storage using this class:
https://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/hbase/HBaseBinaryConverter.java
(which mostly just calls into HBase's Bytes class). As long as Bytes
handles nulls, we'll just inherit the behavior.
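
To make the trade-off discussed in the quoted thread below concrete, here is a minimal sketch of a fixed-width int32 encoding next to a nullable variant. Everything in it is an assumption made for illustration: the class and method names are hypothetical, the one-byte null header is just one possible design, and none of it is the actual behavior of HBase's Bytes class or Pig's HBaseBinaryConverter.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch; not the HBase Bytes API and not Pig's converter.
final class NullableInt32Sketch {

    // Fixed-width encoding: 4 big-endian bytes with the sign bit flipped so that
    // unsigned byte-wise comparison matches signed int order. Every 4-byte
    // pattern is used by some int, so there is no pattern left over for null.
    static byte[] encodeInt32(int v) {
        return ByteBuffer.allocate(4).putInt(v ^ Integer.MIN_VALUE).array();
    }

    static int decodeInt32(byte[] b) {
        return ByteBuffer.wrap(b).getInt() ^ Integer.MIN_VALUE;
    }

    // Nullable encoding (assumed layout): a single 0x00 byte means null and
    // sorts before everything else; 0x01 followed by the 4-byte form is a value.
    static byte[] encodeNullableInt32(Integer v) {
        if (v == null) {
            return new byte[] {0x00};
        }
        return ByteBuffer.allocate(5)
                .put((byte) 0x01)
                .putInt(v ^ Integer.MIN_VALUE)
                .array();
    }

    static Integer decodeNullableInt32(byte[] b) {
        if (b.length == 1 && b[0] == 0x00) {
            return null;
        }
        return ByteBuffer.wrap(b, 1, 4).getInt() ^ Integer.MIN_VALUE;
    }

    // Lexicographic comparison treating each byte as unsigned, which is the
    // order in which plain byte[] keys sort.
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (d != 0) {
                return d;
            }
        }
        return a.length - b.length;
    }

    public static void main(String[] args) {
        System.out.println(encodeInt32(0).length);            // bytes used by the fixed-width form
        System.out.println(encodeNullableInt32(0).length);    // bytes used by a non-null nullable value
        System.out.println(encodeNullableInt32(null).length); // bytes used by null
    }
}
```

In this sketch the nullable form costs one extra byte per non-null value, which is the overhead the Int32 versus NullableInt32 split in the thread is meant to avoid for callers who never store nulls.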
On Tue, Apr 2, 2013 at 9:40 AM, Nick Dimiduk <[EMAIL PROTECTED]> wrote:

> I agree that a user-extensible interface is a required feature here.
> Personally, I'd love to ship a set of standard GIS tools on HBase. Let's
> keep in mind, though, that SQL and user applications are not the only
> consumers of this interface. A big motivation is allowing interop with the
> other higher MR languages. *cough* Where are my Pig and Hive peeps in this
> thread?
>
> On Mon, Apr 1, 2013 at 11:33 PM, James Taylor <[EMAIL PROTECTED]> wrote:
>
> > Maybe if we can keep nullability separate from the
> > serialization/deserialization, we can come up with a solution that works?
> > We're able to essentially infer that a column is null based on its value
> > being missing or empty. So if an iterator through the row key bytes could
> > detect/indicate that, then an application could "infer" the value is null.
> >
> > We're definitely planning on keeping byte[] accessors for use cases that
> > need it. I'm curious about the geographic data case, though: could you use
> > a fixed-length long with a couple of new SQL built-ins to encode/decode the
> > latitude/longitude?
> >
> >
> > On 04/01/2013 11:29 PM, Jesse Yates wrote:
> >
> >> Actually, that isn't all that far-fetched of a format Matt - pretty common
> >> anytime anyone wants to do sortable lat/long (*cough* three letter agencies
> >> *cough*).
> >>
> >> Wouldn't we get the same by providing a simple set of libraries (ala
> >> orderly + other HBase useful things) and then still give access to the
> >> underlying byte array? Perhaps a nullable key type in that lib makes sense
> >> if lots of people need it and it would be nice to have standard libraries
> >> so tools could interop much more easily.
> >> -------------------
> >> Jesse Yates
> >> @jesse_yates
> >> jyates.github.com
> >>
> >>
> >> On Mon, Apr 1, 2013 at 11:17 PM, Matt Corgan <[EMAIL PROTECTED]> wrote:
> >>
> >>> Ah, I didn't even realize SQL allowed null key parts.  Maybe a goal of the
> >>> interfaces should be to provide first-class support for custom user types
> >>> in addition to the standard ones included.  Part of the power of HBase's
> >>> plain byte[] keys is that users can concoct the perfect key for their data
> >>> type.  For example, I have a lot of geographic data where I interleave
> >>> latitude/longitude bits into a sortable 64-bit value that would probably
> >>> never be included in a standard library.
> >>>
> >>>
> >>> On Mon, Apr 1, 2013 at 8:38 PM, Enis Söztutar <[EMAIL PROTECTED]> wrote:
> >>>
> >>>> I think having Int32 and NullableInt32 would support minimum overhead, as
> >>>> well as allowing SQL semantics.
> >>>>
> >>>>
> >>>> On Mon, Apr 1, 2013 at 7:26 PM, Nick Dimiduk <[EMAIL PROTECTED]> wrote:
> >>>>
> >>>>> Furthermore, is it more important to support null values than to squeeze
> >>>>> all representations into minimum size (4 bytes for int32, &c.)?
> >>>>> On Apr 1, 2013 4:41 PM, "Nick Dimiduk" <[EMAIL PROTECTED]> wrote:
> >>>>>
> >>>>>> On Mon, Apr 1, 2013 at 4:31 PM, James Taylor <[EMAIL PROTECTED]> wrote:
> >>>>>>
> >>>>>>   From the SQL perspective, handling null is important.
> >>>>>>>
> >>>>>>
> >>>>>> From your perspective, it is critical to support NULLs, even at the
> >>>>>> expense of fixed-width encodings altogether, or of representing the
> >>>>>> full range of values. That is, you'd rather be able to represent NULL
> >>>>>> than -2^31?
> >>>>>>
> >>>>>> On 04/01/2013 01:32 PM, Nick Dimiduk wrote:
> >>>>>>
> >>>>>>> Thanks for the thoughtful response (and code!).
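
The geographic key mentioned in the thread (latitude/longitude bits interleaved into a sortable 64-bit value) can be sketched as a Z-order (Morton) code. The thread does not say how the bits were actually scaled or interleaved, so the scaling to 32 bits per coordinate, the bit positions, and all names below are assumptions made for illustration only.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch of a Z-order key; not taken from any poster's code.
final class LatLongKeySketch {

    // Map a value in [min, max] onto an unsigned 32-bit integer held in a long.
    static long scale(double v, double min, double max) {
        double unit = (v - min) / (max - min); // 0.0 .. 1.0
        return (long) (unit * 0xFFFFFFFFL);
    }

    // Spread the low 32 bits of x so they land on the even bit positions
    // (bit i moves to bit 2*i), leaving the odd positions zero.
    static long spread(long x) {
        x &= 0xFFFFFFFFL;
        x = (x | (x << 16)) & 0x0000FFFF0000FFFFL;
        x = (x | (x << 8))  & 0x00FF00FF00FF00FFL;
        x = (x | (x << 4))  & 0x0F0F0F0F0F0F0F0FL;
        x = (x | (x << 2))  & 0x3333333333333333L;
        x = (x | (x << 1))  & 0x5555555555555555L;
        return x;
    }

    // Interleave: latitude bits on odd positions, longitude bits on even ones.
    static long interleave(double lat, double lon) {
        return (spread(scale(lat, -90.0, 90.0)) << 1)
                | spread(scale(lon, -180.0, 180.0));
    }

    // Big-endian bytes, so unsigned byte-wise key order follows the
    // unsigned order of the 64-bit code.
    static byte[] toKey(long code) {
        return ByteBuffer.allocate(8).putLong(code).array();
    }

    public static void main(String[] args) {
        long code = interleave(37.77, -122.42);
        System.out.println(Long.toHexString(code));
        System.out.println(toKey(code).length);
    }
}
```

Because the code uses all 64 bits, a key of this shape has no spare bit pattern for null, which is the same tension as the int32 case raised earlier in the thread.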