Re: HBase Types: Explicit Null Support
On Wed, Apr 3, 2013 at 11:29 AM, Dmitriy Ryaboy <[EMAIL PROTECTED]> wrote:

> Hiya Nick,
> Pig converts data for HBase storage using this class:
>
> https://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/hbase/HBaseBinaryConverter.java (which
> is mostly just calling into HBase's Bytes class). As long as Bytes
> handles the null stuff, we'll just inherit the behavior.
>

Dmitriy,

Precisely how this will be exposed via the hbase client is TBD. We won't be
deprecating the existing Bytes utility from the client view, so a new API
for supporting these types will be provided. I'll be able to provide
support and/or a patch for Pig (et al) once the implementation is a bit
further along.

My question for you as a Pig representative is more about how Pig users
expect Pig to handle NULLs. Are NULL values within a tuple a
common occurrence in Pig? In comparison, I'm thinking about the prevalence
of NULL in SQL.

Thanks,
Nick

On Tue, Apr 2, 2013 at 9:40 AM, Nick Dimiduk <[EMAIL PROTECTED]> wrote:
>
> > I agree that a user-extensible interface is a required feature here.
> > Personally, I'd love to ship a set of standard GIS tools on HBase. Let's
> > keep in mind, though, that SQL and user applications are not the only
> > consumers of this interface. A big motivation is allowing interop with the
> > other higher MR languages. *cough* Where are my Pig and Hive peeps in this
> > thread?
> >
> > On Mon, Apr 1, 2013 at 11:33 PM, James Taylor <[EMAIL PROTECTED]> wrote:
> >
> > > Maybe if we can keep nullability separate from the
> > > serialization/deserialization, we can come up with a solution that works?
> > > We're able to essentially infer that a column is null based on its value
> > > being missing or empty. So if an iterator through the row key bytes could
> > > detect/indicate that, then an application could "infer" the value is null.
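
To make the "infer null from a missing or empty value" idea concrete, here is a minimal sketch. It assumes a hypothetical row key layout in which parts are joined by a single 0x00 separator byte and never contain that byte themselves; the names `SEP`, `encode_key`, and `iter_key_parts` are illustrative and do not come from HBase or any project in this thread.

```python
# Hypothetical layout: key parts joined by one 0x00 byte.
# Assumption: no part ever contains 0x00 itself.
SEP = b"\x00"


def encode_key(parts):
    """Join key parts into one row key; None is written as an empty segment."""
    return SEP.join(b"" if part is None else part for part in parts)


def iter_key_parts(key):
    """Walk the row key bytes, reporting an empty segment as None."""
    for segment in key.split(SEP):
        yield segment if segment else None


row_key = encode_key([b"us", None, b"42"])
parts = list(iter_key_parts(row_key))  # the middle part comes back as None
```

Under this layout nullability lives entirely in the iterator, not in the per-type serialization. The cost is that an empty value and a null value become indistinguishable.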
> > >
> > > We're definitely planning on keeping byte[] accessors for use cases that
> > > need it. I'm curious on the geographic data case, though, could you use a
> > > fixed length long with a couple of new SQL built-ins to encode/decode the
> > > latitude/longitude?
> > >
> > >
> > > On 04/01/2013 11:29 PM, Jesse Yates wrote:
> > >
> > > >> Actually, that isn't all that far-fetched of a format Matt - pretty
> > > >> common anytime anyone wants to do sortable lat/long (*cough* three
> > > >> letter agencies *cough*).
> > > >>
> > > >> Wouldn't we get the same by providing a simple set of libraries (ala
> > > >> orderly + other HBase useful things) and then still give access to the
> > > >> underlying byte array? Perhaps a nullable key type in that lib makes
> > > >> sense if lots of people need it and it would be nice to have standard
> > > >> libraries so tools could interop much more easily.
> > >> -------------------
> > >> Jesse Yates
> > >> @jesse_yates
> > >> jyates.github.com
> > >>
> > >>
> > > >> On Mon, Apr 1, 2013 at 11:17 PM, Matt Corgan <[EMAIL PROTECTED]> wrote:
> > >>
> > > >>> Ah, I didn't even realize sql allowed null key parts.  Maybe a goal of
> > > >>> the interfaces should be to provide first-class support for custom
> > > >>> user types in addition to the standard ones included.  Part of the
> > > >>> power of hbase's plain byte[] keys is that users can concoct the
> > > >>> perfect key for their data type.  For example, I have a lot of
> > > >>> geographic data where I interleave latitude/longitude bits into a
> > > >>> sortable 64 bit value that would probably never be included in a
> > > >>> standard library.
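
For readers unfamiliar with the bit-interleaving trick mentioned here, the following is a rough sketch of one way it can be done (a Z-order or Morton-style encoding). It is not Matt's actual code: the linear scaling of each coordinate to 32 bits, the choice to put latitude bits in the odd positions, and all function names are assumptions made for illustration.

```python
def scale_to_uint32(value, lo, hi):
    """Map a value in [lo, hi] linearly onto an unsigned 32-bit integer."""
    if not lo <= value <= hi:
        raise ValueError(f"{value} is outside [{lo}, {hi}]")
    return int((value - lo) / (hi - lo) * 0xFFFFFFFF)


def interleave_bits(a, b):
    """Interleave two 32-bit integers into one 64-bit integer.

    Bit i of a lands at position 2*i + 1 and bit i of b at position 2*i.
    """
    z = 0
    for i in range(32):
        z |= ((a >> i) & 1) << (2 * i + 1)
        z |= ((b >> i) & 1) << (2 * i)
    return z


def deinterleave_bits(z):
    """Inverse of interleave_bits: recover the two 32-bit integers."""
    a = b = 0
    for i in range(32):
        a |= ((z >> (2 * i + 1)) & 1) << i
        b |= ((z >> (2 * i)) & 1) << i
    return a, b


def encode_lat_lon(lat, lon):
    """Encode a point as 8 big-endian bytes that compare byte-wise."""
    a = scale_to_uint32(lat, -90.0, 90.0)
    b = scale_to_uint32(lon, -180.0, 180.0)
    return interleave_bits(a, b).to_bytes(8, "big")
```

Because the result is a fixed-width big-endian value, plain byte[] comparison orders keys along the interleaved curve, so points that are close together tend to share key prefixes. A key like this is application-specific, which is the argument for keeping the type interface user-extensible.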
> > >>>
> > >>>
> > > >>> On Mon, Apr 1, 2013 at 8:38 PM, Enis Söztutar <[EMAIL PROTECTED]> wrote:
> > >>>
> > > >>>> I think having Int32, and NullableInt32 would support minimum
> > > >>>> overhead, as well as allowing SQL semantics.
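
A small sketch may help show why separate Int32 and NullableInt32 types keep overhead low. The encodings below are assumptions chosen for illustration, not the format HBase ended up with: the plain type is 4 big-endian bytes with the sign bit flipped so byte order matches numeric order, and the nullable type adds one hypothetical header byte (0x00 for null, 0x01 for a present value).

```python
import struct

NULL_MARKER = b"\x00"     # hypothetical header byte for NULL
PRESENT_MARKER = b"\x01"  # hypothetical header byte for a real value


def encode_int32(value):
    """4 bytes, big-endian, offset by 2**31 so bytes sort like the integers.

    struct.pack raises struct.error if value is outside the int32 range.
    """
    return struct.pack(">I", value + 0x80000000)


def decode_int32(data):
    return struct.unpack(">I", data)[0] - 0x80000000


def encode_nullable_int32(value):
    """1 header byte plus 4 value bytes; NULL is the header byte alone."""
    if value is None:
        return NULL_MARKER
    return PRESENT_MARKER + encode_int32(value)


def decode_nullable_int32(data):
    if data == NULL_MARKER:
        return None
    return decode_int32(data[1:])
```

In this sketch a column declared as Int32 never pays for the header byte, while a NullableInt32 column spends one extra byte per value and sorts NULL before every integer.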
> > >>>>
> > >>>>
> > > >>>> On Mon, Apr 1, 2013 at 7:26 PM, Nick Dimiduk <[EMAIL PROTECTED]> wrote:
> > >>>>
> > > >>>>  Furthermore, is it more important to support null values than