Home | About | Sematext search-lucene.com search-hadoop.com
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
 Search Hadoop and all its subprojects:

Switch to Threaded View
HBase >> mail # user >> HBase Types: Explicit Null Support


Copy link to this message
-
Re: HBase Types: Explicit Null Support
On Wed, Apr 3, 2013 at 11:29 AM, Dmitriy Ryaboy <[EMAIL PROTECTED]> wrote:

> Hiya Nick,
> Pig converts data for HBase storage using this class:
>
> https://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/hbase/HBaseBinaryConverter.java(which
> is mostly just calling into HBase's Bytes class). As long as Bytes
> handles the null stuff, we'll just inherit the behavior.
>

Dmitriy,

Precisely how this will be exposed via the hbase client is TBD. We won't be
deprecating the existing Bytes utility from the client view, so a new API
for supporting these types will be provided. I'll be able to provide
support and/or a patch for Pig (et al) once  the implementation is a bit
further along.

My question for you as a Pig representative is more about how Pig users
expect Pig to handle NULLs. Are NULL values within a tuple a
common occurrence in Pig? In comparison, I'm thinking about the prevalence
of NULL in SQL.

Thanks,
Nick

On Tue, Apr 2, 2013 at 9:40 AM, Nick Dimiduk <[EMAIL PROTECTED]> wrote:
>
> > I agree that a user-extensible interface is a required feature here.
> > Personally, I'd love to ship a set of standard GIS tools on HBase. Let's
> > keep in mind, though, that SQL and user applications are not the only
> > consumers of this interface. A big motivation is allowing interop with
> the
> > other higher MR languages. *cough* Where are my Pig and Hive peeps in
> this
> > thread?
> >
> > On Mon, Apr 1, 2013 at 11:33 PM, James Taylor <[EMAIL PROTECTED]
> > >wrote:
> >
> > > Maybe if we can keep nullability separate from the
> > > serialization/deserialization, we can come up with a solution that
> works?
> > > We're able to essentially infer that a column is null based on its
> value
> > > being missing or empty. So if an iterator through the row key bytes
> could
> > > detect/indicate that, then an application could "infer" the value is
> > null.
> > >
> > > We're definitely planning on keeping byte[] accessors for use cases
> that
> > > need it. I'm curious on the geographic data case, though, could you
> use a
> > > fixed length long with a couple of new SQL built-ins to encode/decode
> the
> > > latitude/longitude?
> > >
> > >
> > > On 04/01/2013 11:29 PM, Jesse Yates wrote:
> > >
> > >> Actually, that isn't all that far-fetched of a format Matt - pretty
> > common
> > >> anytime anyone wants to do sortable lat/long (*cough* three letter
> > >> agencies
> > >> cough*).
> > >>
> > >> Wouldn't we get the same by providing a simple set of libraries (ala
> > >> orderly + other HBase useful things) and then still give access to the
> > >> underlying byte array? Perhaps a nullable key type in that lib makes
> > sense
> > >> if lots of people need it and it would be nice to have standard
> > libraries
> > >> so tools could interop much more easily.
> > >> -------------------
> > >> Jesse Yates
> > >> @jesse_yates
> > >> jyates.github.com
> > >>
> > >>
> > >> On Mon, Apr 1, 2013 at 11:17 PM, Matt Corgan <[EMAIL PROTECTED]>
> > wrote:
> > >>
> > >>  Ah, I didn't even realize sql allowed null key parts.  Maybe a goal
> of
> > >>> the
> > >>> interfaces should be to provide first-class support for custom user
> > types
> > >>> in addition to the standard ones included.  Part of the power of
> > hbase's
> > >>> plain byte[] keys is that users can concoct the perfect key for their
> > >>> data
> > >>> type.  For example, I have a lot of geographic data where I
> interleave
> > >>> latitude/longitude bits into a sortable 64 bit value that would
> > probably
> > >>> never be included in a standard library.
> > >>>
> > >>>
> > >>> On Mon, Apr 1, 2013 at 8:38 PM, Enis Söztutar <[EMAIL PROTECTED]>
> > >>> wrote:
> > >>>
> > >>>  I think having Int32, and NullableInt32 would support minimum
> > overhead,
> > >>>>
> > >>> as
> > >>>
> > >>>> well as allowing SQL semantics.
> > >>>>
> > >>>>
> > >>>> On Mon, Apr 1, 2013 at 7:26 PM, Nick Dimiduk <[EMAIL PROTECTED]>
> > >>>> wrote:
> > >>>>
> > >>>>  Furthermore, is is more important to support null values than
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB