Re: Review request for HBASE-7692: Ordered byte[] serialization
I think this belongs in core HBase, as a replacement for Bytes, which should
be deprecated eventually. We have a Bytes utility which is supposed to
convert basic Java types to byte[]s, but its encodings do not sort correctly
for signed numbers.
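
To illustrate the signed-number problem, here is a minimal sketch. It assumes
the encoding in question is plain big-endian two's complement (the layout
ByteBuffer.putInt produces) and that keys are compared byte by byte as
unsigned values. The class and method names are hypothetical, and the
sign-bit flip is a common order-preserving trick, not necessarily what the
HBASE-7692 patch does.

```java
import java.nio.ByteBuffer;

public class OrderedIntDemo {

    // Plain big-endian two's-complement encoding of an int.
    static byte[] plain(int v) {
        return ByteBuffer.allocate(4).putInt(v).array();
    }

    // Order-preserving variant (illustrative): flip the sign bit so that
    // negative values encode to smaller unsigned byte sequences.
    static byte[] ordered(int v) {
        return ByteBuffer.allocate(4).putInt(v ^ Integer.MIN_VALUE).array();
    }

    // Inverse of ordered(): flip the sign bit back after reading.
    static int decodeOrdered(byte[] b) {
        return ByteBuffer.wrap(b).getInt() ^ Integer.MIN_VALUE;
    }

    // Lexicographic comparison that treats every byte as unsigned.
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) {
                return d;
            }
        }
        return a.length - b.length;
    }

    public static void main(String[] args) {
        // Plain encoding: -1 sorts after 1, which is the wrong order.
        System.out.println(compareUnsigned(plain(-1), plain(1)) > 0);
        // Sign-flipped encoding: -1 sorts before 1.
        System.out.println(compareUnsigned(ordered(-1), ordered(1)) < 0);
        // The ordered encoding still round-trips.
        System.out.println(decodeOrdered(ordered(-42)) == -42);
    }
}
```

With the plain encoding, -1 becomes FF FF FF FF and sorts after every
non-negative value; flipping the sign bit restores numeric order under
unsigned byte comparison.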

We already know that all of the clients (Hive, Pig, Phoenix) need at
least Java type -> byte[] conversion utilities, and I think it is
HBase's job to supply one so that different clients can interoperate. Since
we also rely internally on serializing Java types, we need that
library in the core.

BTW, I also think that we need to have a SQL type -> Java type -> byte[]
layer, but that is another discussion.

Enis
On Thu, Feb 21, 2013 at 3:04 PM, Jonathan Hsieh <[EMAIL PROTECTED]> wrote:

> Nick,
>
> While I believe having an order-preserving canonical serialization is a
> good idea, from reading the mail and skimming the JIRA it is not
> clear to me why this is inside HBase as part of hbase-common.
>
> Why isn't this part of a library on top of hbase (a dependency for
> Pig/Hive) instead of "inside" hbase?
> Can't this functionality be done just from the client level?
> What's the end goal here? Is the goal to replace the Bytes.toBytes(*)
> methods to enforce the ordering?
> If HBase has two mutually incompatible encodings "built-in", how does a
> dev know to use one or the other later on?
> If this is essentially a mega import of a library (300k... yikes), why not
> make it a separate module instead of part of common?
>
> Jon.
>
> On Thu, Feb 21, 2013 at 10:35 AM, Nick Dimiduk <[EMAIL PROTECTED]> wrote:
>
> > Hi everyone,
> >
> > I'm of the opinion that HBase should provide a mechanism for serializing
> > common Java types such that the serialized format sorts according to
> > the natural ordering of the type. I think many application efforts end up
> > building a custom, partial implementation of this kind of functionality
> > on their own. I think HBase should provide a canonical implementation of
> > such a serialization format so that third parties can reliably build on
> > top of HBase. This would benefit not just user applications but also
> > other tools like Pig and Hive. Implementations for
> > HIVE-3634 <https://issues.apache.org/jira/browse/HIVE-3634>,
> > HIVE-2599 <https://issues.apache.org/jira/browse/HIVE-2599>, or
> > HIVE-2903 <https://issues.apache.org/jira/browse/HIVE-2903> could be
> > compatible with similar features in Pig.
> >
> > After implementing something similar on multiple occasions, I stumbled
> > across the Orderly <https://github.com/ndimiduk/orderly> library. It
> > also appears to have been adopted by other large projects, including
> > Lily <https://github.com/NGDATA/orderly>.
> > I've engaged the library's author for some improvements only to find out
> > he's now at Google and will no longer be maintaining it. Thus, I propose
> > we take it into HBase.
> >
> > HBASE-7692 <https://issues.apache.org/jira/browse/HBASE-7692> includes a
> > patch that introduces Orderly into hbase-common under the orderly
> > namespace. I have an associated branch on GitHub
> > <https://github.com/ndimiduk/hbase/commits/7692-ordered-serialization>
> > wherein I've broken the patch out into multiple commits to ease review.
> > Please take a few minutes to give it a look.
> >
> > Thanks,
> > Nick
> >
>
>
>
> --
> // Jonathan Hsieh (shay)
> // Software Engineer, Cloudera
> // [EMAIL PROTECTED]
>