HBase, mail # dev - Review request for HBASE-7692: Ordered byte[] serialization


Nick Dimiduk 2013-02-21, 18:35
Jonathan Hsieh 2013-02-21, 23:04
lars hofhansl 2013-02-22, 04:24
Enis Söztutar 2013-02-22, 03:23
Re: Review request for HBASE-7692: Ordered byte[] serialization
Jonathan Hsieh 2013-02-22, 07:56
So I buy the argument for including this in hbase, but several of
the questions still stand:

Why is this part of hbase-common? Shouldn't this just be a dependency of
the hbase-client module? Does the hbase-server side need to depend on this?

Since this is a large import of a currently isolated library, why not make
it a separate module instead of part of hbase-common?  This would enforce a
boundary that will prevent pollution from circular dependencies.

Jon.

On Thu, Feb 21, 2013 at 7:23 PM, Enis Söztutar <[EMAIL PROTECTED]> wrote:

> I think this belongs in core HBase, as a replacement for Bytes, which should
> eventually be deprecated. We have a Bytes utility which is supposed to
> convert basic java types to byte[]s, but it does not preserve sort order
> for signed numbers.
>
> We already know that all of the clients (Hive, Pig, Phoenix) need at
> least java type -> byte[] conversion utilities, and I think it is
> HBase's job to supply one so that different clients can interoperate. Since
> internally we also rely on serializing java types, we need that
> library in the core.
>
> BTW, I also think that we need to have a SQL-type to java type to byte[]
> layer, but that is another discussion.
>
> Enis
>
>
> On Thu, Feb 21, 2013 at 3:04 PM, Jonathan Hsieh <[EMAIL PROTECTED]> wrote:
>
> > Nick,
> >
> > While I believe having an order-preserving canonical serialization is a
> > good idea, from reading the mail and skimming the jira it is not
> > clear to me why this is inside hbase as part of hbase-common.
> >
> > Why isn't this part of a library on top of hbase (a dependency for
> > Pig/Hive) instead of "inside" hbase?
> > Can't this functionality be done just from the client level?
> > What's the end goal here? Is the goal to replace the Bytes.toBytes(*)
> > methods to enforce the ordering?
> > If HBase has two mutually incompatible encodings "built-in", how does a
> > dev know which one to use later on?
> > If this is essentially a mega import of a library (300k.. yikes), why not
> > make it a separate module instead of part of common?
> >
> > Jon.
> >
> > On Thu, Feb 21, 2013 at 10:35 AM, Nick Dimiduk <[EMAIL PROTECTED]> wrote:
> >
> > > Hi everyone,
> > >
> > > I'm of the opinion that HBase should provide a mechanism for serializing
> > > common java types such that the serialized format sorts according to
> > > the natural ordering of the type. I think many application efforts end up
> > > building a custom, partial implementation of this kind of functionality
> > > on their own. I think HBase should provide a canonical implementation of
> > > such a serialization format so that third parties can reliably build on
> > > top of HBase. This would enable not just user applications but also other
> > > tools like Pig and Hive. Implementations for
> > > HIVE-3634 <https://issues.apache.org/jira/browse/HIVE-3634>,
> > > HIVE-2599 <https://issues.apache.org/jira/browse/HIVE-2599>, or
> > > HIVE-2903 <https://issues.apache.org/jira/browse/HIVE-2903> could be
> > > compatible with similar features in Pig.
> > >
> > > After implementing something similar on multiple occasions, I stumbled
> > > across the Orderly <https://github.com/ndimiduk/orderly> library. It also
> > > appears to have been adopted by other large projects, including
> > > Lily <https://github.com/NGDATA/orderly>.
> > > I approached the library's author about some improvements, only to find
> > > out he's now at Google and will no longer be maintaining it. Thus, I
> > > propose we take it into HBase.
> > >
> > > HBASE-7692 <https://issues.apache.org/jira/browse/HBASE-7692> includes a
> > > patch that introduces Orderly into hbase-common under the orderly
> > > namespace. I have an associated branch on
> > > github <https://github.com/ndimiduk/hbase/commits/7692-ordered-serialization>
> > > wherein I've broken the patch out into multiple commits to ease review.
> > > Please take a few minutes to give it a look.

// Jonathan Hsieh (shay)
// Software Engineer, Cloudera
// [EMAIL PROTECTED]
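Enis's point that Bytes does not handle signed numbers correctly and Nick's call for a format that "sorts according to the natural ordering of the type" come down to one observation: if row keys are compared as unsigned bytes, a big-endian two's-complement int such as -1 (FF FF FF FF) sorts after 1 (00 00 00 01). The Java below is a minimal sketch of that problem and of the common fix, assuming keys are compared lexicographically as unsigned bytes. It is not Orderly's or HBase's code; the class `OrderedEncodingSketch` and all of its methods are hypothetical names chosen for illustration. It shows the sign-bit flip for `int` and the invert-if-negative trick for `double`.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical sketch class; not Orderly's or HBase's actual API.
final class OrderedEncodingSketch {

    // Naive encoding: big-endian two's complement (the byte layout that
    // ByteBuffer.putInt writes). Negative values start with 0xFF.
    static byte[] naive(int v) {
        return ByteBuffer.allocate(4).putInt(v).array();
    }

    // Order-preserving int encoding: flip the sign bit, so negatives start
    // with a 0 bit and non-negatives start with a 1 bit.
    static byte[] ordered(int v) {
        return ByteBuffer.allocate(4).putInt(v ^ Integer.MIN_VALUE).array();
    }

    // Inverse of ordered(int): flipping the sign bit again restores the value.
    static int decodeOrdered(byte[] b) {
        return ByteBuffer.wrap(b).getInt() ^ Integer.MIN_VALUE;
    }

    // Order-preserving double encoding: for negatives invert every bit (so a
    // larger magnitude sorts lower); for non-negatives flip only the sign bit.
    static byte[] orderedDouble(double d) {
        long bits = Double.doubleToLongBits(d);
        bits ^= (bits < 0) ? -1L : Long.MIN_VALUE;
        return ByteBuffer.allocate(8).putLong(bits).array();
    }

    // Lexicographic comparison treating each byte as unsigned (0..255),
    // which is how byte[] row keys are assumed to be ordered in this sketch.
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xff;
            int y = b[i] & 0xff;
            if (x != y) {
                return Integer.compare(x, y);
            }
        }
        return Integer.compare(a.length, b.length);
    }

    public static void main(String[] args) {
        Integer[] byNaive = {2, -1, 0, -2, 1};
        Integer[] byOrdered = byNaive.clone();
        Arrays.sort(byNaive, (a, b) -> compareUnsigned(naive(a), naive(b)));
        Arrays.sort(byOrdered, (a, b) -> compareUnsigned(ordered(a), ordered(b)));
        System.out.println("naive:   " + Arrays.toString(byNaive));   // naive:   [0, 1, 2, -2, -1]
        System.out.println("ordered: " + Arrays.toString(byOrdered)); // ordered: [-2, -1, 0, 1, 2]
    }
}
```

The flipped encoding is still four bytes wide, so it costs nothing in space, but its byte layout differs from the naive one for every value. That is the kind of mutual incompatibility Jon raises when he asks how a developer would choose between two "built-in" encodings.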
Nick Dimiduk 2013-02-22, 14:13
Jonathan Hsieh 2013-02-22, 14:31
Elliott Clark 2013-02-22, 17:32
Matt Corgan 2013-02-22, 18:00
Nick Dimiduk 2013-02-22, 18:04
Matt Corgan 2013-02-22, 18:14
Nick Dimiduk 2013-02-22, 18:48
Nick Dimiduk 2013-02-22, 19:37
Ted Yu 2013-02-22, 21:14
Stack 2013-02-26, 23:13
Jesse Yates 2013-02-22, 18:01
Nick Dimiduk 2013-02-22, 23:13
Jonathan Hsieh 2013-02-23, 00:33
Matt Corgan 2013-02-23, 00:48
Nick Dimiduk 2013-02-23, 01:40
Matt Corgan 2013-02-23, 02:04
Stack 2013-02-26, 23:20
Stack 2013-02-26, 23:17
Ted 2013-02-22, 14:21
Stack 2013-02-26, 23:08