HBase, mail # dev - Review request for HBASE-7692: Ordered byte[] serialization

Nick Dimiduk 2013-02-21, 18:35
Jonathan Hsieh 2013-02-21, 23:04
lars hofhansl 2013-02-22, 04:24
Enis Söztutar 2013-02-22, 03:23
Jonathan Hsieh 2013-02-22, 07:56
Nick Dimiduk 2013-02-22, 14:13
Jonathan Hsieh 2013-02-22, 14:31
Elliott Clark 2013-02-22, 17:32
Matt Corgan 2013-02-22, 18:00
Nick Dimiduk 2013-02-22, 18:04
Matt Corgan 2013-02-22, 18:14
Nick Dimiduk 2013-02-22, 18:48
Nick Dimiduk 2013-02-22, 19:37
Ted Yu 2013-02-22, 21:14
Stack 2013-02-26, 23:13
Jesse Yates 2013-02-22, 18:01
Nick Dimiduk 2013-02-22, 23:13
Jonathan Hsieh 2013-02-23, 00:33
Matt Corgan 2013-02-23, 00:48
Re: Review request for HBASE-7692: Ordered byte[] serialization
Nick Dimiduk 2013-02-23, 01:40
I think we're getting ahead of ourselves a bit here. First and foremost,
I'm looking for consensus that HBase should ship with tools for serializing
Java primitive types such that the byte[] representations maintain sorted
order. This primarily benefits users of HBase, in that third-party
tools can interoperate insofar as HBase provides a common format (i.e., I
can write a Pig script that writes a long and my Hive queries can read that
value). Furthermore, the implementations of these tools benefit from the
order-preserving representation.
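
To make "order-preserving" concrete, here is a minimal sketch of one common technique for a signed long: flip the sign bit and write the result big-endian, so that unsigned lexicographic comparison of the bytes agrees with numeric order. The class and method names are made up for illustration, and this is not necessarily the encoding the library under review uses.

```java
import java.util.Arrays;

public class OrderedLongSketch {
    /** Encodes a signed long as 8 big-endian bytes with the sign bit flipped. */
    public static byte[] encode(long value) {
        // Flipping the sign bit maps Long.MIN_VALUE..Long.MAX_VALUE onto the
        // unsigned range 0..2^64-1 while keeping relative order.
        long flipped = value ^ Long.MIN_VALUE;
        byte[] out = new byte[8];
        for (int i = 7; i >= 0; i--) {
            out[i] = (byte) flipped;
            flipped >>>= 8;
        }
        return out;
    }

    /** Reverses encode(). */
    public static long decode(byte[] bytes) {
        long v = 0;
        for (int i = 0; i < 8; i++) {
            v = (v << 8) | (bytes[i] & 0xFF);
        }
        return v ^ Long.MIN_VALUE;
    }

    /** Unsigned lexicographic comparison of two byte arrays. */
    public static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    public static void main(String[] args) {
        long[] values = {42L, -7L, Long.MIN_VALUE, 0L, Long.MAX_VALUE};
        byte[][] encoded = new byte[values.length][];
        for (int i = 0; i < values.length; i++) {
            encoded[i] = encode(values[i]);
        }
        // Sort by bytes only; decoding afterwards yields ascending numeric order.
        Arrays.sort(encoded, OrderedLongSketch::compareUnsigned);
        for (byte[] e : encoded) {
            System.out.println(decode(e));
        }
    }
}
```

A plain two's-complement big-endian encoding would sort negative values after positive ones under the same comparison, which is the gap an order-preserving format closes.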

Assuming this capacity is agreed to be desirable, I propose the adoption of
this orphaned community library. I have no particular love for the name of
the package, nor am I concerned terribly about which module it resides in.
Personally, I think it should ship with (explicitly or as a dependency of)
the hbase-client module that will exist in 0.96. This is my preference
because I think the client API should be extended to use said serialization
format directly -- finally, HBase could "support" types other than byte[].
That would be a much larger change, however, and I am not interested in
pressing it for this initial discussion.

This introduction does not in any way affect the existing Bytes utility.
Server components can continue to use it for marshaling their own
primitives. This library is of interest primarily to consumers of the HBase
client API. (I'd prefer to see Bytes deprecated from client use entirely!)
I do not think this library or its *optional* builder pattern should be
used inside of the RegionServer. See also HBASE-7221 for another user who
is asking for this kind of builder pattern. The Builder and Iterator utils
are only a convenience API, providing sugar on top of the underlying
StructRowKey implementation. Users interested in producing or consuming
compound objects within a tight loop need not bother with either of them.
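
As a rough illustration of the tight-loop case, a compound key made of fixed-width fields can be written straight into a reused buffer with no builder objects at all. Everything below (class name, field layout, helper names) is hypothetical and stands in for the idea only; it is not the StructRowKey API, and it sidesteps variable-length fields entirely.

```java
public class CompoundKeySketch {
    /** Hypothetical layout: an 8-byte long followed by a 4-byte int. */
    public static final int KEY_LENGTH = Long.BYTES + Integer.BYTES;

    /** Writes a sign-flipped, big-endian long at the given offset. */
    public static void writeLong(byte[] buf, int offset, long value) {
        long flipped = value ^ Long.MIN_VALUE;
        for (int i = 7; i >= 0; i--) {
            buf[offset + i] = (byte) flipped;
            flipped >>>= 8;
        }
    }

    /** Writes a sign-flipped, big-endian int at the given offset. */
    public static void writeInt(byte[] buf, int offset, int value) {
        int flipped = value ^ Integer.MIN_VALUE;
        for (int i = 3; i >= 0; i--) {
            buf[offset + i] = (byte) flipped;
            flipped >>>= 8;
        }
    }

    /** Fills buf with the compound key (id, sequence); allocates nothing. */
    public static void encodeKey(byte[] buf, long id, int sequence) {
        writeLong(buf, 0, id);
        writeInt(buf, Long.BYTES, sequence);
    }

    /** Unsigned lexicographic comparison of two byte arrays. */
    public static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    public static void main(String[] args) {
        byte[] key = new byte[KEY_LENGTH]; // one buffer reused across iterations
        byte[] previous = new byte[KEY_LENGTH];
        encodeKey(previous, 1L, Integer.MIN_VALUE);
        for (int seq = -2; seq <= 2; seq++) {
            encodeKey(key, 1L, seq);
            // Each key sorts after the one before it, field by field.
            System.out.println(seq + " sorts after previous: "
                    + (compareUnsigned(previous, key) < 0));
            System.arraycopy(key, 0, previous, 0, KEY_LENGTH);
        }
    }
}
```

Because every field has a fixed width, concatenation alone preserves the field-by-field ordering; variable-length fields such as strings would need terminators or length handling that this sketch does not attempt.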

As for the implementation details and dependency on Hadoop Writables: it is
my opinion that so long as its dependencies are compatible with the rest of
HBase, it's no big deal. From that perspective, dependence on Hadoop
Writable implementations is entirely reasonable for an initial inclusion.
If, down the road, we wish to reduce dependencies (a practice I generally
support) and in so doing it becomes useful to change this implementation
detail, so be it. If, say, we want to release an hbase-client jar
that has no dependency on any Hadoop types, I say go for it. The patch I
have contributed tags all of these classes as "Evolving" interfaces, and
nothing is set in stone until a release manager and the community bless a
new release. I'm happy to work with whoever is interested toward
modernizing implementation details once the initial code is in place.

Finally, the multiple patches business is nothing more than a
reviewer convenience. I'm generally not excited about reviewing more than
about 20 files at a time, on Review Board or otherwise. I assume others
share the same opinion. As I offered on the ticket itself, I'm fine with
accepting review on Review Board on the single large patch; I assumed
GitHub would make it easier, not harder.

Thanks for your attention.

On Fri, Feb 22, 2013 at 4:48 PM, Matt Corgan <[EMAIL PROTECTED]> wrote:

> I agree with Jonathan that ideally this would not depend on hbase or
> hadoop.  Could we just replace Hadoop's BytesWritable with a new class that
> does the same thing?
> I also have a concern about the way it builds the multi-field byte[] by
> allocating somewhat expensive Builder objects, etc.  It's suitable for
> application level code, but most of the innards of hbase regionserver
> should be using tighter code for best performance and less garbage.
>  Perhaps in a future issue we can separate the builder wrappers from their
> internal byte converters so that hbase-server can use the lower-level byte
> converters without the builder overhead.
> On Fri, Feb 22, 2013 at 4:33 PM, Jonathan Hsieh <[EMAIL PROTECTED]> wrote:
Matt Corgan 2013-02-23, 02:04
Stack 2013-02-26, 23:20
Stack 2013-02-26, 23:17
Ted 2013-02-22, 14:21
Stack 2013-02-26, 23:08