HBase >> mail # dev >> Review request for HBASE-7692: Ordered byte[] serialization

Re: Review request for HBASE-7692: Ordered byte[] serialization
All sounds fine to me, Nick.  I had not looked into the internals enough to
realize Builders were optional.

Sorry if I'm looking too far down the road, but the future implications of
including such low level building blocks could be hard to unwind.  Worth a
little discussion at least.

On Fri, Feb 22, 2013 at 5:40 PM, Nick Dimiduk <[EMAIL PROTECTED]> wrote:

> I think we're getting ahead of ourselves a bit here. First and foremost,
> I'm looking for consensus that HBase should ship with tools for serializing
> Java primitive types such that the byte[] representations maintain sorted
> order. This is primarily to the benefit of HBase users, in that third-party
> tools can enjoy interoperability insofar as it is provided by HBase (i.e., I
> can write a Pig script that writes a long and my Hive queries can read that
> value). Furthermore, the implementations of these tools benefit from the
> order-preserving representation.
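
To make "byte[] representations maintain sorted order" concrete, here is a minimal sketch for a single signed long. It is not the proposed library's implementation; the class and method names (OrderedLongSketch, encode, decode, compareUnsigned) are made up for illustration. The technique assumed is the common one: flip the sign bit and write the value big-endian, so that unsigned byte-wise comparison agrees with numeric order.

```java
import java.nio.ByteBuffer;

// Hypothetical illustration only; not part of HBase or the library under review.
final class OrderedLongSketch {

    // Encode a signed long so that unsigned lexicographic byte order
    // matches numeric order.
    static byte[] encode(long value) {
        // XOR with Long.MIN_VALUE flips the sign bit, mapping
        // Long.MIN_VALUE..Long.MAX_VALUE onto 0x00..00 through 0xFF..FF.
        // ByteBuffer writes big-endian by default.
        return ByteBuffer.allocate(Long.BYTES)
                .putLong(value ^ Long.MIN_VALUE)
                .array();
    }

    // Reverse of encode: read big-endian, then flip the sign bit back.
    static long decode(byte[] bytes) {
        return ByteBuffer.wrap(bytes).getLong() ^ Long.MIN_VALUE;
    }

    // Unsigned lexicographic comparison of two byte arrays,
    // the kind of ordering applied to raw byte[] keys.
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int cmp = Integer.compare(a[i] & 0xFF, b[i] & 0xFF);
            if (cmp != 0) {
                return cmp;
            }
        }
        return Integer.compare(a.length, b.length);
    }
}
```

With this encoding, encode(-5L) sorts before encode(3L) under byte-wise comparison, whereas a plain two's-complement big-endian encoding would place negative values after positive ones.
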
> Assuming this capacity is agreed to be desirable, I propose the adoption of
> this orphaned community library. I have no particular love for the name of
> the package, nor am I concerned terribly about which module it resides in.
> Personally, I think it should ship with (explicitly or as a dependency of)
> the hbase-client module that will exist in 0.96. This is my preference
> because I think the client API should be extended to use said serialization
> format directly -- finally, HBase could "support" types other than byte[].
> That would be a much larger change, however, and I am not interested in
> pressing it for this initial discussion.
>
> This introduction does not in any way affect the existing Bytes utility.
> Server components can continue to use it for marshaling their own
> primitives. This library is of interest primarily to consumers of the HBase
> client API. (I'd prefer to see Bytes deprecated from client use entirely!)
>
> I do not think this library or its *optional* builder pattern should be
> used inside of the RegionServer. See also HBASE-7221 for another user who
> is asking for this kind of builder pattern. The Builder and Iterator utils
> are only a convenience API, providing sugar on top of the underlying
> StructRowKey implementation. Users interested in producing or consuming
> compound objects within a tight loop need not bother with either of them.
>
> As for the implementation details and dependency on Hadoop Writables: it is
> my opinion that so long as its dependencies are compatible with the rest of
> HBase, it's no big deal. From that perspective, dependence on Hadoop
> Writable implementations is entirely reasonable for an initial inclusion.
> If, down the road, we wish to reduce dependencies (a practice I generally
> support) and in so doing it becomes useful to change this implementation
> detail, so be it. Say, for example, we want to release an hbase-client jar
> that has no dependency on any Hadoop types, I say go for it. The patch I
> have contributed tags all of these classes as "Evolving" interfaces, and
> nothing is set in stone until a release manager and the community bless a
> new release. I'm happy to work with whoever is interested toward
> modernizing implementation details once the initial code is in place.
>
> Finally, the multiple patches business is nothing more than a
> reviewer convenience. I'm generally not excited about reviewing more than
> about 20 files at a time, on Review Board or otherwise. I assume others
> share the same opinion. As I offered on the ticket itself, I'm fine with
> accepting review on Review Board on the single large patch; I assumed
> GitHub would make it easier, not harder.
>
> Thanks for your attention.
> -n
>
> On Fri, Feb 22, 2013 at 4:48 PM, Matt Corgan <[EMAIL PROTECTED]> wrote:
> > I agree with Jonathan that ideally this would not depend on hbase or
> > hadoop.  Could we just replace Hadoop's BytesWritable with a new class
> > that does the same thing?
> >
> > I also have a concern about the way it builds the multi-field byte[] by