HBase, mail # dev - Re: HBase type support


Matteo Bertozzi 2013-03-14, 21:47
Re: HBase type support
Nick Dimiduk 2013-03-15, 17:01
On Thu, Mar 14, 2013 at 2:47 PM, Matteo Bertozzi <[EMAIL PROTECTED]> wrote:

> could you point me to the big picture of this jira?
>

The big picture is documented in the attachment on the ticket. If anything in
it is lacking, let me know and I'll clarify and improve the document.

> from what I've understood this is something like
> extending the Bytes.toInt() to all the types
> to allow the user to have something more like
>       table.putInt(myKey, 100);
>       int v = table.getInt(myKey);
>

That's only on the surface, but yes, as a final step we could integrate
these types into the client API. I started some brainstorming about that
on HBASE-7941. As a naive blanket statement, anything supported by Bytes
should be supported natively by the client.
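
As a rough illustration of the client-side convenience described above, here is a minimal sketch of a putInt/getInt layer. It assumes a hypothetical TypedTable class (not part of HBase) backed by an in-memory TreeMap instead of a real table, and it reuses the 4-byte big-endian layout that Bytes.toBytes(int) produces; nothing here reflects an API actually settled on HBASE-7941.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Comparator;
import java.util.TreeMap;

// Hypothetical sketch: TypedTable is NOT an HBase class. An in-memory TreeMap
// stands in for a table (row key -> single cell value) to show what a
// putInt/getInt convenience layer on the client could look like.
public class TypedTableDemo {

    static final class TypedTable {
        // HBase orders row keys by unsigned lexicographic byte[] comparison,
        // so the stand-in map uses the same ordering (and content equality).
        private static final Comparator<byte[]> UNSIGNED = Arrays::compareUnsigned;
        private final TreeMap<byte[], byte[]> cells = new TreeMap<>(UNSIGNED);

        // 4-byte big-endian layout, the same one Bytes.toBytes(int) produces.
        private static byte[] encodeInt(int v) {
            return ByteBuffer.allocate(Integer.BYTES).putInt(v).array();
        }

        private static int decodeInt(byte[] raw) {
            return ByteBuffer.wrap(raw).getInt();
        }

        void putInt(byte[] key, int value) {
            cells.put(key, encodeInt(value));
        }

        int getInt(byte[] key) {
            byte[] raw = cells.get(key);
            if (raw == null) {
                throw new IllegalArgumentException("no cell for key");
            }
            return decodeInt(raw);
        }
    }

    public static void main(String[] args) {
        TypedTable table = new TypedTable();
        byte[] myKey = "row-1".getBytes(StandardCharsets.UTF_8);
        table.putInt(myKey, 100);
        int v = table.getInt(myKey);
        System.out.println(v); // prints 100
    }
}
```

The map is given an unsigned byte[] comparator because Java arrays use identity equality; a HashMap keyed on byte[] would not find a key rebuilt from the same bytes.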

> or is there something more?
>

Improvements to the client API are only a useful side effect. The real
point here is for HBase to ship with "support" for data types besides
byte[]. Those data types would be defined according to the "HBase
Management System", just like an RDBMS defines the types it supports.
These types are intentionally defined independently of Java; HBase will need
better support for more languages in the future, and tying it further to
Java doesn't help with that. "Support" means providing conversion between
these types and the byte[] HBase uses under the hood. Because HBase sorts
rows by comparing these raw byte[] values lexicographically, it is critical
that this conversion maintain the natural ordering of the originating type.
This was my original intention in introducing HBASE-7692. I outlined the
motivation for this in the attached document.
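
To make the ordering requirement concrete: the big-endian two's-complement bytes that Bytes.toBytes(int) produces do not sort correctly once negative numbers are involved, because -1 (FF FF FF FF) compares greater than 1 (00 00 00 01) as unsigned bytes. The sketch below shows one well-known fix, flipping the sign bit, plus the analogous trick for IEEE 754 doubles. The class and method names are hypothetical, and this is an illustration, not the encoding HBASE-7692 actually specifies.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Sketch of order-preserving encodings for int and double. The class and
// method names are illustrative only; this is not an existing HBase API.
public class OrderedEncodingDemo {

    // Plain big-endian two's complement (the layout Bytes.toBytes(int) uses).
    // Negative values have the high bit set, so under unsigned byte
    // comparison they sort AFTER all non-negative values.
    static byte[] naiveInt(int v) {
        return ByteBuffer.allocate(Integer.BYTES).putInt(v).array();
    }

    // Flipping the sign bit maps MIN_VALUE..MAX_VALUE onto
    // 0x00000000..0xFFFFFFFF, so unsigned byte order equals numeric order.
    static byte[] encodeInt(int v) {
        return ByteBuffer.allocate(Integer.BYTES).putInt(v ^ Integer.MIN_VALUE).array();
    }

    static int decodeInt(byte[] raw) {
        return ByteBuffer.wrap(raw).getInt() ^ Integer.MIN_VALUE;
    }

    // IEEE 754 doubles: if the sign bit is clear, set it; if it is set,
    // invert every bit so larger negative magnitudes sort lower.
    // doubleToLongBits canonicalizes NaN, which then sorts above +Infinity,
    // matching Double.compare (and -0.0 sorts just below 0.0).
    static byte[] encodeDouble(double d) {
        long bits = Double.doubleToLongBits(d);
        bits ^= (bits < 0) ? -1L : Long.MIN_VALUE;
        return ByteBuffer.allocate(Long.BYTES).putLong(bits).array();
    }

    static double decodeDouble(byte[] raw) {
        long bits = ByteBuffer.wrap(raw).getLong();
        // Encoded values with the high bit set came from sign-bit-clear doubles.
        bits ^= (bits < 0) ? Long.MIN_VALUE : -1L;
        return Double.longBitsToDouble(bits);
    }

    public static void main(String[] args) {
        // Naive: -1 is FF FF FF FF and sorts after 1 (00 00 00 01).
        System.out.println(Arrays.compareUnsigned(naiveInt(-1), naiveInt(1)) > 0);    // true
        // Sign flip: -1 becomes 7F FF FF FF, 1 becomes 80 00 00 01.
        System.out.println(Arrays.compareUnsigned(encodeInt(-1), encodeInt(1)) < 0);  // true
        System.out.println(Arrays.compareUnsigned(encodeDouble(-2.5), encodeDouble(0.5)) < 0); // true
        System.out.println(decodeDouble(encodeDouble(-2.5)));                         // -2.5
    }
}
```

Both encodings keep a fixed width, so they can also be concatenated into larger keys without extra delimiters.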

> like table schemas, entities & co similar to the kiji project?
>

Table schemas are off the table for this. See the "Out of scope" section of
the document. Just as HBase today is BYO-types (bring your own types), HBase
after this improvement will be BYO-schema. Today, the application must manage
a schema defining a map from application entities to HBase Cells. The
application also manages its own serialization details for turning language
types into byte[]. This ticket seeks to alleviate the latter. Entities are a
little conflated here; we've discussed providing a "compound type", which you
could consider equivalent to an entity. Without a table schema, though,
there's nowhere to store that entity's definition.
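
For a sense of what a "compound type" involves, here is a hypothetical sketch of a (String, int) key whose encoded byte[] sorts by name first and id second. The terminator-and-escape scheme for the variable-length string is one common approach, assumed here purely for illustration; it is not taken from the HBASE-8089 document, and the class and method names are made up.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical sketch of a "compound type": a (String, int) pair encoded so
// that unsigned byte[] order matches tuple order (by name, then by id).
// The class and method names are illustrative, not an HBase API.
public class CompoundKeyDemo {

    // A variable-length field needs a terminator that sorts below every real
    // byte, and no encoded field may be a prefix of another encoded field.
    // Escape 0x00 as 0x00 0xFF and terminate with 0x00 0x01.
    static void writeString(ByteArrayOutputStream out, String s) {
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            out.write(b);        // write(int) keeps the low 8 bits
            if (b == 0) {
                out.write(0xFF); // escape an embedded 0x00
            }
        }
        out.write(0x00);
        out.write(0x01);         // terminator
    }

    // Fixed-width big-endian int with the sign bit flipped (order-preserving).
    static void writeInt(ByteArrayOutputStream out, int v) {
        int u = v ^ Integer.MIN_VALUE;
        out.write(u >>> 24);
        out.write(u >>> 16);
        out.write(u >>> 8);
        out.write(u);
    }

    static byte[] encode(String name, int id) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeString(out, name);
        writeInt(out, id);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // Names differ: "a" < "ab" decides the order regardless of id.
        System.out.println(Arrays.compareUnsigned(encode("a", 5), encode("ab", -3)) < 0); // true
        // Same name: the int field decides, so -3 sorts before 5.
        System.out.println(Arrays.compareUnsigned(encode("a", -3), encode("a", 5)) < 0);  // true
    }
}
```

The fixed-width int can simply be appended; only variable-length fields need the terminator. UTF-8 byte order follows Unicode code point order, so for strings with supplementary characters it can differ from Java's String.compareTo, which compares UTF-16 code units.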

> are you also thinking of some sort of data-awareness server side
> for example encoding/compression based on the data type
> or compaction policies if your key is a date or similar?
>

Not so far. There's an understandably high amount of resistance to baking
this into the server side. There are indeed a number of things that could
be done with more thorough type awareness, but they are not addressed here.
This is intended only as a client-side improvement for the benefit of HBase
users.

How can the motivation document be improved to make these intentions more
clear?

Thanks,
Nick

> On Wed, Mar 13, 2013 at 4:42 PM, Nick Dimiduk <[EMAIL PROTECTED]> wrote:
>
> > Hi all,
> >
> > I'd like to draw your attention to HBASE-8089 [0]. The desire is to add type
> > support to HBase. There are two primary objectives: make the lives of
> > developers building on HBase easier, and facilitate better tools on top of
> > HBase. Please chime in with any feature suggestions you think we've missed
> > in initial conversations.
> >
> > Thanks,
> > -n
> >
> > [0]: https://issues.apache.org/jira/browse/HBASE-8089
> >
>