HBase, mail # dev - prefix compression implementation

Re: prefix compression implementation
Matt Corgan 2011-09-20, 17:59
bringing all questions into a single email:

stack >> I'd say call it Cell rather than HCell.

i did think the H was a very simple way to add uniqueness, like isn't
"HFile" a big win over "File"?  there are already two other classes called
"Cell" in hbase (guava and REST gateway).  another option could be KV,
though i don't like making exceptions to java's no-abbreviations guidelines.

stack >> You have getRowArray rather than getRow which we currently have but
I suppose it makes sense since you can then group by suffix.

i guess the point is to emphasize that those are low-performance methods
that shouldn't normally be called
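To make the naming convention concrete, here is a minimal Java sketch. It assumes a hypothetical `SketchCell` interface with a hypothetical `getRowBuffer()` accessor; neither is an existing HBase API. The in-place accessors hand back the shared backing array plus an offset and length, while `getRowArray()` allocates a fresh copy, which is the slow path the "Array" suffix is meant to flag.

```java
import java.util.Arrays;

// Hypothetical interface for illustration only; not an existing HBase API.
interface SketchCell {
    // Fast path: expose the shared backing array plus the row's position in it.
    byte[] getRowBuffer();

    int getRowOffset();

    int getRowLength();

    // Slow path: allocates and copies the row. The "Array" suffix flags it as
    // something hot code should not call routinely.
    default byte[] getRowArray() {
        int from = getRowOffset();
        return Arrays.copyOfRange(getRowBuffer(), from, from + getRowLength());
    }
}

// A cell whose row is a slice of a larger block (e.g. a decoded file block).
final class BlockBackedCell implements SketchCell {
    private final byte[] block;
    private final int rowOffset;
    private final int rowLength;

    BlockBackedCell(byte[] block, int rowOffset, int rowLength) {
        this.block = block;
        this.rowOffset = rowOffset;
        this.rowLength = rowLength;
    }

    @Override
    public byte[] getRowBuffer() {
        return block; // no copy: callers share the block
    }

    @Override
    public int getRowOffset() {
        return rowOffset;
    }

    @Override
    public int getRowLength() {
        return rowLength;
    }
}
```

A caller on a hot path would read `getRowBuffer()` between `getRowOffset()` and `getRowOffset() + getRowLength()` directly, and pay for the copy only when it really needs an independent `byte[]`.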

stack >> There is a patch lying around that adds a version to KV by using top
two bytes of the type byte.  If you need me to dig it up, just say
(then you might not have to have v1 stuff in your Interface).

not sure what you mean here.  top two bits?  you mean encoding the timestamp
inside the type byte?
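A byte has eight bits, so "top two bytes of the type byte" most likely means the top two bits. Assuming that reading, a hypothetical layout would keep a 2-bit format version in bits 6-7 and the type code in the low six bits. This is only a sketch: the patch Stack mentions may use a different scheme, and this one works only if every existing type code fits in six bits, which would need to be checked against the actual `KeyValue.Type` values. `TypeByte` is a hypothetical name.

```java
// Hypothetical layout: top 2 bits = format version (0-3), low 6 bits = type code (0-63).
final class TypeByte {
    static final int VERSION_SHIFT = 6;
    static final int TYPE_MASK = 0x3F;

    private TypeByte() {}

    static byte pack(int version, int typeCode) {
        if (version < 0 || version > 3) {
            throw new IllegalArgumentException("version must fit in 2 bits: " + version);
        }
        if (typeCode < 0 || typeCode > TYPE_MASK) {
            throw new IllegalArgumentException("type code must fit in 6 bits: " + typeCode);
        }
        return (byte) ((version << VERSION_SHIFT) | typeCode);
    }

    static int version(byte b) {
        // Mask to 0-255 first so the signed byte does not sign-extend.
        return (b & 0xFF) >>> VERSION_SHIFT;
    }

    static int typeCode(byte b) {
        return b & TYPE_MASK;
    }
}
```

Masking with `0xFF` before the shift matters because Java bytes are signed; without it, versions 2 and 3 (which set the high bit) would come back wrong after sign extension.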

stack >> You might need to add some equals for stuff like same row, cf, and
qualifier... but they can come later.

i've got some equals methods at the bottom.  maybe you skimmed over those,
or do you mean something different than those?
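For comparison, "same row / same family / same qualifier" helpers can be written without copying by comparing slices of the backing arrays in place. A minimal sketch, assuming hypothetical `Slice`, `KeyParts`, and `CellMatch` types (these are not the equals methods from the proposal itself) and Java 9+ for the ranged `Arrays.equals` overload:

```java
import java.util.Arrays;

// Hypothetical (array, offset, length) view of one key component.
final class Slice {
    final byte[] buf;
    final int off;
    final int len;

    Slice(byte[] buf, int off, int len) {
        this.buf = buf;
        this.off = off;
        this.len = len;
    }

    static Slice of(byte[] whole) {
        return new Slice(whole, 0, whole.length);
    }
}

// Hypothetical key parts of a cell; the three slices may share one backing array.
final class KeyParts {
    final Slice row;
    final Slice family;
    final Slice qualifier;

    KeyParts(Slice row, Slice family, Slice qualifier) {
        this.row = row;
        this.family = family;
        this.qualifier = qualifier;
    }
}

final class CellMatch {
    private CellMatch() {}

    // Byte-for-byte comparison of two slices without copying (Java 9+ overload).
    static boolean sameBytes(Slice a, Slice b) {
        return Arrays.equals(a.buf, a.off, a.off + a.len, b.buf, b.off, b.off + b.len);
    }

    static boolean sameRow(KeyParts a, KeyParts b) {
        return sameBytes(a.row, b.row);
    }

    static boolean sameFamily(KeyParts a, KeyParts b) {
        return sameBytes(a.family, b.family);
    }

    static boolean sameQualifier(KeyParts a, KeyParts b) {
        return sameBytes(a.qualifier, b.qualifier);
    }

    // Same column = same family and same qualifier.
    static boolean sameColumn(KeyParts a, KeyParts b) {
        return sameFamily(a, b) && sameQualifier(a, b);
    }
}
```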

stack >> The comparator stuff is currently horrid because it depends on
context; i.e. whether the KVs are from -ROOT- or .META. or from a
userspace table.  There are some ideas for having it so only one
comparator for all types but that's another issue.

interesting.  i wasn't aware of any of that.  guess that's why i'm throwing
all these ideas out there before going any further
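For a plain userspace table, the ordering comes down to unsigned lexicographic comparison of row, family, and qualifier, with newer timestamps sorting first. The sketch below covers only that case: it does not attempt the special handling for -ROOT- and .META. rows that Stack mentions, and it omits the key-type tiebreak. `SimpleKey` and `UserTableOrder` are hypothetical names, and `Arrays.compareUnsigned` needs Java 9+.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical, simplified cell key (no key type, no table context).
final class SimpleKey {
    final byte[] row;
    final byte[] family;
    final byte[] qualifier;
    final long timestamp;

    SimpleKey(byte[] row, byte[] family, byte[] qualifier, long timestamp) {
        this.row = row;
        this.family = family;
        this.qualifier = qualifier;
        this.timestamp = timestamp;
    }
}

final class UserTableOrder {
    private UserTableOrder() {}

    // Row, family, qualifier ascending as unsigned bytes; then timestamp descending.
    static final Comparator<SimpleKey> ORDER = (a, b) -> {
        int c = Arrays.compareUnsigned(a.row, b.row);
        if (c != 0) {
            return c;
        }
        c = Arrays.compareUnsigned(a.family, b.family);
        if (c != 0) {
            return c;
        }
        c = Arrays.compareUnsigned(a.qualifier, b.qualifier);
        if (c != 0) {
            return c;
        }
        return Long.compare(b.timestamp, a.timestamp); // newer versions first
    };
}
```

The unsigned comparison matters: with signed Java bytes, a row starting with `0x80` would sort before a row starting with `0x01`.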

ryan >> I was just pushing back at the idea of 'turn everything into
interfaces! problem solved!', and thinking about what was really
necessary to get to where you want to go...

gotcha.  i don't think it's a good idea to roll out the interface over the
entire code base any time soon.  i just think it's inevitable that we make
an interface at some point, and that the prefix trie would be so much easier
if we were programming to a clean interface.
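As a rough illustration of what prefix compression buys on sorted keys, the sketch below uses simple front coding: each key stores only the length of the prefix it shares with the previous key, plus the remaining suffix. This is a deliberately simpler scheme than the prefix trie discussed in this thread, and `FrontCoding` is a hypothetical name. Because the decoder rebuilds each key from its predecessor, code written against a clean cell interface would not need to know the keys were ever encoded.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class FrontCoding {
    private FrontCoding() {}

    // One encoded key: bytes shared with the previous key, plus what follows.
    static final class Entry {
        final int sharedPrefix;
        final byte[] suffix;

        Entry(int sharedPrefix, byte[] suffix) {
            this.sharedPrefix = sharedPrefix;
            this.suffix = suffix;
        }
    }

    static int commonPrefix(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        int i = 0;
        while (i < n && a[i] == b[i]) {
            i++;
        }
        return i;
    }

    // Keys should be sorted; sorting maximizes the shared prefixes.
    static List<Entry> encode(List<byte[]> sortedKeys) {
        List<Entry> out = new ArrayList<>();
        byte[] prev = new byte[0];
        for (byte[] key : sortedKeys) {
            int shared = commonPrefix(prev, key);
            out.add(new Entry(shared, Arrays.copyOfRange(key, shared, key.length)));
            prev = key;
        }
        return out;
    }

    static List<byte[]> decode(List<Entry> entries) {
        List<byte[]> out = new ArrayList<>();
        byte[] prev = new byte[0];
        for (Entry e : entries) {
            byte[] key = new byte[e.sharedPrefix + e.suffix.length];
            System.arraycopy(prev, 0, key, 0, e.sharedPrefix);         // reuse prefix
            System.arraycopy(e.suffix, 0, key, e.sharedPrefix, e.suffix.length);
            out.add(key);
            prev = key;
        }
        return out;
    }
}
```

For `row-0001`, `row-0002`, `row-0010`, the encoded entries are `(0, "row-0001")`, `(7, "2")`, `(6, "10")`, so only 11 of the 24 key bytes are stored as suffixes.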

stack >> One other thought is that exposing ByteRange, ByteBuffer, and v1 array
stuff in Interface seems like you are exposing 'implementation'
details that perhaps shouldn't show through.  I'm guessing it's
unavoidable though if the Interface is to be used in a few different
contexts: i.e. "v1" has to work if we are to get this new stuff in,
some srcs will be DBBs, etc.

true, they're implementation details, but they're important for performance.
the interface in this case is a balance between clean code and performance.
the cleanest code would leave the interface with only those top getXArray()
methods, but performance requires all the other methods.  i'm really just
throwing it out there for brainstorming purposes.  for example, i haven't
really thought through whether that ByteRange thing is a good idea.  maybe we
should just be using ByteBuffer.wrap(byte[]).  let's discuss in another
email chain if anyone has comments on ByteRange
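To make the ByteRange vs `ByteBuffer.wrap` question concrete, here is a minimal sketch of a hypothetical immutable `ByteRange` (an array, offset, and length triple; the fields and methods are assumptions, not the class from the proposal) that can also hand out a zero-copy `ByteBuffer` view. `ByteBuffer.wrap(bytes, offset, length)` sets the position and limit but leaves `arrayOffset()` at 0, so `slice()` is used to get a view whose index 0 is the range's first byte.

```java
import java.nio.ByteBuffer;

// Hypothetical minimal ByteRange: an immutable (array, offset, length) triple.
final class ByteRange {
    final byte[] bytes;
    final int offset;
    final int length;

    ByteRange(byte[] bytes, int offset, int length) {
        if (offset < 0 || length < 0 || length > bytes.length - offset) {
            throw new IndexOutOfBoundsException("bad range: " + offset + "+" + length);
        }
        this.bytes = bytes;
        this.offset = offset;
        this.length = length;
    }

    // Absolute read relative to the start of the range; no cursor state.
    byte get(int i) {
        if (i < 0 || i >= length) {
            throw new IndexOutOfBoundsException("index " + i + " outside length " + length);
        }
        return bytes[offset + i];
    }

    // Zero-copy view: shares the backing array, index 0 == bytes[offset].
    ByteBuffer asByteBuffer() {
        return ByteBuffer.wrap(bytes, offset, length).slice();
    }
}
```

The trade-off: `ByteRange` carries no mutable position or limit, so it can be passed around freely, whereas the `ByteBuffer` view shares the same bytes but has cursor state that relative `get()` calls advance.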

ryan >> So if the HCell or whatever ends up returning ByteBuffers, then that
plays straight into scatter/gather NIO calls, and if some of them are
DBB, then so much the merrier.  For example, the thrift stuff takes
ByteBuffers when it's calling for a byte sequence.

i'm going to start a new thread for this question too.  i have some
questions about ByteBuffer usage outside the off-heap cache
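On Ryan's scatter/gather point: if the cell parts are already separate `ByteBuffer`s, they can be handed to a `GatheringByteChannel` in one `write(ByteBuffer[])` call without first being copied into a single array, and heap and direct buffers can be mixed. A minimal sketch, assuming a temporary file as the destination channel and a hypothetical `GatherWriteDemo` class:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.GatheringByteChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class GatherWriteDemo {
    private GatherWriteDemo() {}

    // Gathering writes may be partial, so loop until every buffer is drained.
    static long writeAll(GatheringByteChannel ch, ByteBuffer[] parts) throws IOException {
        long total = 0;
        while (hasRemaining(parts)) {
            total += ch.write(parts);
        }
        return total;
    }

    static boolean hasRemaining(ByteBuffer[] parts) {
        for (ByteBuffer b : parts) {
            if (b.hasRemaining()) {
                return true;
            }
        }
        return false;
    }

    // Writes the parts to a temp file in one gathering call, then reads them back.
    static byte[] roundTrip(ByteBuffer... parts) {
        try {
            Path tmp = Files.createTempFile("cell-parts", ".bin");
            try {
                try (FileChannel ch = FileChannel.open(tmp, StandardOpenOption.WRITE)) {
                    writeAll(ch, parts);
                }
                return Files.readAllBytes(tmp);
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

For example, a heap buffer for the row, a direct buffer for the family, and another heap buffer for the qualifier end up contiguous in the file without an intermediate `byte[]`.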

On Mon, Sep 19, 2011 at 10:41 PM, Ryan Rawson <[EMAIL PROTECTED]> wrote:

> So if the HCell or whatever ends up returning ByteBuffers, then that
> plays straight in to scatter/gather NIO calls, and if some of them are
> DBB, then so much the merrier.
>
> For example, the thrift stuff takes ByteBuffers when its calling for a
> byte sequence.
>
> -ryan
>
> On Mon, Sep 19, 2011 at 10:39 PM, Stack <[EMAIL PROTECTED]> wrote:
> > One other thought is that exposing ByteRange, ByteBuffer, and v1 array
> > stuff in Interface seems like you are exposing 'implementation'
> > details that perhaps shouldn't show through.  I'm guessing its
> > unavoidable though if the Interface is to be used in a few different