Re: prefix compression implementation
I'm a little confused over the direction of the DBBs in general, hence the
lack of clarity in my code.

I see value in doing fine-grained parsing of the DBB if you're going to have
a large block of data and only want to retrieve a small KV from the middle
of it.  With this trie design, you can navigate your way through the DBB
while copying hardly anything to the heap.  It would be a shame to blow away
your entire L1 cache by loading a whole 256KB block onto the heap if you only
want to read 200 bytes out of the middle... it can be done
ultra-efficiently.

The problem is if you're going to iterate through an entire block made of
5000 small KVs doing thousands of DBB.get(index) calls.  Those are roughly
10x slower than byte[index] calls.  In that case, if it's a DBB, you want to
copy the full block on-heap and access it through the byte[] interface.  If
it's a HeapBB, then you already have access to the underlying byte[].
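
To make that trade-off concrete, here's a rough sketch of the two access
paths (hypothetical names, not code from the patch):

import java.nio.ByteBuffer;

public class BlockAccess {

  //point lookup: absolute gets on the (possibly direct) buffer,
  //copying only the few bytes we actually need onto the heap
  static byte[] readSmallRange(ByteBuffer block, int offset, int length) {
    byte[] out = new byte[length];
    for (int i = 0; i < length; i++) {
      out[i] = block.get(offset + i);
    }
    return out;
  }

  //full scan: one bulk copy on-heap instead of thousands of get(i) calls;
  //a HeapBB already exposes its backing byte[]
  static byte[] materializeForScan(ByteBuffer block) {
    if (block.hasArray()) {
      return block.array();  //real code would respect arrayOffset()
    }
    byte[] copy = new byte[block.remaining()];
    block.duplicate().get(copy);
    return copy;
  }
}

The trie navigation described above would go through the first path; a
full-block scan would pay one bulk copy up front instead.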

So there's possibly value in implementing both methods.  The main problem I
see is a lack of interfaces in the current code base.  I'll throw one
suggestion out there as food for thought.  Create a new interface:

interface HCell {
  byte[] getRow();
  byte[] getFamily();
  byte[] getQualifier();
  long getTimestamp();
  byte getType();
  byte[] getValue();

  //plus an endless list of convenience methods:
  int getKeyLength();
  KeyValue getKeyValue();
  boolean isDelete();
  //etc, etc (or put these in sub-interface)
}

We could start by making KeyValue implement that interface and then slowly
change pieces of the code base to use HCell.  That will allow us to start
elegantly working in different implementations.
PtKeyValue (https://github.com/hotpads/hbase-prefix-trie/blob/master/src/org/apache/hadoop/hbase/keyvalue/trie/compact/read/PtKeyValue.java)
would be one of them.  During the transition, you can always call
PtKeyValue.getCopiedKeyValue(), which will instantiate a new byte[] in the
traditional KeyValue format.
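
For illustration, a bare-bones on-heap implementation of the interface
might look like this (hypothetical class, simplified type handling; the
key-length math follows the classic KeyValue key layout):

public class SimpleCell implements HCell {
  private final byte[] row, family, qualifier, value;
  private final long timestamp;
  private final byte type;

  public SimpleCell(byte[] row, byte[] family, byte[] qualifier,
      long timestamp, byte type, byte[] value) {
    this.row = row;
    this.family = family;
    this.qualifier = qualifier;
    this.timestamp = timestamp;
    this.type = type;
    this.value = value;
  }

  public byte[] getRow() { return row; }
  public byte[] getFamily() { return family; }
  public byte[] getQualifier() { return qualifier; }
  public long getTimestamp() { return timestamp; }
  public byte getType() { return type; }
  public byte[] getValue() { return value; }

  //2B row length + row + 1B family length + family + qualifier
  //+ 8B timestamp + 1B type
  public int getKeyLength() {
    return 2 + row.length + 1 + family.length + qualifier.length + 8 + 1;
  }

  public KeyValue getKeyValue() {
    //a PtKeyValue would materialize the traditional byte[] format here
    throw new UnsupportedOperationException("not built in this sketch");
  }

  public boolean isDelete() {
    return type != 4;  //simplification: 4 is the Put type code
  }
}

Scanners and comparators written against HCell then don't care which
implementation they're handed.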

We'd also want an interface for HFileBlock, and a few others...

Some of this stuff is overwhelming to think about in parallel with the
existing hbase code, but it's actually not very complicated from a
standalone perspective.  If you can isolate it into a module behind an
interface, then it's just a bunch of converting things to bytes and back.
There are (hopefully) no exceptions, GC pauses, cascading failures, etc...
all the things that are hard to handle to begin with and especially
time-consuming to debug, emulate, and write tests for.  There's not even
multi-threading!  It's pretty easy to write tests for it and then never look
at it again.

Matt

On Fri, Sep 16, 2011 at 6:08 PM, Ryan Rawson <[EMAIL PROTECTED]> wrote:

> Hey this stuff looks really interesting!
>
> On the ByteBuffer, the 'array' byte[] access to the underlying data is
> totally incompatible with the 'off heap' features that are implemented
> by DirectByteBuffer.  While people talk about DBB in terms of NIO
> performance, if you have to roundtrip the data through Java code, I'm not
> sure it buys you much - you still need to move data in and out of the
> main Java heap.  Typically this is geared more towards apps which read
> and write from/to socket/files with minimal processing.
>
> While in the past I have been pretty bullish on off-heap caching for
> HBase, I have since changed my mind due to the poor API (ByteBuffer is
> a sucky way to access data structures in ram), and other reasons (ping
> me off list if you want).  The KeyValue code pretty much presumes that
> data is in byte[] anyways, and I had thought that even with off-heap
> caching, we'd still have to copy KeyValues into main-heap during
> scanning anyways.
>
> Given the minimal size of the HFile blocks, I really don't see an issue
> with buffering a block output - especially if the savings is fairly
> substantial.
>
> Thanks,
> -ryan
>
> On Fri, Sep 16, 2011 at 5:48 PM, Matt Corgan <[EMAIL PROTECTED]> wrote:
> > Jacek,
> >
> > Thanks for helping out with this.  I implemented most of the DeltaEncoder