Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Threaded View
HBase, mail # dev - prefix compression

Copy link to this message
RE: prefix compression
Jonathan Gray 2011-06-02, 23:44
I'm here!

Still parsing through the stuff in this e-mail but I agree that many approaches will require rejiggering of KeyValue in a significant way.  I have made some attempts at this but nothing is very far.  It's definitely something that we are hoping to put some resources into over the summer.


> -----Original Message-----
> From: Todd Lipcon [mailto:[EMAIL PROTECTED]]
> Sent: Thursday, June 02, 2011 3:37 PM
> Subject: Re: prefix compression
> Hey Matt,
> Interesting email, and also something I've been thinking about recently.
> Unfortunately, I think one of the big prerequisites before we can start
> thinking about actual compression algorithms is some refactoring around
> how KeyValue is used. Currently, KeyValue exposes byte arrays in a lot of
> places
> - these byte arrays are expected to have discrete sub-ranges that represent
> row key, column qualifier, etc. Various parts of the code directly grab these
> byte arrays and offsets and do stuff with them (eg comparison, serialization,
> etc).
> In order to split the different components up, or do true prefix compression,
> we'd need to be able to have KeyValue do some smarter things for
> comparison, and never call functions like "getBuffer), getRowKeyOffset(),
> getRowKeyLength()" expecting that to show us a byte array range with a row
> key in it.
> I think Jonathan Gray has some code hacked together for this, but last I
> heard it wasn't in shippable state... Jonathan, you out there?
> As for the particular compression algorithms, a nice first step would be a very
> simplistic one: the first row key of any HFile block would be given in full.
> Following that, each key uses a single byte token which identifies the
> number of key components shared with the previous row (ie same row,
> same col, different ts). This isn't quite as good as true prefix compression,
> but for common cases where each cell has three versions, and each row has
> many columns, it's probably a huge savings.
> -Todd
> On Thu, Jun 2, 2011 at 3:28 PM, Matt Corgan <[EMAIL PROTECTED]>
> wrote:
> > I've been reading up on prefix compression for a few weeks trying to
> > figure out the best approach.  Maybe some of you have noticed that 90%
> > of my emails to the user list have something to do with it...  Now
> > that people are working on HFile v2 and the topic of Lucene's FST has
> > come up, I thought it would be a good time to share some thoughts.  My
> > goal here is definitely not asking anyone to implement this, but just
> > to start a discussion that gets the proper jiras in place.  I have
> > spent a little time coding an initial rowKey trie builder that could
> > be the start of a pluggable implementation.
> >
> > This is a big topic to address in one email, but I don't think the
> > pieces make sense on their own.  Here's an outline with some thoughts:
> >
> > * refer to prefix compression as "compaction" to avoid interfering
> > with traditional "compression"
> >
> > * main goal is to reduce size of data in memory to increase effective
> > capacity of block index, block cache, and memstores
> >    * disk access takes thousands of times longer than memory access,
> > especially over hdfs/network
> >
> > * secondary goal is to speed up the processing of in memory data
> >    * memory access takes 200+ cpu cycles, so must use cache friendly
> > algorithms
> >    * structure trie node sizes around typical 64B cache line size
> >    * investigate java's memory prefetching capabilities, possibly
> > increasing to ~256B nodes
> >        * research
> > paper<
> >
> http://www.google.com/url?sa=t&source=web&cd=5&sqi=2&ved=0CDcQFj
> AE&url
> >
> =http%3A%2F%2Fciteseerx.ist.psu.edu%2Fviewdoc%2Fdownload%3Fdoi%3
> D10.1.
> >
> 1.68.4872%26rep%3Drep1%26type%3Dpdf&rct=j&q=databases%20prefetchi
> ng&ei
> > =7tnnTcSWLuXTiALSpNCUDA&usg=AFQjCNFpVC9oDsG1-
> UG0x6ZsqsIHLqvGDA&sig2=o0
> > 96l8cNrnX69oeBGChUKg
> > >about
> > modern memory databases and prefetching