Matt Corgan
2011-09-13, 07:44
Ted Yu
2011-09-13, 08:36
Jacek Migdal
2011-09-14, 22:43
Matt Corgan
2011-09-17, 00:48
Ryan Rawson
2011-09-17, 01:08
Matt Corgan
2011-09-17, 01:47
Ryan Rawson
2011-09-17, 02:08
Matt Corgan
2011-09-17, 02:29
Ryan Rawson
2011-09-17, 02:34
Matt Corgan
2011-09-19, 22:26
Stack
2011-09-20, 05:33
Ryan Rawson
2011-09-20, 05:37
Stack
2011-09-20, 05:39
Ryan Rawson
2011-09-20, 05:41
Matt Corgan
2011-09-20, 17:59
Jacek Migdal
2011-09-20, 23:58
Matt Corgan
2011-09-21, 01:04
Jacek Migdal
2011-09-22, 00:23
-
prefix compression implementation
Matt Corgan 2011-09-13, 07:44
Hi devs,
I put a "developer preview" of a prefix compression algorithm on github. It still needs some details worked out, a full set of iterators, about 200 optimizations, and a bunch of other stuff... but it successfully passes some preliminary tests, so I thought I'd get it in front of more eyeballs sooner rather than later.

https://github.com/hotpads/hbase-prefix-trie

It depends on HBase's Bytes.java and KeyValue.java, which depend on hadoop. Those jars are in there, along with some others for full HFile testing in the near future.

A good place to start looking at the code is org.apache.hadoop.hbase.keyvalue.trie.builder.KeyValuePtBuilder. It's the main driver of the compaction side of things, taking KeyValues (in sorted order) and generating a byte[] to be saved as a disk block. Then for reading it back in, there is trie.compact.read.PtIterator, which takes the byte[] and spits out KeyValues. The main test class is trie.test.row.RowBuilderTests, which round-trips a bunch of KeyValues to make sure the outputs are the same as the inputs. trie.compact.row.node.PtRowNode is the format of a single trie node in the underlying byte[].

The current version generates a lot of little objects (garbage) while writing and reading. I plan to optimize it to the point where most variables are primitives on the stack (at least when reading), but I think these intermediate classes are important for debugging. I'll probably try to keep them around going forward and develop a more compact version in parallel.

It uses trie style compression for the row keys and column qualifiers, where pointers between nodes are compacted ints. It keeps a list of compacted, de-duped deltas for timestamps, and if they're all the same, it stores only one (long) per block. If all KeyValue.TYPE operations are the same, then it only stores one (byte) per block.

It's designed around efficient cpu cache usage and elimination of 8 byte pointers, so it should be fast. Get calls can traverse the trie nodes to dig up a row key while barely loading anything from memory to cache, as opposed to current hbase, which may load the better part of a block into cache. Scanners/Filters/Comparators can all be designed to be trie-aware so they can iterate through 20 columnQualifiers in the same row without constantly re-scanning the rowKey bytes... etc...

Here are a few questions we'll need to answer at some point:

* Where should this code go?
  - I'd vote for keeping it isolated and limiting references back to the main hbase project, sort of like the gzip/lzo algorithms.
  - If it's strictly isolated, it'll be easier to keep it well tested for correctness/performance and let more people tinker with it to make it faster. It'll also force us to come up with the right interfaces to allow other compression implementations.

* Current HFileWriter sends KeyValues to the OutputStream as soon as they're processed, but would it be ok to queue up a whole block in memory and write it all at once?
  - I'd vote for yes. It makes it easier to arrange the data to be more read-friendly. Also, we're only talking about one block of data, which is presumably a fraction of the block cache's size.

* Should the block bytes be accessed as a byte[] or ByteBuffer? I know there's been work on off-heap cache, but I've read the blocks are pulled on-heap before they're parsed (??)
  - See org.apache.hadoop.hbase.keyvalue.trie.profile.MemoryAccessProfiler for a comparison of byte[] vs ByteBuffer speed tests. ByteBuffer looks ~2-10x slower, but some people need the off-heap cache.
  - I'll propose maintaining separate reading algorithms for each, given that the underlying bytes are in the exact same format. It should be possible to copy/paste the code and replace bytes[i] with ByteBuffer.get(i), and create parallel versions of static util methods.

* Each block has some metadata wrapped in a class called PtBlockMeta. Does HBase currently have a way to store its values as java primitives on the heap rather than parsing them out of the byte[]/ByteBuffer on every access? If they have to be parsed out on every block access, that could take more time than the Get query it's trying to perform.
  - I know there's a new Cacheable interface or something like that. Maybe that already supports it or could be enhanced.

What jiras do you think I should make?

Look forward to hearing people's feedback,
Matt
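Editorial sketch: the per-block timestamp handling described in this message (one base value plus a de-duplicated list of deltas, collapsing to a single long when every timestamp matches) can be illustrated roughly as below. This is only an assumption about how such an encoding might look; the class and field names are hypothetical and are not taken from the hbase-prefix-trie code, and the real on-disk layout may differ.

```java
import java.util.Arrays;

/** Hypothetical sketch of per-block timestamp encoding; NOT the hbase-prefix-trie format. */
final class TimestampDeltaSketch {
    final long minTimestamp;    // stored once per block
    final long[] uniqueDeltas;  // sorted, de-duplicated (timestamp - minTimestamp)
    final int[] deltaIndexes;   // per cell: position of its delta in uniqueDeltas

    TimestampDeltaSketch(long[] timestamps) {
        if (timestamps.length == 0) {
            throw new IllegalArgumentException("empty block");
        }
        final long min = Arrays.stream(timestamps).min().getAsLong();
        this.minTimestamp = min;
        this.uniqueDeltas = Arrays.stream(timestamps)
                .map(t -> t - min)
                .sorted()
                .distinct()
                .toArray();
        this.deltaIndexes = new int[timestamps.length];
        for (int i = 0; i < timestamps.length; i++) {
            deltaIndexes[i] = Arrays.binarySearch(uniqueDeltas, timestamps[i] - min);
        }
    }

    /** True when every cell shares one timestamp, so only minTimestamp needs storing. */
    boolean allSame() {
        return uniqueDeltas.length == 1;
    }

    /** Bytes needed to hold the largest delta (0 when all timestamps are equal). */
    int deltaWidthBytes() {
        long max = uniqueDeltas[uniqueDeltas.length - 1];
        return (64 - Long.numberOfLeadingZeros(max) + 7) / 8;
    }

    /** Reconstructs the original timestamp of a cell. */
    long timestampAt(int cell) {
        return minTimestamp + uniqueDeltas[deltaIndexes[cell]];
    }

    public static void main(String[] args) {
        TimestampDeltaSketch s = new TimestampDeltaSketch(new long[] {1000L, 1005L, 1000L, 1300L});
        System.out.println(s.uniqueDeltas.length + " unique deltas, "
                + s.deltaWidthBytes() + " byte(s) per delta");
    }
}
```

In a block where all cells were written in the same batch, `allSame()` would be true and the per-cell index could be dropped entirely, which is the case the message singles out.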
-
Re: prefix compression implementation
Ted Yu 2011-09-13, 08:36
Matt:
Thanks for the update.

Cacheable interface is defined in:
src/main/java/org/apache/hadoop/hbase/io/hfile/Cacheable.java

You can find the implementation at:
src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java

I will browse your code later.
-
Re: prefix compression implementation
Jacek Migdal 2011-09-14, 22:43
Matt,
Thanks a lot for the code. Great job!

As I mentioned in JIRA, I work full time on delta encoding [1]. Right now the code and integration are almost done, and most of the parts are under review. Since it is a big change, it will be tested very carefully. After that, it will be ported to trunk and open sourced.

From a quick glimpse, I have taken a different approach. I implemented a few different algorithms which are simpler. They aim mostly to save space while having fast compress/decompress code. However, access is still sequential. The goal of my project is to save some RAM by keeping a compressed BlockCache in memory.

On the other hand, it seems that you are most concerned about seek performance.

I will read your code more carefully. At a quick glimpse, we both implemented some of the same routines (like vint), but otherwise I expect no overlap.

I also saw that you spent some time investigating ByteBuffer vs. byte[]. I experienced a significant negative performance impact when I switched to ByteBuffer. However, I have postponed this optimization.

Right now I think the easiest way forward would be for you to implement the DeltaEncoder interface after my change:
http://pastebin.com/Y8UxUByG
(note, there might be some minor changes)

That way, you will reuse my integration with the existing code for "free".

Jacek

[1] - I prefer to call it that. Prefix is one of the algorithms, but there are also different approaches.
-
Re: prefix compression implementation
Matt Corgan 2011-09-17, 00:48
Jacek,
Thanks for helping out with this. I implemented most of the DeltaEncoder and DeltaEncoderSeeker. I haven't taken the time to generate a good set of test data for any of this, but it does pass on some very small input data that aims to cover the edge cases I can think of. Perhaps you have full HFiles you can run through it.

https://github.com/hotpads/hbase-prefix-trie/tree/master/src/org/apache/hadoop/hbase/keyvalue/trie/deltaencoder

I also put some notes on the PtDeltaEncoder regarding how the prefix trie should be optimally used. I can't think of a situation where you'd want to blow it up into the full uncompressed KeyValue ByteBuffer, so implementing the DeltaEncoder interface is a mismatch, but I realize it's only a starting point.

You also would never really have a full ByteBuffer of KeyValues to pass to it for compression. Typically, you'd be passing individual KeyValues from the memstore flush or from a collection of HFiles being merged through a PriorityQueue.

The end goal is to operate on the encoded trie without decompressing it. Long term, and in certain circumstances, it may even be possible to pass the compressed trie over the wire to the client who can then decode it.

Let me know if I implemented that the way you had in mind. I haven't done the seekTo method yet, but will try to do that next week.

Matt
-
Re: prefix compression implementation
Ryan Rawson 2011-09-17, 01:08
Hey this stuff looks really interesting!
On the ByteBuffer, the 'array' byte[] access to the underlying data is totally incompatible with the 'off heap' features that are implemented by DirectByteBuffer. While people talk about DBB in terms of nio performance, if you have to roundtrip the data thru java code, I'm not sure it buys you much - you still need to move data in and out of the main Java heap. Typically this is geared more towards apps which read and write from/to socket/files with minimal processing.

While in the past I have been pretty bullish on off-heap caching for HBase, I have since changed my mind due to the poor API (ByteBuffer is a sucky way to access data structures in ram), and other reasons (ping me off list if you want). The KeyValue code pretty much presumes that data is in byte[] anyways, and I had thought that even with off-heap caching, we'd still have to copy KeyValues into main-heap during scanning anyways.

Given the minimal size of the hfile blocks, I really don't see an issue with buffering a block output - especially if the savings are fairly substantial.

Thanks,
-ryan
-
Re: prefix compression implementation
Matt Corgan 2011-09-17, 01:47
I'm a little confused over the direction of the DBBs in general, hence the lack of clarity in my code.

I see value in doing fine-grained parsing of the DBB if you're going to have a large block of data and only want to retrieve a small KV from the middle of it. With this trie design, you can navigate your way through the DBB while copying hardly anything to the heap. It would be a shame to blow away your entire L1 cache by loading a whole 256KB block onto heap if you only want to read 200 bytes out of the middle... it can be done ultra-efficiently.

The problem is if you're going to iterate through an entire block made of 5000 small KV's doing thousands of DBB.get(index) calls. Those are like 10x slower than byte[index] calls. In that case, if it's a DBB, you want to copy the full block on-heap and access it through the byte[] interface. If it's a HeapBB, then you already have access to the underlying byte[].

So there's possibly value in implementing both methods. The main problem I see is a lack of interfaces in the current code base. I'll throw one suggestion out there as food for thought. Create a new interface:

interface HCell{
  byte[] getRow();
  byte[] getFamily();
  byte[] getQualifier();
  long getTimestamp();
  byte getType();
  byte[] getValue();

  //plus an endless list of convenience methods:
  int getKeyLength();
  KeyValue getKeyValue();
  boolean isDelete();
  //etc, etc (or put these in sub-interface)
}

We could start by making KeyValue implement that interface and then slowly change pieces of the code base to use HCell. That will allow us to start elegantly working in different implementations. PtKeyValue <https://github.com/hotpads/hbase-prefix-trie/blob/master/src/org/apache/hadoop/hbase/keyvalue/trie/compact/read/PtKeyValue.java> would be one of them. During the transition, you can always call PtKeyValue.getCopiedKeyValue(), which will instantiate a new byte[] in the traditional KeyValue format.

We'd also want an interface for HFileBlock, and a few others...

Some of this stuff is overwhelming to think about in parallel with the existing hbase code, but it's actually not very complicated from a standalone perspective. If you can isolate it into a module behind an interface, then it's just a bunch of converting things to bytes and back. There are (hopefully) no exceptions, gc pauses, cascading failures, etc... all the things that are hard to handle to begin with and especially time consuming to debug, emulate, and write tests for. There's not even multi-threading! It's pretty easy to write tests for it and then never look at it again.

Matt
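Editorial sketch: the HCell suggestion above amounts to letting callers depend on accessors rather than on one byte layout. The following is a minimal illustration using a subset of the proposed methods. `SimpleCell`, `PrefixedCell`, and `HCellDemo` are hypothetical names invented for this example; they are not HBase or hbase-prefix-trie classes, and the real KeyValue and PtKeyValue are considerably more involved.

```java
import java.nio.charset.StandardCharsets;

/** Subset of the HCell interface proposed in the message above. */
interface HCell {
    byte[] getRow();
    byte[] getQualifier();
    long getTimestamp();
    byte[] getValue();
}

/** Hypothetical cell that owns plain arrays, standing in for a traditional KeyValue. */
final class SimpleCell implements HCell {
    private final byte[] row;
    private final byte[] qualifier;
    private final long timestamp;
    private final byte[] value;

    SimpleCell(byte[] row, byte[] qualifier, long timestamp, byte[] value) {
        this.row = row;
        this.qualifier = qualifier;
        this.timestamp = timestamp;
        this.value = value;
    }

    public byte[] getRow() { return row; }
    public byte[] getQualifier() { return qualifier; }
    public long getTimestamp() { return timestamp; }
    public byte[] getValue() { return value; }
}

/** Hypothetical cell whose row is rebuilt from a shared prefix plus a suffix on demand. */
final class PrefixedCell implements HCell {
    private final byte[] sharedRowPrefix; // stored once for many cells
    private final byte[] rowSuffix;
    private final byte[] qualifier;
    private final long timestamp;
    private final byte[] value;

    PrefixedCell(byte[] sharedRowPrefix, byte[] rowSuffix, byte[] qualifier,
                 long timestamp, byte[] value) {
        this.sharedRowPrefix = sharedRowPrefix;
        this.rowSuffix = rowSuffix;
        this.qualifier = qualifier;
        this.timestamp = timestamp;
        this.value = value;
    }

    public byte[] getRow() {
        byte[] row = new byte[sharedRowPrefix.length + rowSuffix.length];
        System.arraycopy(sharedRowPrefix, 0, row, 0, sharedRowPrefix.length);
        System.arraycopy(rowSuffix, 0, row, sharedRowPrefix.length, rowSuffix.length);
        return row;
    }

    public byte[] getQualifier() { return qualifier; }
    public long getTimestamp() { return timestamp; }
    public byte[] getValue() { return value; }
}

final class HCellDemo {
    /** Caller code depends only on the interface, not on how the cell is stored. */
    static String describe(HCell cell) {
        return new String(cell.getRow(), StandardCharsets.UTF_8) + "/"
                + new String(cell.getQualifier(), StandardCharsets.UTF_8) + "@"
                + cell.getTimestamp();
    }

    public static void main(String[] args) {
        byte[] q = "price".getBytes(StandardCharsets.UTF_8);
        byte[] v = "100".getBytes(StandardCharsets.UTF_8);
        HCell plain = new SimpleCell("row-0002".getBytes(StandardCharsets.UTF_8), q, 42L, v);
        HCell packed = new PrefixedCell("row-000".getBytes(StandardCharsets.UTF_8),
                "2".getBytes(StandardCharsets.UTF_8), q, 42L, v);
        System.out.println(describe(plain));
        System.out.println(describe(packed));
    }
}
```

Note that `PrefixedCell.getRow()` allocates on every call, which is the kind of cost a transition helper such as the getCopiedKeyValue() mentioned above would also carry.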
-
Re: prefix compression implementation
Ryan Rawson 2011-09-17, 02:08
On Fri, Sep 16, 2011 at 6:47 PM, Matt Corgan <[EMAIL PROTECTED]> wrote:
> I'm a little confused over the direction of the DBBs in general, hence the
> lack of clarity in my code.
>
> I see value in doing fine-grained parsing of the DBB if you're going to have
> a large block of data and only want to retrieve a small KV from the middle
> of it. With this trie design, you can navigate your way through the DBB
> without copying hardly anything to the heap. It would be a shame blow away
> your entire L1 cache by loading a whole 256KB block onto heap if you only
> want to read 200 bytes out of the middle... it can be done
> ultra-efficiently.

This paragraph is not factually correct. The DirectByteBuffer vs main heap has nothing to do with the CPU cache. Consider the following scenario:

- read block from DFS
- scan block in ram
- prepare result set for client

Pretty simple, we have a choice in step 1:
- write to java heap
- write to DirectByteBuffer off-heap controlled memory

In either case, you are copying to memory, and therefore cycling thru the cpu cache (of course). The difference is whether the Java GC has to deal with the aftermath or not.

So the question "DBB or not" is not one about CPU caches, but one about garbage collection. Of course, nothing is free, and dealing with DBB requires extensive in-situ bounds checking (look at the source code for that class!), and also requires manual memory management on the behalf of the programmer. So you are faced with an expensive API (getByte is not as good as an array get), and a lot more homework to do. I have decided it's not worth it personally and am not chasing that line as a potential performance improvement, and I would encourage you not to either.

Ultimately the DFS speed issues need to be solved by the DFS - HDFS needs more work, but alternatives are already there and are a lot faster.

> The problem is if you're going to iterate through an entire block made of
> 5000 small KV's doing thousands of DBB.get(index) calls. Those are like 10x
> slower than byte[index] calls. In that case, if it's a DBB, you want to
> copy the full block on-heap and access it through the byte[] interface. If
> it's a HeapBB, then you already have access to the underlying byte[].

Yes, this is the issue - you have to take an extra copy one way or another. Doing effective prefix compression with DBB is not really feasible imo, and that's another reason why I have given up on DBBs.

> So there's possibly value in implementing both methods. The main problem i
> see is a lack of interfaces in the current code base. I'll throw one
> suggestion out there as food for thought. Create a new interface:
>
> interface HCell{
>   byte[] getRow();
>   byte[] getFamily();
>   byte[] getQualifier();
>   long getTimestamp();
>   byte getType();
>   byte[] getValue();
>
>   //plus an endless list of convenience methods:
>   int getKeyLength();
>   KeyValue getKeyValue();
>   boolean isDelete();
>   //etc, etc (or put these in sub-interface)
> }
>
> We could start by making KeyValue implement that interface and then slowly
> change pieces of the code base to use HCell. That will allow us to start
> elegantly working in different implementations.
> PtKeyValue<https://github.com/hotpads/hbase-prefix-trie/blob/master/src/org/apache/hadoop/hbase/keyvalue/trie/compact/read/PtKeyValue.java>would
> be one of them. During the transition, you can always call
> PtKeyValue.getCopiedKeyValue() which will instantiate a new byte[] in the
> traditional KeyValue format.

I am not really super keen here, and while the interface of course makes plenty of sense, the issue is that you will need to turn an array of KeyValues (aka a Result instance) into a bunch of bytes on the wire. So there HAS to be a method that returns a ByteBuffer that the IO layer can then use to write out (via scatter/gather type network APIs) to the wire.

A better choice I think would be to remove this method:

public byte [] getBuffer()

then deal with the places that use this - there are a bunch, but nothing looks super impossible (ie: no interface changes to filters, etc). Once you have that, making multiple implementations of key value is easier. I'd rather that key value becomes the abstract base class, and subclasses implement concrete details.

Yes, unit tests for basic logic are good, but ultimately hbase is an integrated whole, and the concurrency problems have been really tough to crack. Things are better than they have ever been, but there is still a lot of testing to do.
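Editorial sketch: the "extra copy one way or another" point in this exchange can be made concrete with the standard java.nio API. The example below shows the two access paths being discussed: per-index absolute gets on a ByteBuffer, and a one-time copy of a direct buffer onto the heap followed by plain array indexing. `BlockAccessSketch` and its method names are hypothetical, and the code makes no claim about relative speed; it only shows that both paths see the same bytes.

```java
import java.nio.ByteBuffer;

/** Hypothetical helper contrasting ByteBuffer access with heap byte[] access. */
final class BlockAccessSketch {

    /** Sums every byte using absolute gets; works for heap and direct buffers. */
    static long sumViaBuffer(ByteBuffer block) {
        long sum = 0;
        for (int i = 0; i < block.limit(); i++) {
            sum += block.get(i);
        }
        return sum;
    }

    /**
     * Returns a heap array holding the block's bytes. A heap buffer that exactly
     * wraps its array is returned as-is; anything else (including every direct
     * buffer) costs one full copy.
     */
    static byte[] toHeapArray(ByteBuffer block) {
        if (block.hasArray() && block.arrayOffset() == 0
                && block.array().length == block.limit()) {
            return block.array();
        }
        byte[] copy = new byte[block.limit()];
        ByteBuffer view = block.duplicate(); // leaves the caller's position untouched
        view.position(0);
        view.get(copy);
        return copy;
    }

    /** Sums every byte using plain array indexing. */
    static long sumViaArray(byte[] bytes) {
        long sum = 0;
        for (byte b : bytes) {
            sum += b;
        }
        return sum;
    }

    public static void main(String[] args) {
        ByteBuffer direct = ByteBuffer.allocateDirect(256);
        for (int i = 0; i < 256; i++) {
            direct.put(i, (byte) i);
        }
        System.out.println(sumViaBuffer(direct) == sumViaArray(toHeapArray(direct)));
    }
}
```

Whether the one-time copy pays off depends on how many bytes of the block are actually read afterwards, which is the trade-off the two posters disagree about.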
-
Re: prefix compression implementation
Matt Corgan 2011-09-17, 02:29
Ryan - thanks for the feedback. The situation I'm thinking of where it's useful to parse DirectBB without copying to heap is when you are serving small random values out of the block cache. At HotPads, we'd like to store hundreds of GB of real estate listing data in memory so it can be quickly served up at random. We want to access many small values that are already in memory, so basically skipping step 1 of 3 because values are already in memory. That being said, the DirectBB are not essential for us since we haven't run into gb problems, i just figured it would be nice to support them since they seem to be important to other people.

My motivation for doing this is to make hbase a viable candidate for a large, auto-partitioned, sorted, *in-memory* database. Not the usual analytics use case, but i think hbase would be great for this.
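A minimal sketch of the two access patterns debated in this exchange, assuming a block held in a `java.nio.ByteBuffer` that may be heap-backed or direct. The class and method names (`BlockReadSketch`, `readSlice`, `toHeapArray`) are hypothetical and are not part of HBase or the prefix-trie repository. It only illustrates the API shape; it makes no claim about relative speed.

```java
import java.nio.ByteBuffer;

// Hypothetical helper for illustration only, not HBase code.
final class BlockReadSketch {

    // Random read: copy only the requested bytes. Absolute get(int) works on
    // both heap and direct buffers and does not move the buffer's position.
    static byte[] readSlice(ByteBuffer block, int offset, int length) {
        byte[] out = new byte[length];
        for (int i = 0; i < length; i++) {
            out[i] = block.get(offset + i);
        }
        return out;
    }

    // Full scan: obtain the whole block as a byte[] once, then index the array.
    static byte[] toHeapArray(ByteBuffer block) {
        if (block.hasArray() && block.arrayOffset() == 0
                && block.array().length == block.capacity()) {
            return block.array(); // heap buffer: reuse the backing array, no copy
        }
        byte[] copy = new byte[block.capacity()];
        ByteBuffer view = block.duplicate(); // shares content, own position/limit
        view.clear();                        // position = 0, limit = capacity
        view.get(copy);                      // one bulk copy onto the heap
        return copy;
    }
}
```

With a 256KB direct block, `readSlice(block, 1000, 200)` touches only the 200 requested bytes, while `toHeapArray(block)` pays for one bulk copy so that a later scan can use plain array indexing.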
-
Re: prefix compression implementation
Ryan Rawson 2011-09-17, 02:34
On Fri, Sep 16, 2011 at 7:29 PM, Matt Corgan <[EMAIL PROTECTED]> wrote:
> My motivation for doing this is to make hbase a viable candidate for a
> large, auto-partitioned, sorted, *in-memory* database. Not the usual
> analytics use case, but i think hbase would be great for this.

What exactly about the current system makes it not a viable candidate?
-
Re: prefix compression implementation
Matt Corgan 2011-09-19, 22:26
Ryan - i answered your question on another thread yesterday. Will use this thread to continue conversation on the KeyValue interface.

I don't think the name is all that important, though i thought HCell was less clumsy than KeyValue or KeyValueInterface. Take a look at this interface on github:

https://github.com/hotpads/hbase-prefix-trie/blob/master/src/org/apache/hadoop/hbase/model/HCell.java

Seems like it should be trivially easy to get KeyValue to implement that. Then it provides the right methods to make compareTo methods that will work across different implementations. The implementations of those methods might have an if-statement to determine the class of the "other" HCell, and choose the fastest byte comparison method behind the scenes.

I need to look into the KeyValue scanner interfaces
-
Re: prefix compression implementation
Stack 2011-09-20, 05:33
On Mon, Sep 19, 2011 at 3:26 PM, Matt Corgan <[EMAIL PROTECTED]> wrote:
> I don't think the name is all that important, though i thought HCell was
> less clumsy than KeyValue or KeyValueInterface. Take a look at this
> interface on github:
>
> https://github.com/hotpads/hbase-prefix-trie/blob/master/src/org/apache/hadoop/hbase/model/HCell.java
>
> Seems like it should be trivially easy to get KeyValue to implement that.
> Then it provides the right methods to make compareTo methods that will work
> across different implementations. The implementations of those methods
> might have an if-statement to determine the class of the "other" HCell, and
> choose the fastest byte comparison method behind the scenes.

I'd say call it Cell rather than HCell.

You have getRowArray rather than getRow which we currently have but I suppose it makes sense since you can then group by suffix.

There is a patch lying around that adds a version to KV by using top two bytes of the type byte. If you need me to dig it up, just say (then you might not have to have v1 stuff in your Interface).

You might need to add some equals for stuff like same row, cf, and qualifier... but they can come later.

The comparator stuff is currently horrid because it depends on context; i.e. whether the KVs are from -ROOT- or .META. or from a userspace table. There are some ideas for having only one comparator for all types, but that's another issue.

St.Ack
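A minimal sketch of how a comparator could work across different cell implementations through a shared interface, as suggested above. Everything here is an assumption for illustration: `SketchCell` is a hypothetical, cut-down stand-in (the real HCell interface lives in the linked GitHub file and is not reproduced here), and the offset/length accessors next to `getRowArray` are invented so that one implementation can point into a shared block.

```java
import java.util.Comparator;

// Hypothetical, cut-down stand-in for the interface under discussion.
interface SketchCell {
    byte[] getRowArray();  // array holding the row bytes (possibly a whole block)
    int getRowOffset();    // where the row starts inside that array
    int getRowLength();    // number of row bytes
}

// Row lives inside a larger shared array, as in a block of KeyValues.
final class SharedArrayCell implements SketchCell {
    private final byte[] block;
    private final int offset;
    private final int length;

    SharedArrayCell(byte[] block, int offset, int length) {
        this.block = block;
        this.offset = offset;
        this.length = length;
    }

    public byte[] getRowArray() { return block; }
    public int getRowOffset() { return offset; }
    public int getRowLength() { return length; }
}

// Row was materialized into its own array, as a trie decoder might do.
final class OwnedRowCell implements SketchCell {
    private final byte[] row;

    OwnedRowCell(byte[] row) { this.row = row; }

    public byte[] getRowArray() { return row; }
    public int getRowOffset() { return 0; }
    public int getRowLength() { return row.length; }
}

// Compares rows as unsigned bytes using only the interface methods, so it
// does not care which implementation sits on either side.
final class RowComparator implements Comparator<SketchCell> {
    @Override
    public int compare(SketchCell a, SketchCell b) {
        byte[] x = a.getRowArray();
        byte[] y = b.getRowArray();
        int xo = a.getRowOffset();
        int yo = b.getRowOffset();
        int n = Math.min(a.getRowLength(), b.getRowLength());
        for (int i = 0; i < n; i++) {
            int l = x[xo + i] & 0xff;
            int r = y[yo + i] & 0xff;
            if (l != r) {
                return l - r;
            }
        }
        return a.getRowLength() - b.getRowLength(); // shorter row sorts first
    }
}
```

The if-statement fast path mentioned in the quoted text would sit at the top of `compare`, checking the concrete classes with `instanceof` before falling back to this generic loop.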
-
Re: prefix compression implementation
Ryan Rawson 2011-09-20, 05:37
I was just pushing back at the idea of 'turn everything into interfaces! problem solved!', and thinking about what was really necessary to get to where you want to go...
-
Re: prefix compression implementation
Stack 2011-09-20, 05:39
One other thought is that exposing ByteRange, ByteBuffer, and v1 array stuff in Interface seems like you are exposing 'implementation' details that perhaps shouldn't show through. I'm guessing it's unavoidable though if the Interface is to be used in a few different contexts: i.e. "v1" has to work if we are to get this new stuff in, some srcs will be DBBs, etc.

St.Ack
-
Re: prefix compression implementation
Ryan Rawson 2011-09-20, 05:41
So if the HCell or whatever ends up returning ByteBuffers, then that plays straight in to scatter/gather NIO calls, and if some of them are DBB, then so much the merrier.

For example, the thrift stuff takes ByteBuffers when it's calling for a byte sequence.

-ryan
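A minimal sketch of the gathering-write idea mentioned above, using the standard `java.nio.channels.GatheringByteChannel` interface. The class `GatherWriteSketch` and its methods are hypothetical; a temporary file stands in for whatever channel a server would really write to, and one of the buffers is direct to show that heap and direct buffers can be mixed in a single call.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.GatheringByteChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper for illustration only.
final class GatherWriteSketch {

    // Writes all buffers, in order, without first copying them into one array.
    static long writeAll(GatheringByteChannel channel, ByteBuffer... parts)
            throws IOException {
        long expected = 0;
        for (ByteBuffer part : parts) {
            expected += part.remaining();
        }
        long written = 0;
        while (written < expected) {
            written += channel.write(parts); // may write fewer bytes than asked
        }
        return written;
    }

    // Demo: gather-write the parts to a temp file and read the bytes back.
    static byte[] roundTrip(ByteBuffer... parts) {
        try {
            Path tmp = Files.createTempFile("gather-sketch", ".bin");
            try {
                try (FileChannel channel =
                        FileChannel.open(tmp, StandardOpenOption.WRITE)) {
                    writeAll(channel, parts);
                }
                return Files.readAllBytes(tmp);
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

If a cell exposed its row, qualifier, and value as separate buffers, they could be handed to `writeAll` as-is, which is the convenience being pointed at here.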
-
Re: prefix compression implementation
Matt Corgan 2011-09-20, 17:59
bringing all questions into a single email:

stack >> I'd say call it Cell rather than HCell.

i did think the H was a very simple way to add uniqueness, like isn't "HFile" a big win over "File"? there are already two other classes called "Cell" in hbase (guava and REST gateway). another option could be KV, though i don't like making exceptions to java's no-abbreviations guidelines.

stack >> You have getRowArray rather than getRow which we currently have but I suppose it makes sense since you can then group by suffix.

i guess the point is to emphasize that those are low performance methods that shouldn't normally be called

stack >> There is a patch lying around that adds a version to KV by using top two bytes of the type byte. If you need me to dig it up, just say (then you might not have to have v1 stuff in your Interface).

not sure what you mean here. top two bits? you mean encoding the timestamp inside the type byte?

stack >> You might need to add some equals for stuff like same row, cf, and qualifier... but they can come later.

i've got some equals methods at the bottom. maybe you skimmed over those, or do you mean something different than those?

stack >> The comparator stuff is currently horrid because it depends on context; i.e. whether the KVs are from -ROOT- or .META. or from a userspace table. There are some ideas for having only one comparator for all types, but that's another issue.

interesting. i wasn't aware of any of that. guess that's why i'm throwing all these ideas out there before going any further

ryan >> I was just pushing back at the idea of 'turn everything into interfaces! problem solved!', and thinking about what was really necessary to get to where you want to go...

gotcha. i don't think it's a good idea to roll out the interface over the entire code base any time soon. i just think it's inevitable that we make an interface at some point, and that the prefix trie would be so much easier if programming to a clean interface.

stack >> One other thought is that exposing ByteRange, ByteBuffer, and v1 array stuff in Interface seems like you are exposing 'implementation' details that perhaps shouldn't show through. I'm guessing it's unavoidable though if the Interface is to be used in a few different contexts: i.e. "v1" has to work if we are to get this new stuff in, some srcs will be DBBs, etc.

true, it's implementation details, but important for performance. the interface in this case is a balance between clean code and performance. cleanest code would leave the interface with only those top getXArray() methods, but performance requires all the other methods. i'm really just throwing it out there for brainstorming purposes. for example, i haven't really thought through whether that ByteRange thing is a good idea. maybe we should just be using ByteBuffer.wrap(byte[]). let's discuss in another email chain if anyone has comments on ByteRange

ryan >> So if the HCell or whatever ends up returning ByteBuffers, then that plays straight in to scatter/gather NIO calls, and if some of them are DBB, then so much the merrier. For example, the thrift stuff takes ByteBuffers when it's calling for a byte sequence.

i'm going to start a new thread for this question too. i have some questions about ByteBuffer usage outside the off-heap cache
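A minimal sketch of the `ByteBuffer.wrap` alternative floated above: a zero-copy view over part of a `byte[]`. The API of the repository's ByteRange class is not shown in this thread, so nothing here mirrors it; `RangeViewSketch` and `view` are hypothetical names.

```java
import java.nio.ByteBuffer;

// Hypothetical helper for illustration only.
final class RangeViewSketch {

    // Zero-copy view of array[offset, offset + length).
    static ByteBuffer view(byte[] array, int offset, int length) {
        // wrap(array, offset, length) only sets position and limit; slice()
        // rebases the view so that index 0 corresponds to array[offset].
        return ByteBuffer.wrap(array, offset, length).slice();
    }
}
```

The view shares storage with the original array, so a write to the array is visible through the buffer, and `arrayOffset()` reports where the view starts. One design cost is that every view is a new small object, which matters if the goal is to avoid per-cell garbage.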
-
Re: prefix compression implementation
Jacek Migdal 2011-09-20, 23:58
On 9/20/11 10:59 AM, "Matt Corgan" <[EMAIL PROTECTED]> wrote:

>bringing all questions into a single email:
>
>stack >> I'd say call it Cell rather than HCell.
>
>i did think the H was a very simple way to add uniqueness, like isn't
>"HFile" a big win over "File"? there are already two other classes called
>"Cell" in hbase (guava and REST gateway). another option could be KV,
>though i don't like making exceptions to java's no-abbreviations
>guidelines.

KeyValueCell?

To be honest, no name seems to be a very good option. However, it would be nice if it would be somewhat related to KeyValue.

On large scope, it would be hard to integrate this interface anytime soon. I would rather do it later.

>stack >> There is a patch lying around that adds a version to KV by using
>top two bytes of the type byte. If you need me to dig it up, just say
>(then you might not have to have v1 stuff in your Interface).
>
>not sure what you mean here. top two bits? you mean encoding the
>timestamp inside the type byte?

Versioning KeyValue per KeyValue seems to be crazy. Shouldn't it be per block or file?

>(interface discussion)

It is a huge chance. It would be great if we could prototype a few things. Especially I would like to avoid any optimizations before we know a good way to measure them.

Jacek
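A minimal sketch of what "a version in the top two bits of the type byte" could look like, purely as an illustration of the bit arithmetic. The patch mentioned in the thread is not shown, so this layout (2 version bits, 6 type bits) is an assumption, and `TypeByteVersionSketch` is a hypothetical name. It does not check whether existing KeyValue type codes fit in the remaining 6 bits.

```java
// Hypothetical layout for illustration only: [vv tttttt]
final class TypeByteVersionSketch {
    static final int VERSION_SHIFT = 6;  // version occupies the top two bits
    static final int TYPE_MASK = 0x3F;   // low six bits hold the type code

    static byte pack(int version, int type) {
        if (version < 0 || version > 3) {
            throw new IllegalArgumentException("version must fit in 2 bits");
        }
        if (type < 0 || type > TYPE_MASK) {
            throw new IllegalArgumentException("type must fit in 6 bits");
        }
        return (byte) ((version << VERSION_SHIFT) | type);
    }

    static int version(byte packed) {
        return (packed & 0xFF) >>> VERSION_SHIFT; // mask first to avoid sign extension
    }

    static int type(byte packed) {
        return packed & TYPE_MASK;
    }
}
```

A per-cell version like this adds no bytes per KeyValue, but every reader has to decode it on each cell. Storing the version once per block or file, as suggested in the reply above, avoids that per-cell work.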
-
Re: prefix compression implementation
Matt Corgan 2011-09-21, 01:04
jacek >> It is a huge chance. It would be great if we could prototype a few things. Especially I would like to avoid any optimizations before we know a good way to measure them.

matt >> agree. i'm not in a rush to get any of this integrated, just trying to feel out the right long-term strategy. do you have unit tests that you're running on a substantial amount of data to compare different implementations?
-
Re: prefix compression implementation
Jacek Migdal 2011-09-22, 00:23
On 9/20/11 6:04 PM, "Matt Corgan" <[EMAIL PROTECTED]> wrote:

>jacek >> It is a huge chance. It would be great if we could prototype a
>few things. Especially I would like to avoid any optimizations before we
>know a good way to measure them.
>
>matt >> agree. i'm not in a rush to get any of this integrated, just
>trying to feel out the right long-term strategy. do you have unit tests
>that you're running on a substantial amount of data to compare different
>implementations?

I got some tests on production data which test compression ratio. The performance tests are synthetic and haven't measured real-world performance. Right now I'm working on it.

Jacek