|
Jason Rutherglen
2011-07-08, 23:51
Ryan Rawson
2011-07-09, 00:34
Jason Rutherglen
2011-07-09, 01:19
Jason Rutherglen
2011-07-09, 01:20
Ryan Rawson
2011-07-09, 01:26
Li Pi
2011-07-09, 01:30
Jason Rutherglen
2011-07-09, 01:47
Jason Rutherglen
2011-07-09, 01:47
Ryan Rawson
2011-07-09, 02:05
Jason Rutherglen
2011-07-09, 02:18
Ryan Rawson
2011-07-09, 02:31
Jason Rutherglen
2011-07-09, 02:52
Li Pi
2011-07-09, 02:54
Ted Dunning
2011-07-09, 18:18
M. C. Srivas
2011-07-09, 19:25
Ryan Rawson
2011-07-09, 22:13
Jason Rutherglen
2011-07-09, 22:48
Doug Meil
2011-07-10, 01:04
Ryan Rawson
2011-07-10, 02:11
Ted Dunning
2011-07-10, 06:14
Jonathan Gray
2011-07-10, 07:59
Andrew Purtell
2011-07-10, 16:25
Jason Rutherglen
2011-07-10, 21:53
Jason Rutherglen
2011-07-10, 22:05
Jonathan Gray
2011-07-11, 19:18
Andrew Purtell
2011-07-11, 20:30
Jason Rutherglen
2011-07-12, 06:10
|
-
Converting byte[] to ByteBuffer
Jason Rutherglen 2011-07-08, 23:51
Is there an open issue for this? How hard will this be? :)
-
Re: Converting byte[] to ByteBuffer
Ryan Rawson 2011-07-09, 00:34
Where? Everywhere? An array is 24 bytes, bb is 56 bytes. Also the API is...annoying.
-
Re: Converting byte[] to ByteBuffer
Jason Rutherglen 2011-07-09, 01:19
I don't think the object pointer overhead is very much given it's usually pointing at a full block? Perhaps we can implement a nicer class like Lucene's BytesRef [1]. Then we can have our own class that may wrap a byte[] or ByteBuffer.

1. http://lucene.apache.org/java/3_3_0/api/core/org/apache/lucene/util/BytesRef.html
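
A minimal sketch of the wrapper idea, assuming a hypothetical class named `ByteRange` (it is not an HBase or Lucene class, and BytesRef itself only wraps a byte[]). It presents one read-only API whether the bytes live in an array or in a ByteBuffer:

```java
import java.nio.ByteBuffer;

/** Hypothetical read-only view over either a byte[] or a ByteBuffer. */
final class ByteRange {
    // Heap or direct buffer; only absolute gets are used, so no cursor state leaks out.
    private final ByteBuffer buf;

    private ByteRange(ByteBuffer buf) {
        this.buf = buf;
    }

    /** Wraps part of an array without copying it. */
    static ByteRange wrap(byte[] bytes, int offset, int length) {
        // wrap() shares the array; slice() rebases index 0 to 'offset'.
        return new ByteRange(ByteBuffer.wrap(bytes, offset, length).slice());
    }

    /** Wraps the remaining bytes of a buffer (heap or direct) without copying. */
    static ByteRange wrap(ByteBuffer buffer) {
        // slice() shares the memory but has its own position and limit.
        return new ByteRange(buffer.slice());
    }

    int length() {
        return buf.capacity();
    }

    byte get(int index) {
        return buf.get(index);
    }

    /** Copies the viewed bytes onto the heap. */
    byte[] toArray() {
        byte[] out = new byte[length()];
        for (int i = 0; i < out.length; i++) {
            out[i] = buf.get(i);
        }
        return out;
    }
}
```

Callers would use `ByteRange.wrap(array, off, len)` or `ByteRange.wrap(buffer)` and never see which backing store is in use.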
-
Re: Converting byte[] to ByteBuffer
Jason Rutherglen 2011-07-09, 01:20
Also, it's for a good cause, moving the blocks out of main heap using direct byte buffers or some other more native-like facility (if DBBs don't work).
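
For illustration, a small sketch of moving a block off the Java heap with a direct byte buffer. The class and method names are hypothetical. Only `ByteBuffer.allocateDirect` is the standard JDK call:

```java
import java.nio.ByteBuffer;

/** Hypothetical helper: copies an on-heap block into off-heap (direct) memory. */
final class OffHeapBlock {
    static ByteBuffer copyOffHeap(byte[] block) {
        // Direct buffers are allocated outside the garbage-collected heap.
        ByteBuffer direct = ByteBuffer.allocateDirect(block.length);
        direct.put(block);
        direct.flip(); // position = 0, limit = block.length, ready for reading
        return direct;
    }
}
```

The GC still tracks the small ByteBuffer object, but not the block contents. When the native memory is actually freed is up to the JVM, which is one reason explicit lifecycle management comes up later in the thread.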
-
Re: Converting byte[] to ByteBuffer
Ryan Rawson 2011-07-09, 01:26
The overhead in a byte buffer is the extra integers to keep track of the mark, position, limit.

I am not sure that putting the block cache off heap is the way to go. Getting faster local DFS reads is important, and if you run HBase on top of MapR, these things are taken care of for you.
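
The extra cursor state can be seen with plain JDK calls. This sketch (hypothetical class name) only demonstrates the mark, position and limit fields. It does not measure the 24-byte and 56-byte figures quoted earlier in the thread:

```java
import java.nio.ByteBuffer;

/** Shows the cursor state a ByteBuffer carries on top of the raw bytes. */
final class BufferCursorDemo {
    /** Returns {position, limit, capacity} after a mark/reset round trip. */
    static int[] markAndReset() {
        ByteBuffer bb = ByteBuffer.wrap(new byte[8]); // position 0, limit 8, capacity 8
        bb.position(2);
        bb.mark();      // remember position 2
        bb.position(5);
        bb.limit(6);    // reads past index 5 now fail
        bb.reset();     // jump back to the marked position
        return new int[] {bb.position(), bb.limit(), bb.capacity()};
    }
}
```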
-
Re: Converting byte[] to ByteBuffer
Li Pi 2011-07-09, 01:30
If you do that, you'll have to do a bit of reference counting. I'm working on a slab allocated solution.
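
A rough sketch of what reference counting an off-heap block could look like. `RefCountedBlock` is a hypothetical class, not HBase code. The point is that every reader must pair `retain()` with `release()`, which is the bookkeeping burden debated in this thread:

```java
import java.nio.ByteBuffer;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical reference-counted wrapper around a cached block. */
final class RefCountedBlock {
    private final ByteBuffer data;
    private final AtomicInteger refs = new AtomicInteger(1); // creator holds the first reference

    RefCountedBlock(ByteBuffer data) {
        this.data = data;
    }

    /** Takes a reference; returns false if the block was already freed. */
    boolean retain() {
        for (;;) {
            int n = refs.get();
            if (n == 0) {
                return false;
            }
            if (refs.compareAndSet(n, n + 1)) {
                return true;
            }
        }
    }

    /** Drops a reference; returns true if this call dropped the last one. */
    boolean release() {
        for (;;) {
            int n = refs.get();
            if (n == 0) {
                throw new IllegalStateException("released more often than retained");
            }
            if (refs.compareAndSet(n, n - 1)) {
                return n == 1; // the caller may now recycle the memory
            }
        }
    }

    /** Returns an independent view of the bytes while the block is live. */
    ByteBuffer data() {
        if (refs.get() == 0) {
            throw new IllegalStateException("block already freed");
        }
        return data.duplicate();
    }

    int refCount() {
        return refs.get();
    }
}
```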
-
Re: Converting byte[] to ByteBuffer
Jason Rutherglen 2011-07-09, 01:47
There are a couple of things here: one is direct byte buffers to put the blocks outside of the heap, the other is mmap'ing the blocks directly from the underlying HDFS file.

I think they both make sense. And I'm not sure MapR's solution will be that much better if the latter is implemented in HBase.
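
As an illustration of the mmap option, here is a sketch that maps a region of a local file read-only using the standard `FileChannel.map` call. It assumes the caller already knows the path of the local file backing the block. How HBase would obtain that path from HDFS is not covered, and the class name is hypothetical:

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical helper: memory-maps part of a local file. */
final class MmapBlockReader {
    static MappedByteBuffer map(Path file, long offset, long length) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            // The mapping stays valid after the channel is closed.
            return ch.map(FileChannel.MapMode.READ_ONLY, offset, length);
        }
    }
}
```

Reads through the returned buffer are served from the OS page cache, with no copy onto the Java heap.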
-
Re: Converting byte[] to ByteBuffer
Jason Rutherglen 2011-07-09, 01:47
Reference counting is doable. Can you describe what the advantages are of the slab allocated solution?
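
The thread does not describe Li Pi's design, so the following is only a generic sketch of slab allocation under assumed names (`Slab`, `allocate`, `free`). One large direct buffer is allocated up front and carved into fixed-size slots, so caching a block reuses a slot instead of allocating new memory:

```java
import java.nio.ByteBuffer;
import java.util.ArrayDeque;

/** Hypothetical fixed-size slab: one big off-heap buffer split into equal slots. */
final class Slab {
    private final ByteBuffer backing;
    private final int slotSize;
    private final ArrayDeque<Integer> freeSlots = new ArrayDeque<>();

    Slab(int slotSize, int slotCount) {
        this.slotSize = slotSize;
        this.backing = ByteBuffer.allocateDirect(slotSize * slotCount);
        for (int i = 0; i < slotCount; i++) {
            freeSlots.push(i);
        }
    }

    /** Returns a free slot index, or -1 if the slab is full. */
    synchronized int allocate() {
        Integer slot = freeSlots.poll();
        return slot == null ? -1 : slot;
    }

    /** Returns a slot to the free list for reuse. */
    synchronized void free(int slot) {
        freeSlots.push(slot);
    }

    /** Returns a view restricted to the bytes of one slot. */
    ByteBuffer slot(int slot) {
        ByteBuffer view = backing.duplicate(); // own position/limit, shared memory
        view.position(slot * slotSize);
        view.limit(slot * slotSize + slotSize);
        return view.slice();
    }

    synchronized int freeCount() {
        return freeSlots.size();
    }
}
```

With fixed slots, freeing is a constant-time push onto the free list and the off-heap region never fragments, at the cost of wasted space when blocks are smaller than a slot.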
-
Re: Converting byte[] to ByteBuffer
Ryan Rawson 2011-07-09, 02:05
Hey,

When running on top of MapR, HBase has fast cached access to locally stored files; the MapR client ensures that. Likewise, HDFS should also ensure that local reads are fast and come out of cache as necessary, e.g. the kernel block cache.

I wouldn't support mmap; it would require 2 different read path implementations. You will never know when a read is not local.

HDFS needs to provide faster local reads imo. Managing the block cache off heap might work, but you also might get there and find the DBB accounting overhead kills.
-
Re: Converting byte[] to ByteBuffer
Jason Rutherglen 2011-07-09, 02:18
> When running on top of Mapr, hbase has fast cached access to locally stored
> files, the Mapr client ensures that. Likewise, hdfs should also ensure that
> local reads are fast and come out of cache as necessary. Eg: the kernel
> block cache.

Agreed! However I don't see how that's possible today. E.g., it'd require more of a byte buffer type of API to HDFS, random reads not using streams. It's easy to add.

I think the biggest win for HBase with MapR is the lack of the NameNode issues and snapshotting. In particular, snapshots are pretty much a standard RDBMS feature.

> Managing the block cache in not heap might work but you also might get there and find the dbb accounting
> overhead kills.

Lucene uses/abuses ref counting so I'm familiar with the downsides. When it works, it's great; when it doesn't, it's a nightmare to debug. It is possible to make it work though. I don't think there would be overhead from it, i.e., any pool of objects implements ref counting.

It'd be nice to not have a block cache; however, it's necessary for caching compressed [on disk] blocks.
-
Re: Converting byte[] to ByteBuffer
Ryan Rawson 2011-07-09, 02:31
On Jul 8, 2011 7:19 PM, "Jason Rutherglen" <[EMAIL PROTECTED]> wrote:
> > When running on top of Mapr, hbase has fast cached access to locally stored
> > files, the Mapr client ensures that. Likewise, hdfs should also ensure that
> > local reads are fast and come out of cache as necessary. Eg: the kernel
> > block cache.
>
> Agreed! However I don't see how that's possible today. Eg, it'd
> require more of a byte buffer type of API to HDFS, random reads not
> using streams. It's easy to add.

I don't think it's as easy as you say. And even using the stream API MapR delivers a lot more performance. And this is from my own tests, not a white paper.

> I think the biggest win for HBase with MapR is the lack of the
> NameNode issues and snapshotting. In particular, snapshots are pretty
> much a standard RDBMS feature.

That is good too - if you are using HBase in real-time prod you need to look at MapR.

But even beyond that the performance improvements are insane. We are talking like 8-9x perf on my tests. Not to mention substantially reduced latency.

I'll repeat again, local accelerated access is going to be a required feature. It already is.

I investigated using DBBs once upon a time. I concluded that managing the ref counts would be a nightmare, and the better solution was to copy KeyValues out of the DBB during scans.

Injecting refcount code seems like a worse remedy than the problem. HBase doesn't have as many bugs, but explicit ref counting everywhere seems dangerous. Especially when a perf solution is already here. Use MapR or HDFS-347/local reads.
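
The copy-out alternative to reference counting can be sketched as follows. The class and method names are hypothetical, and a real KeyValue has more structure than a plain byte range. The idea is that the scanner copies the bytes it needs onto the heap, so nothing it returns points into the off-heap block:

```java
import java.nio.ByteBuffer;

/** Hypothetical helper: copies a byte range out of a (possibly direct) block. */
final class CopyOut {
    static byte[] copy(ByteBuffer block, int offset, int length) {
        byte[] out = new byte[length];
        ByteBuffer view = block.duplicate(); // leaves the caller's position untouched
        view.position(offset);
        view.get(out, 0, length);            // bulk copy from the block onto the heap
        return out;
    }
}
```

Once the copy is made, the block can be evicted or reused without tracking who still reads it. The price is one extra copy per returned value.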
-
Re: Converting byte[] to ByteBuffer
Jason Rutherglen 2011-07-09, 02:52
> Especially when a perf solution is already here. Use Mapr or
> hdfs-347/local reads.

Right. It goes back to avoiding GC and performing memory deallocation manually (like C). I think this makes sense given the number of issues people have with HBase and GC (more so than Lucene for example). MapR doesn't help with the GC issues. If MapR had a JNI interface into an external block cache then that'd be a different story. :) And I'm sure it's quite doable.

> But even beyond that the performance improvements are insane. We are talking
> like 8-9x perf on my tests. Not to mention substantially reduced latency.

Was the comparison against HDFS-347?
-
Re: Converting byte[] to ByteBuffer
Li Pi 2011-07-09, 02:54
I have a slab allocated cache coded up, testing in YCSB right now :).
-
Re: Converting byte[] to ByteBuffer
Ted Dunning 2011-07-09, 18:18
MapR does help with the GC because it *does* have a JNI interface into an external block cache.

Typical configurations with MapR trim HBase down to the minimal viable size and increase the file system cache correspondingly.

On Fri, Jul 8, 2011 at 7:52 PM, Jason Rutherglen <[EMAIL PROTECTED]> wrote:
> MapR doesn't help with the GC issues. If MapR had a JNI
> interface into an external block cache then that'd be a different
> story. :) And I'm sure it's quite doable.
-
Re: Converting byte[] to ByteBuffer
M. C. Srivas 2011-07-09, 19:25
On Fri, Jul 8, 2011 at 6:47 PM, Jason Rutherglen <[EMAIL PROTECTED]> wrote:
> There are couple of things here, one is direct byte buffers to put the
> blocks outside of heap, the other is MMap'ing the blocks directly from
> the underlying HDFS file.
> I think they both make sense. And I'm not sure MapR's solution will
> be that much better if the latter is implemented in HBase.

There are some major issues with mmap'ing the local HDFS file (the "block") directly:
(a) no checksums to detect data corruption from bad disks
(b) when a disk does fail, the DFS could start reading from an alternate replica ... but that option is lost when mmap'ing and the RS will crash immediately
(c) security is completely lost, but that is minor given HBase's current status

For those HBase deployments that don't care about the absence of (a) and (b), especially (b), it's definitely a viable option that gives good perf.

At MapR, we did consider a similar direct-access capability and rejected it due to the above concerns.
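
To make point (a) concrete, here is a sketch of the kind of check a checksummed read path performs and a raw mmap skips. It uses the JDK's `java.util.zip.CRC32`. The class name, method names and the choice of exception are assumptions for illustration, not the HDFS implementation:

```java
import java.util.zip.CRC32;

/** Hypothetical helper: verifies block bytes against a stored CRC32 value. */
final class ChecksummedRead {
    static long checksum(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        return crc.getValue();
    }

    /** Returns the data if it matches the expected checksum, otherwise throws. */
    static byte[] verify(byte[] data, long expected) {
        long actual = checksum(data);
        if (actual != expected) {
            // A DFS client could retry from another replica at this point.
            throw new IllegalStateException(
                "checksum mismatch: expected " + expected + " but got " + actual);
        }
        return data;
    }
}
```

A reader that goes through the DFS client gets a chance to detect the mismatch and fall back to another replica. A reader holding a mapped buffer sees whatever bytes the disk returns.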
-
Re: Converting byte[] to ByteBuffer
Ryan Rawson 2011-07-09, 22:13
I think my general point is we could hack up the HBase source, add refcounting, circumvent the GC, etc., or we could demand more from the DFS.

If a variant of HDFS-347 was committed, reads could come from the Linux buffer cache and life would be good.

The choice isn't fast HBase vs slow HBase; there are elements of bugs there as well.
-
Re: Converting byte[] to ByteBuffer
Jason Rutherglen 2011-07-09, 22:48
I'm a little confused. I was told none of the HBase code changed with MapR; if the HBase (not the OS) block cache has a JNI implementation, then that part of the HBase code changed.
-
Re: Converting byte[] to ByteBuffer
Doug Meil 2011-07-10, 01:04
re: "If a variant of hdfs-347 was committed,"

I agree with what Ryan is saying here, and I'd like to second (third? fourth?) that: keep pushing for HDFS improvements. Anything else is coding around the bigger I/O issue.
-
Re: Converting byte[] to ByteBuffer
Ryan Rawson 2011-07-10, 02:11
No lines of HBase were changed to run on MapR. MapR implements the HDFS API and uses JNI to get local data. If HDFS wanted to, it could use more sophisticated methods to get data rapidly from local disk to a client's memory space... as MapR does.
-
Re: Converting byte[] to ByteBuffer
Ted Dunning 2011-07-10, 06:14
No. The JNI is below the HDFS-compatible API. Thus the changed code is in the hadoop.jar and associated jars and .so's that MapR supplies.

The JNI still runs in the HBase memory image, though, so it can make data available faster.

The cache involved includes the cache of disk blocks (not HBase memcache blocks) in the JNI and in the filer sub-system.

The detailed reasons why more caching in the file system and less in HBase makes the overall system faster are not completely worked out, but the general outlines are pretty clear. There are likely several factors at work in any case, including less GC cost due to a smaller memory footprint, caching compressed blocks instead of Java structures, and simplification due to a clean memory hand-off with associated strong demarcation of where different memory allocators have jurisdiction.
-
RE: Converting byte[] to ByteBuffer
Jonathan Gray 2011-07-10, 07:59
There are plenty of arguments in both directions for caching above the DB, in the DB, or under the DB/in the FS. I have significant interest in supporting large heaps and reducing GC issues within the HBase RegionServer and I am already running with local fs reads. I don't think a faster dfs makes HBase caching irrelevant or the conversation a non-starter.

To get back to the original question, I ended up trying this once. I wrote a rough implementation of a slab allocator a few months ago to dive in and see what it would take. The big challenge is KeyValue and its various comparators. The ByteBuffer API can be maddening at times, but it can be done. I ended up somewhere slightly more generic, where KeyValue was taking a ByteBlock which contained ref counting and a reference to the allocator it came from, in addition to a ByteBuffer.

The easy way to rely on DirectByteBuffers and the like would be to make a copy on read into a normal byte[]; then there is no need to worry about ref counting and revamping KV. Of course, that comes at the cost of short-term allocations. In my experience, you can tune the GC around this and the cost really becomes CPU.

I'm in the process of re-implementing some of this stuff on top of the HFile v2 that is coming soon. Once that goes in, this gets much easier at the HFile and block cache level (a new wrapper around ByteBuffer called HFileBlock which can be used for ref counting and such, instead of introducing huge changes for caching stuff).

JG
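The two approaches described above (a ref-counted block that remembers its allocator, versus a plain copy on read into a byte[]) can be sketched roughly as follows. This is only an illustration under stated assumptions: `SlabAllocator`, `ByteBlock`, `retain`, `release`, `copyOut` and `put` are hypothetical names, not the classes from the experiment described in the message and not HBase APIs.

```java
import java.nio.ByteBuffer;
import java.util.ArrayDeque;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical fixed-size slab: pre-allocates direct (off-heap) buffers and recycles them. */
final class SlabAllocator {
    private final ArrayDeque<ByteBuffer> free = new ArrayDeque<>();

    SlabAllocator(int blockSize, int blockCount) {
        for (int i = 0; i < blockCount; i++) {
            free.push(ByteBuffer.allocateDirect(blockSize));
        }
    }

    /** Hands out a block with an initial reference count of 1, or null if the slab is exhausted. */
    synchronized ByteBlock allocate() {
        ByteBuffer buf = free.poll();
        return buf == null ? null : new ByteBlock(buf, this);
    }

    synchronized void recycle(ByteBuffer buf) {
        free.push(buf);
    }

    synchronized int freeBlocks() {
        return free.size();
    }
}

/** Hypothetical block: a ByteBuffer plus a ref count and a reference to its allocator. */
final class ByteBlock {
    private final ByteBuffer buffer;
    private final SlabAllocator owner;
    private final AtomicInteger refCount = new AtomicInteger(1);

    ByteBlock(ByteBuffer buffer, SlabAllocator owner) {
        this.buffer = buffer;
        this.owner = owner;
    }

    /** Takes an extra reference; fails if the block was already returned to the slab. */
    ByteBlock retain() {
        for (;;) {
            int current = refCount.get();
            if (current <= 0) {
                throw new IllegalStateException("block already released");
            }
            if (refCount.compareAndSet(current, current + 1)) {
                return this;
            }
        }
    }

    /** Drops one reference; the last release returns the buffer to its allocator. */
    void release() {
        int remaining = refCount.decrementAndGet();
        if (remaining == 0) {
            owner.recycle(buffer);
        } else if (remaining < 0) {
            throw new IllegalStateException("released more often than retained");
        }
    }

    /** Writes src at an absolute offset without touching the shared buffer's position. */
    void put(int offset, byte[] src) {
        ByteBuffer view = buffer.duplicate();
        view.position(offset);
        view.put(src);
    }

    /** Copy on read: returns a fresh heap array, so the caller needs no ref counting. */
    byte[] copyOut(int offset, int length) {
        byte[] dst = new byte[length];
        ByteBuffer view = buffer.duplicate();
        view.position(offset);
        view.get(dst, 0, length);
        return dst;
    }
}
```

With `copyOut`, the caller pays a short-lived heap allocation per read and can forget about the block afterwards; with `retain`/`release`, nothing is copied but every consumer has to release the block exactly once.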
-
Re: Converting byte[] to ByteBuffer
Andrew Purtell 2011-07-10, 16:25
> I agree with what Ryan is saying here, and I'd like to second (third?
> fourth?) keep pushing for HDFS improvements. Anything else is coding
> around the bigger I/O issue.

The Facebook code drop, not the 0.20-append branch with its clean history but rather the hairball without (shame), has an HDFS patched with the same approach as Ryan's HDFS-347 but in addition it also checksums the blocks and caches NameNode metadata. I might swap out Ryan's HDFS-347 patch locally with an extraction of these changes.

I've also been considering backporting the (stale) HADOOP-4801/HADOOP-6311 approach. Jason, it looks like you've recently updated those issues?

Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)
-
Re: Converting byte[] to ByteBuffer
Jason Rutherglen 2011-07-10, 21:53
Andrew,

I fully agree. I opened HDFS-2004 to this end; however, it was (oddly) shot down. I think HBase usage of HDFS is divergent from the traditional MapReduce usage. MapR addresses these issues, as does some of the Facebook-related work.

I think HBase should work at a lower level than the traditional HDFS APIs; thus the only patches required for HDFS are ones that make it more malleable for the requirements of HBase.

> Ryan's HDFS-347 but in addition it also checksums the blocks and caches NameNode metadata

Sounds good, I'm interested in checking that out.
-
Re: Converting byte[] to ByteBuffer
Jason Rutherglen 2011-07-10, 22:05
Ted,

Interesting. I think we need to take a deeper look at why essentially turning off the caching of uncompressed blocks doesn't [seem to] matter. My guess is it's cheaper to decompress on the fly than to hog memory from the system IO cache with JVM heap usage.

I.e., CPU is cheaper than disk IO.

Further (I asked this previously), where is the general CPU usage in HBase? Binary search on keys for seeking, skip list reads and writes, and [maybe] MapReduce jobs? The rest should more or less be in the noise (or is general Java overhead).

I'd be curious to know the avg CPU consumption of an active HBase system.
-
RE: Converting byte[] to ByteBuffer
Jonathan Gray 2011-07-11, 19:18
In my experience, CPU usage on HBase is very high for highly concurrent applications. You can expect the CMS GC to chew up 2-3 cores at sufficient throughput and the remaining cores to be spent in CSLM/MemStore, KeyValue comparators, queues, etc.
-
Re: Converting byte[] to ByteBuffer
Andrew Purtell 2011-07-11, 20:30
> Further, (I asked this previously), where is the general CPU usage in
> HBase? Binary search on keys for seeking, skip list reads and writes,
> and [maybe] MapReduce jobs?

If you are running colocated MapReduce jobs, then it could be the user code of course.

Otherwise it depends on workload.

For our apps I observe the following top line items when profiling:

- KV comparators: By far the most common operation, searching keys, writing HFiles, etc.
- MemStore CSLM ops: Especially if upserting
- Servicing RPCs: Writable marshall/unmarshall, monitors
- Concurrent GC

It generally looks good but MemStore can be improved, especially for the upsert case.

Reminds me I need to profile the latest. It's been a few weeks.

Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)
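Since key comparison tops the profile and the thread is about moving from byte[] to ByteBuffer, it may help to see what the comparison looks like against each representation. The sketch below is an assumption-laden illustration, not HBase's KeyValue comparator: `UnsignedBytes` and both `compare` overloads are hypothetical names, and a real comparator would also understand the key's internal layout.

```java
import java.nio.ByteBuffer;

/** Hypothetical helper: lexicographic comparison of byte ranges, treating bytes as unsigned. */
final class UnsignedBytes {
    private UnsignedBytes() {}

    /** Compares two ranges of heap arrays. */
    static int compare(byte[] a, int aOff, int aLen, byte[] b, int bOff, int bLen) {
        int n = Math.min(aLen, bLen);
        for (int i = 0; i < n; i++) {
            int x = a[aOff + i] & 0xff; // mask so that 0x80..0xff sort after 0x00..0x7f
            int y = b[bOff + i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        // All shared bytes are equal: the shorter range sorts first.
        return aLen - bLen;
    }

    /** Same comparison over ByteBuffers (heap or direct), using absolute reads only. */
    static int compare(ByteBuffer a, int aOff, int aLen, ByteBuffer b, int bOff, int bLen) {
        int n = Math.min(aLen, bLen);
        for (int i = 0; i < n; i++) {
            int x = a.get(aOff + i) & 0xff; // absolute get leaves position and limit untouched
            int y = b.get(bOff + i) & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        return aLen - bLen;
    }
}
```

The two overloads are intentionally identical in shape. The ByteBuffer version sticks to absolute `get(int)` so that a shared buffer's position is never modified by a comparison.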
-
Re: Converting byte[] to ByteBuffer
Jason Rutherglen 2011-07-12, 06:10
> - MemStore CSLM ops: Especially if upserting

A quick thought on that one: perhaps it'd be helped by limiting the aggregate size of the CSLM, e.g., skip lists at too large a size start to degrade in performance. Something like multiple CSLMs could work? Grow a CSLM to a given size, then start a new one.
|
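The "grow a CSLM to a given size, then start a new one" idea could look roughly like the sketch below. It is a minimal illustration under several assumptions: `SegmentedSkipListMap` and its methods are hypothetical names, size is counted in entries rather than bytes, writes are serialized with `synchronized` for simplicity, and whether this actually helps performance is not something the sketch demonstrates.

```java
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.concurrent.CopyOnWriteArrayList;

/** Hypothetical map that rolls over to a new ConcurrentSkipListMap once the active one is full. */
final class SegmentedSkipListMap<K extends Comparable<K>, V> {
    private final int maxEntriesPerSegment;
    private final CopyOnWriteArrayList<ConcurrentSkipListMap<K, V>> segments =
            new CopyOnWriteArrayList<>();
    // Number of distinct keys in the newest segment; tracked by hand because
    // ConcurrentSkipListMap.size() walks the whole map.
    private int activeCount = 0;

    SegmentedSkipListMap(int maxEntriesPerSegment) {
        this.maxEntriesPerSegment = maxEntriesPerSegment;
        segments.add(new ConcurrentSkipListMap<>());
    }

    /** Writes always go to the newest segment; a full segment is frozen and a new one started. */
    synchronized void put(K key, V value) {
        if (activeCount >= maxEntriesPerSegment) {
            segments.add(new ConcurrentSkipListMap<>());
            activeCount = 0;
        }
        ConcurrentSkipListMap<K, V> active = segments.get(segments.size() - 1);
        if (active.put(key, value) == null) {
            activeCount++; // only count keys that are new to this segment
        }
    }

    /** Reads check the newest segment first, so the latest write for a key wins. */
    V get(K key) {
        for (int i = segments.size() - 1; i >= 0; i--) {
            V value = segments.get(i).get(key);
            if (value != null) {
                return value;
            }
        }
        return null;
    }

    int segmentCount() {
        return segments.size();
    }
}
```

The trade-off is visible in `get`: each skip list stays small, but a point read may have to probe several segments, and an ordered scan would need to merge them.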