Eran Kutner, 2011-03-28, 10:16
Stack, 2011-03-28, 15:38
Ted Dunning, 2011-03-28, 16:00
Eran Kutner, 2011-03-29, 11:09
Jean-Daniel Cryans, 2011-03-29, 17:54
Ted Dunning, 2011-03-29, 18:38
Eran Kutner, 2011-03-29, 19:30
Jean-Daniel Cryans, 2011-03-29, 22:25
Eran Kutner, 2011-03-31, 16:33
Jean-Daniel Cryans, 2011-03-31, 17:27
Eran Kutner, 2011-04-21, 12:13
Jean-Daniel Cryans, 2011-04-21, 17:43
Eran Kutner, 2011-04-26, 10:34
Stack, 2011-04-26, 18:57
Josh, 2011-04-26, 21:30
Eran Kutner, 2011-04-27, 13:42
Eran Kutner, 2011-04-27, 13:51
Eran Kutner, 2011-04-27, 15:31
Jean-Daniel Cryans, 2011-05-02, 19:14
Eran Kutner, 2011-05-03, 13:20
Jean-Daniel Cryans, 2011-05-03, 20:29
Eran Kutner, 2011-05-04, 12:20
Eran Kutner, 2011-05-09, 16:31
Stack, 2011-05-09, 17:03
Eran Kutner, 2011-05-09, 20:41
-
Performance test results (Eran Kutner, 2011-03-28, 10:16)
Hi,

I'm running some performance tests on a cluster with 5 member servers (not counting the masters of all kinds), each node running a data node, a region server and a Thrift server. Each server has 2 quad-core CPUs and 16GB of RAM. The data set I'm using is made up of 50 sets of consecutive keys, with 100 columns in each row, all under a single CF. Each column has 1KB of random data. The entire data set is around 40GB, so it should fit in RAM for caching purposes. I've enabled LZO for the table. I'm running the test through Thrift because that's how it is going to be used in our production environment.

I started with a basic insert operation: inserting rows with one column of 1KB of data each. Initially, when the table was empty, I was getting around 300 inserts per second with 50 writing threads. Then, when the region split and a second server was added, the rate suddenly jumped to 3000 inserts/sec per server, so ~6000 for the two servers. Over time, as more servers were added, the rate actually went down and stabilized at around 2000 inserts/sec per server. I'm not sure what the reason is for the jump in inserts per server after the region was split. Maybe local splits on the same RS that allowed more open files?

I also conducted a random column read test, where I read a different number of columns from randomly selected rows. First I tested reading only one specific column (the first in each row). It started at around 60 reads/sec per server and gradually (I assume as more data was loaded into the cache) increased to ~800 reads/sec per server. When reading 5 random columns from each row the rate dropped to around 400 rows/sec, and when fetching full rows (each with 100 columns) the rate remained about the same, at 400 rows/sec per server.

I'm not sure exactly what I should expect, but I was hoping for much higher numbers. I read somewhere that for small data it is reasonable to expect 10K inserts per core per second. I know 1KB isn't small, but these are 8-core machines and they are doing about 2K inserts. Also, the read rate is very low considering all the data should fit in RAM.

The interesting thing is that there doesn't seem to be any resource bottleneck. IO utilization on the servers is negligible and CPU is around 40-50% utilization. The client generating the load is not loaded either (around 5% CPU utilization). Client network was at 30% utilization when reading full rows. So the only explanation I can see for the flat-lining is some sort of lock contention. Does this make sense? Is there any way to improve it, especially the read performance?

-eran
-
Re: Performance test results (Stack, 2011-03-28, 15:38)
On Mon, Mar 28, 2011 at 3:16 AM, Eran Kutner <[EMAIL PROTECTED]> wrote:
> I started with a basic insert operation. Inserting rows with one
> column with 1KB of data each.
> Initially, when the table was empty I was getting around 300 inserts
> per second with 50 writing threads. Then, when the region split and a
> second server was added the rate suddenly jumped to 3000 inserts/sec
> per server, so ~6000 for the two servers. Over time as more servers
> were added the rate actually went down, and stabilized on around 2000
> inserts/sec per server.

What if you ran your client on more than one server?

An insert is a single 1k cell?

Tell us more about your configs. Are you using defaults? If you
watch the logs during your upload, do you see much blocking?

> I also conducted a random column read test, where I read different
> number of columns from randomly selected rows. First I tested reading
> only one specific column (the first in each row). It started at around
> 60r/s per server and gradually (I assume as more data was loaded into
> the cache) increased to ~800 r/s per server.

You can check the regionserver log. It emits a cache stats log line
every so often. Check cache hit rate percentage.

> When reading 5 random
> columns from each row the rate dropped to around 400 rows/sec and when
> fetching full rows (each with 100 columns) the rate remained about the
> same, at 400 rows/sec per server.

100 columns in a row is 100k, right?

> I'm not sure exactly what should I expect but I was hoping for much
> higher numbers. I read somewhere that for small data it is reasonable
> to expect 10K inserts per core per second. I know 1KB isn't small but
> these are 8 core machines and they are doing about 2K inserts. Also
> the read rate is very low considering all the data should fit in RAM.
> The interesting thing is that there doesn't seem to be any resource
> bottleneck. IO utilization on the servers is negligible and CPU is
> around 40-50% utilization. The client generating the load is not
> loaded either (around 5% CPU utilization). Client network was at 30%
> utilization when reading full rows. So the only reason for flat-lining
> is some sort of lock contention. Does this make sense?

This could be the case. If you jstack during the reads, what are you
seeing? Are servers locked up waiting to pass a synchronization point
or waiting on a lock?

St.Ack
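Stack's jstack question is easier to answer from several dumps taken while the read test runs than from eyeballing one. Below is a small Python sketch for tallying them, assuming the usual HotSpot `jstack <pid>` text layout (a header line starting with the quoted thread name, a `java.lang.Thread.State:` line, then `at ...` frames). `summarize_jstack` is a hypothetical helper, not part of any HBase tooling. Idle pool threads parked deep inside `ThreadPoolExecutor.getTask` are expected; the interesting rows are threads that should be doing work and keep appearing under the same top frame.

```python
import re
from collections import Counter

# Matches the state line jstack prints under each thread header, e.g.
#    java.lang.Thread.State: TIMED_WAITING (parking)
STATE_RE = re.compile(r"java\.lang\.Thread\.State: (\w+)")
# Matches the first "at <method>(" line of a thread, i.e. its top stack frame.
FRAME_RE = re.compile(r"^\s+at ([\w.$<>]+)\(", re.MULTILINE)


def summarize_jstack(dump: str) -> Counter:
    """Count threads by (state, top stack frame) in a jstack text dump."""
    summary = Counter()
    # Each thread section starts on a line beginning with a quoted thread name.
    for section in re.split(r'\n(?=")', dump):
        state = STATE_RE.search(section)
        if state is None:
            continue  # dump header, JNI summary, etc.
        frame = FRAME_RE.search(section)
        top = frame.group(1) if frame else "<no frames>"
        summary[(state.group(1), top)] += 1
    return summary


# Trimmed excerpt in the same shape as the dump posted later in this thread.
SAMPLE = "\n".join([
    '"RMI TCP Connection(idle)" daemon prio=10 tid=0x1 nid=0x269c waiting on condition',
    "   java.lang.Thread.State: TIMED_WAITING (parking)",
    "\tat sun.misc.Unsafe.park(Native Method)",
    "\tat java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:198)",
    "",
    '"ResponseProcessor for block blk_1" daemon prio=10 tid=0x2 nid=0x2cb9 runnable',
    "   java.lang.Thread.State: RUNNABLE",
    "\tat sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)",
])

for (state, top), n in summarize_jstack(SAMPLE).most_common():
    print(f"{n:4d}  {state:<14} {top}")
```

Grouping by top frame rather than only by state is a deliberate choice: "RUNNABLE in epollWait" and "RUNNABLE in a DFSClient read" mean very different things for a read test.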
-
Re: Performance test results (Ted Dunning, 2011-03-28, 16:00)
This does sound pretty slow.
Using YCSB, I have seen insert rates of about 10,000 x 1kB records per second with two datanodes and one namenode using HBase over HDFS. That isn't using Thrift, though.
-
Re: Performance test results (Eran Kutner, 2011-03-29, 11:09)
Running the client on more than one server doesn't change the overall results; the total number of requests just gets distributed across the two clients.

I tried two things: inserting rows with one column each, and inserting rows with 100 columns each. In both cases the data was 1K per column, so it does add up to 100K per row for the second test.

I guess my config is more or less standard. I have two masters and a 3-server ZK ensemble. I have replication enabled, but not for the table I'm using for testing, and the other tables are not getting any requests during this test. The only non-standard things I have are the new MemStore-Local Allocation Buffer (MSLAB) feature and the GC configuration recommended in the recent Cloudera blog posts.

I've attached the jstack dump from one of the RS; it seems a lot of threads are either parked or in the "epollWait" state.

Thanks for looking into it.

-eran
-
Re: Performance test results (Jean-Daniel Cryans, 2011-03-29, 17:54)
Hey Eran,

Usually this mailing list doesn't accept attachments (or it works for voodoo reasons), so you'd be better off pastebin'ing them.

Some thoughts:

- Inserting into a new table without pre-splitting it is bound to give you misleadingly bad performance numbers. Please pre-split it with methods such as http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HBaseAdmin.html#createTable(org.apache.hadoop.hbase.HTableDescriptor, byte[][])
- You have 1 Thrift server per slave, but how are you using them? Pushing everything to just one? Or picking a random server for each put?
- What you describe, where all region servers and clients are using few resources, is often a sign that you are waiting a lot on network round trips. Are you pushing only one row at a time or a batch of them? High insertion rates are usually achieved with batching, since a single RPC has to first go to the Thrift server, then to the region server, and all the way back.
- Which language are you using to talk to Thrift?
- When you added a second client, which key space was it using? Was it writing to the same regions? Or did you start with an empty region again?

Thx,

J-D
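The `createTable(HTableDescriptor, byte[][])` method J-D links takes the region boundaries as an array of split keys. A minimal Python sketch of computing such keys, assuming row keys are spread uniformly over all 256 values of their first byte (for example a binary hash prefix); `uniform_split_keys` is a hypothetical helper, and the Python code does not talk to HBase. The byte strings it returns are what would go into the `byte[][]` argument on the Java side. Python compares `bytes` lexicographically by unsigned byte value, which is also how HBase orders row keys.

```python
def uniform_split_keys(num_regions: int) -> list[bytes]:
    """Return num_regions - 1 one-byte split keys spread evenly over 0x00-0xFF.

    Region k holds row keys in [split[k-1], split[k]); the first and last
    regions are open-ended. This only balances load if row keys really are
    spread evenly over all 256 possible first-byte values.
    """
    if not 1 <= num_regions <= 256:
        raise ValueError("num_regions must be between 1 and 256")
    return [bytes([i * 256 // num_regions]) for i in range(1, num_regions)]


splits = uniform_split_keys(4)
print([s.hex() for s in splits])  # ['40', '80', 'c0']
```

The uniformity assumption is the weak point: with printable or sequential keys the same splits can leave most regions empty.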
-
Re: Performance test results (Ted Dunning, 2011-03-29, 18:38)
Watch out when pre-splitting. Your key distribution may not be as uniform as you might think. This particularly happens when keys are represented in some printable form. Base 64, for instance, only populates a small fraction of the base 256 key space.
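Ted's warning can be made concrete. A standard base64 string can only start with one of 64 characters, i.e. 64 of the 256 possible first-byte values. The sketch below assumes a hypothetical table pre-split into 16 regions on evenly spaced first bytes and counts how many of those regions base64 keys could ever reach.

```python
import bisect
import string

# Characters a standard base64 string can start with: 64 of the 256 byte values.
BASE64_CHARS = (string.ascii_uppercase + string.ascii_lowercase
                + string.digits + "+/").encode("ascii")


def uniform_split_keys(num_regions):
    """One-byte split keys spread evenly over 0x00-0xFF."""
    return [bytes([i * 256 // num_regions]) for i in range(1, num_regions)]


def regions_hit(first_bytes, split_keys):
    """Indices of the regions that rows starting with these bytes fall into."""
    # bisect_right counts the split keys <= the row key, which is the region index.
    return sorted({bisect.bisect_right(split_keys, bytes([b])) for b in first_bytes})


splits = uniform_split_keys(16)
hit = regions_hit(BASE64_CHARS, splits)
print(f"{len(hit)} of {len(splits) + 1} regions receive keys: {hit}")
# 6 of 16 regions receive keys: [2, 3, 4, 5, 6, 7]
```

With printable keys it is safer to derive split points from the actual key alphabet, or from a sample of real keys, than from the raw byte range.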
-
Re: Performance test results (Eran Kutner, 2011-03-29, 19:30)
Hi J-D,

I can't paste the entire file because it's 126K. I'm trying to attach it now as a zip; let's see if that has more luck. As far as I can tell, most of the threads are blocked either like this:

    "RMI TCP Connection(idle)" daemon prio=10 tid=0x00002aaad011d000 nid=0x269c waiting on condition [0x0000000041e4d000]
       java.lang.Thread.State: TIMED_WAITING (parking)
            at sun.misc.Unsafe.park(Native Method)
            - parking to wait for <0x000000045c687200> (a java.util.concurrent.SynchronousQueue$TransferStack)
            at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:198)
            at java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill(SynchronousQueue.java:424)
            at java.util.concurrent.SynchronousQueue$TransferStack.transfer(SynchronousQueue.java:323)
            at java.util.concurrent.SynchronousQueue.poll(SynchronousQueue.java:874)
            at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:945)
            at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
            at java.lang.Thread.run(Thread.java:662)

       Locked ownable synchronizers:
            - None

or like this:

    "ResponseProcessor for block blk_2435887137905447383_11770" daemon prio=10 tid=0x000000004f08e000 nid=0x2cb9 runnable [0x0000000049097000]
       java.lang.Thread.State: RUNNABLE
            at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
            at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:210)
            at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
            at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
            - locked <0x000000045dbe20b0> (a sun.nio.ch.Util$2)
            - locked <0x000000045dbe2098> (a java.util.Collections$UnmodifiableSet)
            - locked <0x00000004fa1a2510> (a sun.nio.ch.EPollSelectorImpl)
            at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
            at org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:332)
            at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:157)
            at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
            at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
            at java.io.DataInputStream.readFully(DataInputStream.java:178)
            at java.io.DataInputStream.readLong(DataInputStream.java:399)
            at org.apache.hadoop.hdfs.protocol.DataTransferProtocol$PipelineAck.readFields(DataTransferProtocol.java:120)
            at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$ResponseProcessor.run(DFSClient.java:2622)

       Locked ownable synchronizers:
            - None

I didn't pre-split, and I guess that explains the behavior I saw, in which the write performance started at 300 inserts/sec and then increased to 3000 per server when the region was split and spread to two servers. It doesn't explain why the rate actually dropped after more splits and more servers were added to the table, until eventually it stabilized at around 2000 inserts/sec per server.

I have 1 Thrift server per slave. I'm using C# to access the Thrift servers. My C# library manages its own connection pool: it does round-robin between the servers and reuses open connections, so not every call opens a new connection. After a few seconds of running the test all the connections are reused and no new connections are being opened.

I'm inserting the rows one by one because that represents the kind of OLTP load that I have in mind for this system. Batching multiple rows, I believe, is more suitable for analytical processing.

The second client was using the same key space, but I tried the single client with a few thread configurations, from 1 to 100, where each thread was using a different key space. I didn't really see any difference between 50 threads and 100 threads, so I don't think it's a key space distribution issue.

I agree that network latency could be causing the problem, but then I would expect to see more overall reads/writes as the client thread count increases; as I said, above 40-50 threads there was no improvement.

-eran
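Eran's C# pool (round-robin across the Thrift servers, reusing idle connections) can be sketched in a few lines. This Python version only illustrates that design under stated assumptions; it is not his code. `RoundRobinPool` and the `connect` factory are hypothetical names, and in a real client `connect` might open a Thrift socket wrapped in a buffered transport. Once every host has as many idle connections as the client ever has in flight to it at once, the pool stops opening new ones, which matches what he saw after the first few seconds.

```python
import itertools
import threading
from collections import deque


class RoundRobinPool:
    """Round-robin over server hosts, reusing idle connections per host."""

    def __init__(self, hosts, connect):
        hosts = list(hosts)
        self._next_host = itertools.cycle(hosts)
        self._idle = {h: deque() for h in hosts}
        self._connect = connect  # hypothetical factory: host -> connection
        self._lock = threading.Lock()
        self.opened = 0  # number of connections opened (or attempted) so far

    def acquire(self):
        with self._lock:
            host = next(self._next_host)
            if self._idle[host]:
                return host, self._idle[host].popleft()
            self.opened += 1
        # Connect outside the lock so a slow connect doesn't block other threads.
        return host, self._connect(host)

    def release(self, host, conn):
        with self._lock:
            self._idle[host].append(conn)


pool = RoundRobinPool(["rs1", "rs2", "rs3"], connect=lambda host: object())
for _ in range(10):  # ten sequential requests
    host, conn = pool.acquire()
    pool.release(host, conn)
print(pool.opened)  # 3: one per host, after which every request reuses one
```

Plain round-robin spreads requests evenly across Thrift servers, but it does not route by row key, so each Thrift server still forwards most requests to a different region server.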
-
Re: Performance test results (Jean-Daniel Cryans, 2011-03-29, 22:25)
Inline.

J-D

> Hi J-D,
> I can't paste the entire file because it's 126K. Trying to attach it
> now as zip, lets see if that has more luck.

In the jstack you posted, all the Gets were hitting HDFS, which is probably why it's slow. Until you can get something like HDFS-347 into your Hadoop, you'll have to make sure you can block cache most of what you're going to read. You can tune the size of the block cache, since by default it's only 20% of the whole heap.

> I didn't pre-split and I guess that explains the behavior I saw in
> which the write performance started at 300 inserts/sec and then
> increased up to 3000 per server when the region was split and spread
> to two servers. It doesn't explain why the rate actually dropped after
> more splits and more servers were added to the table, until eventually
> it stabilized on around 2000 inserts/sec per server.

Yeah, that doesn't explain it, but for that part of the loading we basically have zero information about the regions' layout on the cluster and how the regions were used. 3k might just be a spike that didn't last very long, and for all I know it shouldn't be cared about. Was the 2k/sec done by just one machine, or were they all participating equally? How many regions did you end up with at the end?

> I have 1 thrift server per slave. I'm using C# to access the thrift
> servers. My C# library manages its own connection pool, it does
> round-robin between the servers and re-uses open connections, so not
> every call will open a new connection. After a few seconds of running
> the test all the connections are re-used and no new connections are
> being opened.

Sounds good.

> I'm inserting the rows one by one because that represent the kind of
> OLTP load that I have in mind for this system. Batching multiple rows,
> I believe, is more suitable for analytical processing.

Makes sense.

> The second client was using the same key space, but I tried the single
> client with a few thread configurations, from 1 to 100, where each
> thread was using a different key space, I didn't really see any
> difference between 50 threads and 100 threads, so I don't think it's a
> key space distribution issue.

That part doesn't make sense at all; there must be something you're not seeing that would explain it, like the number of regions and their layout. Also, maybe your assumptions about the key spaces are wrong (from experience I always assume the user is wrong, sorry).

> I agree that network latency can be causing the problem but then I
> would expect to see more overall reads/writes as the client thread
> count increases, as I said above 40-50 thread there was no
> improvement.

Indeed, something is off and we're not seeing it.
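Eran's argument that more threads should mean more throughput if latency were the only problem can be made precise with Little's law: for a closed-loop client in which every thread keeps exactly one synchronous request in flight (true for one-row-at-a-time inserts), concurrency = throughput × mean latency. The figures below are a hypothetical reading of the thread, assuming ~2000 inserts/sec on each of the 5 region servers with all of them taking an equal share, which J-D has not confirmed.

```python
def implied_latency_ms(concurrency: int, throughput_per_sec: float) -> float:
    """Little's law (L = lambda * W) solved for W, the mean time per request.

    Assumes a closed-loop client: each thread has exactly one synchronous
    request outstanding, so concurrency equals the number of threads.
    """
    return 1000.0 * concurrency / throughput_per_sec


# Hypothetical: ~2000 inserts/sec on each of 5 region servers, evenly shared.
total = 5 * 2000
for threads in (50, 100):
    print(f"{threads} threads -> {implied_latency_ms(threads, total):.1f} ms per insert")
# 50 threads -> 5.0 ms per insert
# 100 threads -> 10.0 ms per insert
```

If throughput stays flat while the thread count doubles, the mean time per request must double too, which points at requests queueing on some shared resource along the path (Thrift server, region server, network) rather than at a client that is simply too slow.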
-
Re: Performance test results (Eran Kutner, 2011-03-31, 16:33)
I assume the block cache tuning key you're talking about is "hfile.block.cache.size", right? If it is only 20% by default, then what is the rest of the heap used for? Since there are no fancy operations like joins, and since I'm not using in-memory tables, the only thing I can think of is the memstore, right? What is the recommended value for the block cache?

As for the region layout, right now the table in question has 264 regions, more or less evenly distributed among the 5 region servers. Let me know what other information I can provide.

The key space is as follows: I launch n threads, and each thread writes keys that look like "streami_c", where "i" is the thread index (1-n) and "c" is a counter that goes up from 1 until I stop the test. I understand that each thread is only writing to the tail of its own key space, so only "n" region files can be used; however, if that were the limitation, then adding more threads, each with its own key space, should have increased the throughput.

-eran
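Given this key scheme, one way to make sure each thread really writes to its own region is to pre-split at the thread prefixes, and the same logic tells you which region a given row key lands in. A Python sketch under the assumption that keys are exactly `stream<i>_<c>` as described; all function names are hypothetical. HBase orders row keys as raw bytes, so `stream10_` sorts before `stream1_` (`'0'` < `'_'`), which is easy to get wrong when reasoning about start/stop keys by eye.

```python
import bisect


def thread_prefixes(num_threads: int) -> list[bytes]:
    """Row-key prefixes for the "stream<i>_<c>" scheme, in HBase's byte order."""
    # Python compares bytes by unsigned byte value, as HBase does for row keys.
    return sorted(f"stream{i}_".encode() for i in range(1, num_threads + 1))


def per_thread_split_keys(num_threads: int) -> list[bytes]:
    """Split keys that give each thread's key space its own region."""
    # Keys sharing a prefix are contiguous in byte order, and no prefix is a
    # prefix of another (the "_" ends the number), so splitting at every
    # prefix except the smallest isolates one thread per region.
    return thread_prefixes(num_threads)[1:]


def region_index(row_key: bytes, split_keys: list[bytes]) -> int:
    """Region holding row_key: the number of split keys <= row_key."""
    return bisect.bisect_right(split_keys, row_key)


splits = per_thread_split_keys(12)
regions = {i: region_index(f"stream{i}_42".encode(), splits) for i in range(1, 13)}
print(splits[:3])  # [b'stream11_', b'stream12_', b'stream1_']
print(len(set(regions.values())))  # 12: every thread writes to its own region
```

Even with one region per thread, each thread still only appends at the tail of its region, so per-thread write throughput is bounded by the region server hosting that tail.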
-
Re: Performance test results (Jean-Daniel Cryans, 2011-03-31, 17:27)
Inline.

J-D

> I assume the block cache tunning key you talk about is
> "hfile.block.cache.size", right? If it is only 20% by default than
> what is the rest of the heap used for? Since there are no fancy
> operations like joins and since I'm not using memory tables the only
> thing I can think of is the memstore right? What is the recommended
> value for the block cache?

By default a max of 40% of the heap is reserved for MemStores; the rest is used to answer queries, do compactions, flushes, etc. It's very conservative, but people still find ways to OOME with very big cells sometimes :)

> As for the regions layout, right now the table in discussion has 264
> regions more or less evenly distributed among the 5 region servers.
> Let me know what other information I can provide.

That's fine, but more important is the layout during the test. It can be tricky to benchmark a "real life workload" if you just did the import, because it takes some time for the dust to settle. One example among many others: the balancer only runs every few minutes, so if you're doing a massive insert and then reads, the load might only be on two machines.

> The key space is as follows: I launch n threads, each thread writes
> keys that look like "streami_c" where "i" is the thread index (1-n)
> and "c" is a counter that goes up from 1 until I stop the test. I
> understand that each thread is only writing to the tail of its own
> keyspace so only "n" region files can be used, however if that was the
> limitation then adding more threads each with its own keyspace should
> have increased the throughput.

And can you tell from the start/stop keys that those threads do hit different regions? I understand you wouldn't have to worry about that too much in a real life scenario, but since yours is artificial, who knows how it ended up.

To speed up this discussion, feel free to drop by our IRC channel on freenode; very often we're able to find issues much faster, using less time for everyone (and then report the findings here).

J-D
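The heap split J-D describes explains why "the data set should fit in RAM" does not mean "the data set fits in the block cache". A worked Python sketch: the 20% block cache is `hfile.block.cache.size` from the thread, and the 40% MemStore ceiling is, to my knowledge, `hbase.regionserver.global.memstore.upperLimit` in HBase releases of this era. The 8 GB region server heap is a hypothetical figure; the thread only says each machine has 16 GB of RAM, shared with a DataNode and a Thrift server.

```python
def heap_budget_gb(heap_gb, block_cache_fraction=0.2, memstore_fraction=0.4):
    """Split a region server heap using the fractions quoted in the thread.

    Returns (block cache, MemStore ceiling, remainder) in GB; the remainder
    serves queries, compactions, flushes, etc.
    """
    block_cache = heap_gb * block_cache_fraction
    memstores = heap_gb * memstore_fraction
    return block_cache, memstores, heap_gb - block_cache - memstores


# Hypothetical: an 8 GB region server heap on each of the 5 nodes.
servers, heap_gb, data_set_gb = 5, 8.0, 40.0
for fraction in (0.2, 0.4):
    cache, _, _ = heap_budget_gb(heap_gb, block_cache_fraction=fraction)
    print(f"block cache {fraction:.0%}: {cache * servers:.1f} GB across the cluster "
          f"vs {data_set_gb:.0f} GB of data")
```

Under these assumptions the cluster-wide block cache is 8 GB at the default and 16 GB even at 40%, well short of the ~40 GB data set, so many Gets would still go to HDFS. Raising the block cache fraction also shrinks the remainder unless the MemStore share is lowered to match.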
-
Re: Performance test results (Eran Kutner, 2011-04-21, 12:13)
Hi J-D,

After stabilizing the configuration, with your great help, I was able to go back to the load tests. I tried using IRC, as you suggested, to continue this discussion, but because of the time difference (I'm GMT+3) it is quite difficult to find a time when people are present and I am available to run long tests, so I'll give the mailing list one more try.

I tested again on a clean table using 100 insert threads, each using a separate key space within the test table. Every row had just one column with 128 bytes of data. With one server and one region I got about 2300 inserts per second. After manually splitting the region I got about 3600 inserts per second (still on one machine). After a while the regions were balanced and one was moved to another server, which brought writes to around 4500 per second. Additional splits and moves to more servers didn't improve this number, and the write performance stabilized at ~4000 writes/sec per server. This seems pretty low, especially considering other numbers I've seen around here.

Read performance is around 1500 rows per second per server, which seems extremely low to me, especially considering that the whole working set I was querying could fit in the servers' memory. To make the test interesting I limited my client to fetch only 1 row (always the same one) from each key space; that yielded 10K reads per sec per server. So I tried increasing the range again and read the same 10 rows; now the performance dropped to 8500 reads/sec per server. Increasing the range to 100 rows, the performance drops to around 3500 reads per second per server.

Do you have any idea what could explain this behavior, and how do I get a decent number of reads from those servers?

-eran
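The read experiment above varies the working set per key space (1, 10, then 100 rows). A minimal Python sketch of how such a workload could be generated, assuming the `stream<i>_<c>` row keys described earlier with `c` counting up from 1; this is not Eran's C# client, and `read_keys` is a hypothetical name. Seeding the generator makes runs with different working-set sizes reproducible, so throughput differences between runs come from the server side rather than from the request mix.

```python
import random


def read_keys(thread_index, working_set_rows, count, seed=None):
    """Random Get targets for one thread, restricted to its first N rows.

    working_set_rows=1 always reads the same row, 10 reads one of ten, etc.
    """
    rng = random.Random(seed)
    return [f"stream{thread_index}_{rng.randint(1, working_set_rows)}"
            for _ in range(count)]


for rows in (1, 10, 100):
    keys = read_keys(thread_index=7, working_set_rows=rows, count=10_000, seed=1)
    print(f"working set {rows:>3} rows -> {len(set(keys))} distinct keys read")
```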
-
Re: Performance test results (Jean-Daniel Cryans, 2011-04-21, 17:43)
Hey Eran,

Glad you could go back to debugging performance :)

The scalability issues you are seeing are unknown to me; it sounds like the client isn't pushing it hard enough. It reminded me of when we switched to using the native Thrift PHP extension instead of the "normal" one and saw huge speedups. My limited knowledge of Thrift may be blinding me, but I looked around for C# Thrift performance issues and found threads like this one: http://www.mail-archive.com/[EMAIL PROTECTED]/msg00320.html

As you haven't really debugged the speed of Thrift itself in your setup, this is one more variable in the problem.

Also, you don't really provide metrics about your system apart from requests/second. Would it be possible for you to set them up using this guide? http://hbase.apache.org/metrics.html

J-D
-
Re: Performance test results (Eran Kutner, 2011-04-26 10:34)
Hi J-D,
I don't think it's a Thrift issue. First, I use TBufferedTransport. Second, I implemented my own connection pool, so the same connections are reused over and over again and there is no overhead for opening and closing connections (I've verified that using Wireshark). Third, if it were a client capacity issue I would expect to see an increase in throughput as I add more threads or run the test on two servers in parallel. This doesn't seem to happen; the total capacity remains unchanged.

As for metrics, I already have them configured and monitored using Zabbix, but it only monitors specific counters, so let me know what information you would like to see. The numbers I quoted before are based on client counters and correlated with server counters ("multi" for writes and "get" for reads).

-eran
-
Re: Performance test results (Stack, 2011-04-26 18:57)
On Thu, Apr 21, 2011 at 5:13 AM, Eran Kutner <[EMAIL PROTECTED]> wrote:
> I tested again on a clean table using 100 insert threads each, using a separate keyspace within the test table. Every row had just one column with 128 bytes of data.
>
> With one server and one region I got about 2300 inserts per second. After manually splitting the region I got about 3600 inserts per second (still on one machine). After a while the regions were balanced and one was moved to another server, which brought writes to around 4500 per second. Additional splits and moves to more servers didn't improve this number and the write performance stabilized at ~4000 writes/sec per server. This seems pretty low, especially considering other numbers I've seen around here.

If you run your insert process on more than one box, do the numbers change?

Does anything in http://hbase.apache.org/book.html#performance help?

What size are your keys?

> Read performance is at around 1500 rows per second per server, which seems extremely low to me, especially considering that all the working set I was querying could fit in the servers' memory. To make the test interesting I limited my client to fetch only 1 row (always the same one) from each keyspace, that yielded 10K reads per sec per server, so I tried increasing the range again and read the same 10 rows, now the performance dropped to 8500 reads/sec per server. Increasing the range to 100 rows and the performance drops to around 3500 reads per second per server.

This result is interesting. The block cache logs its hit rate in the regionserver logs. Are you seeing near 100% for 1 row, 10 rows, and 100 rows?

St.Ack
-
Re: Performance test results (Josh, 2011-04-26 21:30)
On Tue, Apr 26, 2011 at 3:34 AM, Eran Kutner <[EMAIL PROTECTED]> wrote:
> Hi J-D,
> I don't think it's a Thrift issue. First, I use the TBufferedTransport transport, second, I implemented my own connection pool so the same connections are reused over and over again,

Hey! I'm using C# -> HBase, and high on my list of things to do is 'Implement Thrift Connection Pooling in C#'. Do you have any desire to release that code?

josh
@schulz
http://schulzone.org
-
Re: Performance test results (Eran Kutner, 2011-04-27 13:42)
I must say, the more I play with it the more baffled I am by the results. I ran the read test again today after not touching the cluster for a couple of days, and now I'm getting the same high read numbers (10-11K reads/sec per server, with some servers reaching even 15K r/s) whether I read 1, 10, 100 or even 1000 rows from every key space. However, 5000 rows yielded a read rate of only 3K rows per second, even after a very long time. Just to be clear, I'm always randomly reading a single row in every request; the numbers of rows I'm talking about are the ranges of rows within each key space that I'm randomly selecting my keys from.

St.Ack - to answer your questions:

Writing from two machines increased the total number of writes per second by about 10%, maybe less. Reads showed a 15-20% increase when run from 2 machines.

I already had most of the performance tuning recommendations implemented (garbage collection, using the new memory slabs feature, using LZO) when I ran my previous test. The only config I didn't have is "hbase.regionserver.handler.count". I changed it to 128, or 16 threads per core, which seems like a reasonable number, and tried inserting to the same key ranges as before; it didn't seem to make any difference in the total performance.

My keys are about 15 bytes long.

As for caching, I can't find those cache hit ratio numbers in my logs; do they require a special parameter to enable them? That said, my calculations show that the entire data set I'm randomly reading should easily fit in the servers' memory. Each row has 15 bytes of key + 128 bytes of data + overhead, let's say 200 bytes. If I'm reading 5000 rows from each key space and have a total of 100 key spaces, that's 100*5000*200 = 100,000,000 B = 100 MB. This is spread across 5 servers with 16GB of RAM each, out of which 12.5GB are allocated to the region servers.

-eran
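Eran's working-set estimate can be reproduced as a quick sketch. The 200-byte row size is his rough guess, not a measured HBase KeyValue size. The even spread across the 5 region servers is an assumption, since the real split depends on how the regions were balanced. The function names are illustrative.

```python
def working_set_bytes(key_spaces: int, rows_per_key_space: int, bytes_per_row: int) -> int:
    """Total bytes touched by the random-read test (row data only)."""
    return key_spaces * rows_per_key_space * bytes_per_row


def per_server_share(total_bytes: int, servers: int) -> float:
    """Bytes per region server, assuming the key spaces are spread evenly."""
    return total_bytes / servers


total = working_set_bytes(key_spaces=100, rows_per_key_space=5000, bytes_per_row=200)
share = per_server_share(total, servers=5)
print(f"total: {total / 1e6:.0f} MB, per server: {share / 1e6:.0f} MB")
# prints "total: 100 MB, per server: 20 MB"
```

This counts row bytes only. HBase's block cache stores whole HFile blocks rather than individual rows, so the cached footprint will generally be somewhat larger than this estimate.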
-
Re: Performance test results (Eran Kutner, 2011-04-27 13:51)
Hi Josh,
The connection pooling code is attached AS IS (with all the usual legal disclaimers). Note that you will have to modify it a bit to get it to compile, because it depends on some internal libraries we use. In particular, DynamicAppSettings and Log are two internal classes that do what their names imply :) Make sure you initialize "servers" in the NewConnection() method to an array with your Thrift servers and you should be good to go. Use GetConnection() to get a connection and ReturnConnection() to return it to the pool after you finish using it, and make sure you don't close it in the application code.

-eran
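Since the attachment itself isn't reproduced here, this is a minimal sketch of the get/return pattern Eran describes, written in Python rather than C#. It is not his code. The `factory` callable stands in for opening a Thrift transport, which NewConnection() does in his version. The class name, method names and server addresses are hypothetical.

```python
import queue
import threading
from typing import Callable, Generic, Sequence, TypeVar

T = TypeVar("T")


class ConnectionPool(Generic[T]):
    """Thread-safe pool: reuse idle connections, open new ones on demand (unbounded)."""

    def __init__(self, servers: Sequence[str], factory: Callable[[str], T]) -> None:
        if not servers:
            raise ValueError("at least one server is required")
        self._servers = list(servers)
        self._factory = factory             # hypothetical: opens a connection to one server
        self._idle = queue.LifoQueue()      # most recently returned connection is reused first
        self._lock = threading.Lock()
        self._next = 0                      # round-robin index into self._servers

    def _new_connection(self) -> T:
        # Pick the next server round-robin, then open the connection outside the lock.
        with self._lock:
            server = self._servers[self._next % len(self._servers)]
            self._next += 1
        return self._factory(server)

    def get_connection(self) -> T:
        """Return an idle connection if one exists, otherwise open a new one."""
        try:
            return self._idle.get_nowait()
        except queue.Empty:
            return self._new_connection()

    def return_connection(self, conn: T) -> None:
        """Hand a connection back for reuse; callers must not close it themselves."""
        self._idle.put(conn)


# Usage with a fake factory in place of a real Thrift connection:
opened = []


def fake_factory(server: str) -> dict:
    opened.append(server)
    return {"server": server}


pool = ConnectionPool(["thrift1:9090", "thrift2:9090"], fake_factory)
c1 = pool.get_connection()
c2 = pool.get_connection()
pool.return_connection(c1)
c3 = pool.get_connection()  # reuses c1 instead of opening a third connection
```

The LIFO queue keeps a small set of recently used connections busy. Round-robin only affects where new connections are opened. This sketch never caps the pool size or evicts broken connections, and a production pool would need both.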
-
Re: Performance test results (Eran Kutner, 2011-04-27 15:31)
Since the attachment didn't make it, here it is again:
http://shortText.com/jp73moaesx

-eran
-
Re: Performance test results (Jean-Daniel Cryans, 2011-05-02 19:14)
It might be the slow memstore issue... After inserting your dataset, issue a flush on your table in the shell, wait a few seconds, then start reading. Someone else on the mailing list recently saw this type of issue.

Regarding the block cache logging, here's what I see in my logs:

2011-05-02 10:05:38,718 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Block cache LRU eviction started; Attempting to free 303.77 MB of total=2.52 GB
2011-05-02 10:05:38,751 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Block cache LRU eviction completed; freed=303.8 MB, total=2.22 GB, single=755.67 MB, multi=1.76 GB, memory=0 KB
2011-05-02 10:07:18,737 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: LRU Stats: total=2.27 GB, free=718.03 MB, max=2.97 GB, blocks=36450, accesses=1056364760, hits=939002423, hitRatio=88.88%%, cachingAccesses=967172747, cachingHits=932095548, cachingHitsRatio=96.37%%, evictions=7801, evicted=35040749, evictedPerRun=4491.8276367187

Keep in mind that we currently don't have anything like a moving average for the percentages, so at some point those numbers are set in stone...

The handler config only helps if you are using a ton of clients, which doesn't seem to be the case (at least for now).

J-D
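To check the percentages in a line like the "LRU Stats" one above, this sketch pulls the integer counters out of the line and recomputes both ratios. It relies only on the key=value layout visible in this log. That layout is an assumption about this HBase version's output, not a stable interface. As I understand the counters, the caching* ones only cover requests that had block caching enabled, which is why cachingHitsRatio is higher than hitRatio. The function names are illustrative.

```python
import re

# The "LRU Stats" line from the log excerpt above, joined onto one line.
SAMPLE = (
    "2011-05-02 10:07:18,737 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: "
    "LRU Stats: total=2.27 GB, free=718.03 MB, max=2.97 GB, blocks=36450, "
    "accesses=1056364760, hits=939002423, hitRatio=88.88%%, "
    "cachingAccesses=967172747, cachingHits=932095548, cachingHitsRatio=96.37%%, "
    "evictions=7801, evicted=35040749, evictedPerRun=4491.8276367187"
)

# key=<integer> pairs only; sizes like "2.27 GB" and ratios like "88.88%%" don't match.
_INT_FIELD = re.compile(r"(\w+)=(\d+)(?=,|\s|$)")


def parse_lru_counters(line: str) -> dict:
    """Return the integer counters from an LRU Stats log line."""
    return {key: int(value) for key, value in _INT_FIELD.findall(line)}


def ratios(counters: dict) -> tuple:
    """Recompute (hitRatio, cachingHitsRatio) as fractions from the raw counters."""
    hit = counters["hits"] / counters["accesses"]
    caching_hit = counters["cachingHits"] / counters["cachingAccesses"]
    return hit, caching_hit


counters = parse_lru_counters(SAMPLE)
hit, caching_hit = ratios(counters)
print(f"hitRatio={hit:.2%} cachingHitsRatio={caching_hit:.2%}")
```

The recomputed hit ratio rounds to 88.89% rather than the logged 88.88%, presumably because the log truncates instead of rounding. As J-D notes, these counters are cumulative, so a recent change in behavior barely moves them on a long-running server.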
-
Re: Performance test results (Eran Kutner, 2011-05-03 13:20)
Flushing, at least when I try it now, long after I stopped writing, doesn't seem to have any effect.

In my log I see this:

2011-05-03 08:57:55,384 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: LRU Stats: total=3.39 GB, free=897.87 MB, max=4.27 GB, blocks=54637, accesses=89411811, hits=75769916, hitRatio=84.74%%, cachingAccesses=83656318, cachingHits=75714473, cachingHitsRatio=90.50%%, evictions=1135, evicted=7887205, evictedPerRun=6949.0791015625

and every 30 seconds or so something like this:

2011-05-03 08:58:07,900 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Block cache LRU eviction started; Attempting to free 436.92 MB of total=3.63 GB
2011-05-03 08:58:07,947 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Block cache LRU eviction completed; freed=436.95 MB, total=3.2 GB, single=931.65 MB, multi=2.68 GB, memory=3.69 KB

Now, if the entire working set I'm reading is 100MB in size, why would it have to evict 436MB, only for the cache to fill back up within 30 seconds?

Also, what is a good value for hfile.block.cache.size? I have it at .35 now, but with 12.5GB of RAM available for the region servers it seems I should be able to set it much higher.

-eran
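The size of each eviction run follows from the cache's thresholds rather than from the working set. This minimal sketch assumes the LruBlockCache defaults of that era: eviction starts once the cache grows past 85% of its maximum and frees blocks until it is back down to 75%. Treat those factors as assumptions to check against your own HBase version, but they match both logs in this thread (2.97 GB max gives 2.52/2.22 GB; 4.27 GB max gives 3.63/3.2 GB). The logged sizes appear to use 1024-based units. The function name is illustrative.

```python
# Assumed LruBlockCache defaults; verify the constants in your own HBase version.
ACCEPTABLE_FACTOR = 0.85  # eviction starts above max * ACCEPTABLE_FACTOR
MIN_FACTOR = 0.75         # eviction frees blocks down to max * MIN_FACTOR


def eviction_window(max_gb: float, acceptable: float = ACCEPTABLE_FACTOR,
                    minimum: float = MIN_FACTOR) -> tuple:
    """Return (trigger_gb, target_gb, freed_mb) for one eviction run."""
    trigger = max_gb * acceptable
    target = max_gb * minimum
    freed_mb = (trigger - target) * 1024  # GB -> MB, using 1024-based units
    return trigger, target, freed_mb


for label, max_gb in [("J-D, 2011-05-02", 2.97), ("Eran, 2011-05-03", 4.27)]:
    trigger, target, freed = eviction_window(max_gb)
    print(f"{label}: start at {trigger:.2f} GB, stop at {target:.2f} GB, free ~{freed:.0f} MB")
```

So each run frees roughly 10% of the cache's maximum however small the working set is. What the 30-second cadence suggests is that about that much new data is being pulled into the cache between runs.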
-
Re: Performance test results (Jean-Daniel Cryans, 2011-05-03 20:29)
On Tue, May 3, 2011 at 6:20 AM, Eran Kutner <[EMAIL PROTECTED]> wrote:
> Flushing, at least when I try it now, long after I stopped writing, doesn't seem to have any effect.

Bummer.

> In my log I see this:
> 2011-05-03 08:57:55,384 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: LRU Stats: total=3.39 GB, free=897.87 MB, max=4.27 GB, blocks=54637, accesses=89411811, hits=75769916, hitRatio=84.74%%, cachingAccesses=83656318, cachingHits=75714473, cachingHitsRatio=90.50%%, evictions=1135, evicted=7887205, evictedPerRun=6949.0791015625
>
> and every 30 seconds or so something like this:
> 2011-05-03 08:58:07,900 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Block cache LRU eviction started; Attempting to free 436.92 MB of total=3.63 GB
> 2011-05-03 08:58:07,947 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Block cache LRU eviction completed; freed=436.95 MB, total=3.2 GB, single=931.65 MB, multi=2.68 GB, memory=3.69 KB
>
> Now, if the entire working set I'm reading is 100MB in size, why would it have to evict 436MB just to get it filled back in 30 seconds?

I was about to ask the same question... From what I can tell from this log, your working dataset seems to be much larger than 3GB (the fact that it's evicting means it could be a lot more), and that's only on that region server.

The first reason that comes to mind for why it would be so much bigger is that you uploaded your dataset more than once, and since HBase keeps versions of the data, it could accumulate. That doesn't explain how it would grow into GBs, though, since by default a family only keeps 3 versions... unless you set that higher than the default, or you uploaded the same data tens of times within 24 hours and the major compactions didn't kick in.

In any case, it would be interesting if you:
- truncate the table
- re-import the data
- force a flush
- wait a bit until the flushes are done (should take 2-3 seconds if your dataset is really 100MB)
- do a "hadoop dfs -dus" on the table's directory (should be under /hbase)
- if the number is way out of whack, review how you are inserting your data.

Either way, please report back.

> Also, what is a good value for hfile.block.cache.size (I have it now on .35) but with 12.5GB of RAM available for the region servers it seem I should be able to get it much higher.

It depends. You also have to account for the MemStores, which by default can use up to 40% of the heap (hbase.regionserver.global.memstore.upperLimit), currently leaving you only 100-40-35=25% of the heap for things like serving requests, compacting, flushing, etc. It's hard to give a good number for what should be left to the rest of HBase, though...
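J-D's 100-40-35 arithmetic can be made concrete for a 12.5GB region server heap. In this sketch, 0.40 is the default he cites for hbase.regionserver.global.memstore.upperLimit and 0.35 is Eran's current hfile.block.cache.size. The function name and the 0.45 comparison value are illustrative, not recommendations.

```python
def heap_budget(heap_gb: float, block_cache_fraction: float = 0.35,
                memstore_upper_limit: float = 0.40) -> dict:
    """Split a region server heap between block cache, MemStores and everything else."""
    if block_cache_fraction + memstore_upper_limit >= 1.0:
        raise ValueError("block cache and MemStores would take the whole heap")
    rest_fraction = 1.0 - block_cache_fraction - memstore_upper_limit
    return {
        "block_cache_gb": heap_gb * block_cache_fraction,
        "memstores_gb": heap_gb * memstore_upper_limit,
        "rest_gb": heap_gb * rest_fraction,  # requests, compactions, flushes, ...
        "rest_fraction": rest_fraction,
    }


for fraction in (0.35, 0.45):
    budget = heap_budget(12.5, block_cache_fraction=fraction)
    print(f"block cache {fraction:.2f}: "
          + ", ".join(f"{key}={value:.3f}" for key, value in budget.items()))
```

With the current settings that leaves about 3.1GB for everything else. Raising hfile.block.cache.size to 0.45 would squeeze the remainder to 15% of the heap (under 2GB here), which is the trade-off J-D is pointing at.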
-
Re: Performance test results (Eran Kutner, 2011-05-04 12:20)
J-D,
I'll try what you suggest, but it's worth pointing out that my data set has over 300M rows, while my read test randomly reads from a subset of only 0.5M rows (5,000 rows in each of the 100 key ranges in the table).

-eran

On Tue, May 3, 2011 at 23:29, Jean-Daniel Cryans <[EMAIL PROTECTED]> wrote:
> On Tue, May 3, 2011 at 6:20 AM, Eran Kutner <[EMAIL PROTECTED]> wrote:
> > Flushing, at least when I try it now, long after I stopped writing,
> > doesn't seem to have any effect.
>
> Bummer.
>
> > In my log I see this:
> > 2011-05-03 08:57:55,384 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: LRU Stats: total=3.39 GB, free=897.87 MB, max=4.27 GB, blocks=54637, accesses=89411811, hits=75769916, hitRatio=84.74%%, cachingAccesses=83656318, cachingHits=75714473, cachingHitsRatio=90.50%%, evictions=1135, evicted=7887205, evictedPerRun=6949.0791015625
> >
> > and every 30 seconds or so something like this:
> > 2011-05-03 08:58:07,900 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Block cache LRU eviction started; Attempting to free 436.92 MB of total=3.63 GB
> > 2011-05-03 08:58:07,947 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Block cache LRU eviction completed; freed=436.95 MB, total=3.2 GB, single=931.65 MB, multi=2.68 GB, memory=3.69 KB
> >
> > Now, if the entire working set I'm reading is 100MB in size, why would it
> > have to evict 436MB just to get it filled back in 30 seconds?
>
> I was about to ask the same question... from what I can tell from
> this log, it seems that your working dataset is much larger than 3GB
> (the fact that it's evicting means it could be a lot more) and that's
> only on that region server.
>
> The first reason that comes to mind for why it would be so much bigger
> is that you uploaded your dataset more than once, and since HBase keeps
> versions of the data, they could accumulate. That doesn't explain how
> it would grow into GBs, since by default a family only keeps 3
> versions... unless you set that higher than the default or you
> uploaded the same data tens of times within 24 hours and the major
> compactions didn't kick in.
>
> In any case, it would be interesting if you:
>
> - truncate the table
> - re-import the data
> - force a flush
> - wait a bit until the flushes are done (should take 2-3 seconds if
>   your dataset is really 100MB)
> - do a "hadoop dfs -dus" on the table's directory (should be under /hbase)
> - if the number is way out of whack, review how you are inserting
>   your data. Either way, please report back.
>
> > Also, what is a good value for hfile.block.cache.size? I have it at .35,
> > but with 12.5GB of RAM available for the region servers it seems I should
> > be able to go much higher.
>
> Depends, you also have to account for the MemStores, which by default
> can use up to 40% of the heap
> (hbase.regionserver.global.memstore.upperLimit), leaving you currently
> only 100-40-35=25% of the heap to do stuff like serving requests,
> compacting, flushing, etc. It's hard to give a good number for what
> should be left to the rest of HBase tho...
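J-D's heap arithmetic, and the eviction numbers in the log Eran quoted, can be reproduced with a few lines of Python. This is a minimal sketch, not HBase code: the function names are hypothetical, and it assumes the LRU block cache starts evicting once it reaches 85% of its maximum size and evicts down to 75% (what I believe were LruBlockCache's default acceptable and minimum factors at the time; treat both values as assumptions).

```python
# Sketch of the region server heap budget and the block cache eviction window.
# ASSUMPTION: acceptable_factor=0.85 and min_factor=0.75 stand in for the
# LruBlockCache defaults of that era; they are not read from any config.


def remaining_heap_fraction(block_cache_frac: float, memstore_upper: float) -> float:
    """Heap fraction left for serving requests, compactions, flushes, etc.

    block_cache_frac mirrors hfile.block.cache.size and memstore_upper
    mirrors hbase.regionserver.global.memstore.upperLimit.
    """
    return 1.0 - block_cache_frac - memstore_upper


def eviction_window(cache_max_mb: float,
                    acceptable_factor: float = 0.85,
                    min_factor: float = 0.75):
    """Return (eviction start, eviction target, amount freed), all in MB."""
    start = cache_max_mb * acceptable_factor   # eviction kicks in at this size
    target = cache_max_mb * min_factor         # and frees blocks down to this size
    return start, target, start - target


if __name__ == "__main__":
    # 0.35 for the block cache plus 0.40 for MemStores leaves about a quarter.
    print(f"left for everything else: {remaining_heap_fraction(0.35, 0.40):.0%}")

    # max=4.27 GB is taken from the "LRU Stats" log line quoted above.
    start, target, freed = eviction_window(4.27 * 1024)
    print(f"start={start / 1024:.2f} GB target={target / 1024:.2f} GB "
          f"freed={freed:.0f} MB")
```

Under those assumptions the window works out to roughly 3.63 GB, 3.20 GB and 437 MB, in line with the log. Each eviction run therefore frees about 10% of the cache no matter how small the working set is, so the 436 MB figure alone says little; the telling part is that the cache climbs back to the upper mark every 30 seconds or so, which fits J-D's reading that far more than 100MB of blocks is being read.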
-
Re: Performance test results - Eran Kutner 2011-05-09, 16:31
OK, I tried it: I truncated the table and ran inserts for about a day. Now when I try to flush the table I get a "Region is not online" error, although all the servers are up, no regions are in transition, and as far as I can tell all the regions are online. I can even read rows that are supposedly in the offline region (I'm assuming the region name indicates the first key in the region).
-eran
-
Re: Performance test results - Stack 2011-05-09, 17:03
On Mon, May 9, 2011 at 9:31 AM, Eran Kutner <[EMAIL PROTECTED]> wrote:
> OK, I tried it, truncated the table and ran inserts for about a day. Now I
> tried flushing the table but I get a "Region is not online" error, although
> all the servers are up, no regions are in transition and as far as I can
> tell all the regions seem up.

You will get this message if you incorrectly specified the regionname.
Is that possible?

> I can even read rows which are supposedly in
> the offline region (I'm assuming the region name indicates the first key in
> the region).

The middle portion of the regionname is indeed its startkey. Scan
'.META.' in shell and it will dump out info that includes start and
end keys.

St.Ack
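Stack's point about the region name can be illustrated with a small parser. This is a sketch under an assumption: it treats user region names as `<table>,<startkey>,<regionid>.<encodedname>.` (the 0.90-era layout, as I understand it), and both the helper and the example region name are made up. Because a start key may itself contain commas, it splits off the table at the first comma and the region id at the last one.

```python
from typing import Tuple

# Hypothetical helper, not part of HBase.
# ASSUMPTION: user region names look like '<table>,<startkey>,<regionid>.<encoded>.'


def parse_region_name(region_name: str) -> Tuple[str, str, str]:
    """Split a region name into (table, start key, region id)."""
    # Table names cannot contain commas, so the first comma ends the table.
    table, rest = region_name.split(",", 1)
    # The start key may contain commas, so the region id follows the LAST one.
    start_key, suffix = rest.rsplit(",", 1)
    # The region id runs up to the first '.', before the encoded name.
    region_id = suffix.split(".", 1)[0]
    return table, start_key, region_id


if __name__ == "__main__":
    # Made-up region name, for illustration only.
    name = "usertable,row-00042,1304950000000.0123456789abcdef0123456789abcdef."
    table, start_key, region_id = parse_region_name(name)
    print(f"table={table} startkey={start_key} regionid={region_id}")
    # The first region of a table has an empty start key.
    print(parse_region_name("usertable,,1304950000000.abc."))
```

Parsing names by hand is only a quick sanity check; scanning '.META.' in the shell, as Stack suggests, shows the actual start and end keys of every region.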
-
Re: Performance test results - Eran Kutner 2011-05-09, 20:41
I tried flushing the table, not a specific region.
-eran