Eric Hauser 2011-07-29, 16:57
Hi, I've been running experiments with YCSB on a 5-node cluster. We have been testing a number of different configurations, so I have constantly been wiping our cluster and setting it up again, since we configure everything via Chef. At one point, I was able to get the following stats from our cluster, which I was pretty happy with:
YCSB Client 0.1
Command line: -load -db com.yahoo.ycsb.db.HBaseClient -Pworkloads/workloada -p columnfamily=family -p recordcount=10000000 -s
[OVERALL], RunTime(ms), 1057645.0
[OVERALL], Throughput(ops/sec), 9454.96834949345
[INSERT], Operations, 10000000
[INSERT], AverageLatency(ms), 0.0915235
[INSERT], MinLatency(ms), 0
[INSERT], MaxLatency(ms), 6925
[INSERT], 95thPercentileLatency(ms), 0
[INSERT], 99thPercentileLatency(ms), 0
[INSERT], Return=0, 10000000
However, in our most recent server builds, I seem to very quickly deadlock something in HBase. I've gone back through all our old revisions and reverted a number of different configuration settings, and I still can't figure out why the cluster is now so slow. Our terasort M/R tests are returning the same values as before, so I do not believe that there is anything wrong external to HBase.
The behavior that I see when I kick off the tests is this:
[UPDATE], 0, 4765
[UPDATE], 1, 248
[UPDATE], 2, 0
[UPDATE], 3, 0
[UPDATE], 4, 0
Basically, it kicks off a large number of inserts and HBase grinds to a halt. Some of the writes end up getting inserted (usually around 50), but then everything stops. Here's the behavior I see with the region servers:
npin-172-16-12-203.np.local:60030 1311956094792 requests=50, regions=1, usedHeap=151, maxHeap=16358
npin-172-16-12-204.np.local:60030 1311956094776 requests=5, regions=2, usedHeap=157, maxHeap=16358
npin-172-16-12-205.np.local:60030 1311956093804 requests=0, regions=0, usedHeap=134, maxHeap=16358
npin-172-16-12-206.np.local:60030 1311956093809 requests=0, regions=0, usedHeap=134, maxHeap=16358
npin-172-16-12-207.np.local:60030 1311956094799 requests=0, regions=0, usedHeap=134, maxHeap=16358
Total: servers: 5 requests=55, regions=3
I did thread dumps on both the masters and region servers during this time and did not see anything interesting. I'm using 0.90.3-CDH3U1. Anyone have a suggestion on where to look next?
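The region server status lines above can be summarized with a short script to make the request skew explicit. This is only a sketch: it assumes the status text has been pasted in with one server per line, and the function names `parse_status` and `request_share` are made up for illustration, not part of HBase or YCSB.

```python
import re

# Status text copied from the message above, one region server per line.
STATUS = """\
npin-172-16-12-203.np.local:60030 1311956094792 requests=50, regions=1, usedHeap=151, maxHeap=16358
npin-172-16-12-204.np.local:60030 1311956094776 requests=5, regions=2, usedHeap=157, maxHeap=16358
npin-172-16-12-205.np.local:60030 1311956093804 requests=0, regions=0, usedHeap=134, maxHeap=16358
npin-172-16-12-206.np.local:60030 1311956093809 requests=0, regions=0, usedHeap=134, maxHeap=16358
npin-172-16-12-207.np.local:60030 1311956094799 requests=0, regions=0, usedHeap=134, maxHeap=16358
Total: servers: 5 requests=55, regions=3
"""

# host:port, a numeric second field, then comma-separated key=value metrics.
LINE_RE = re.compile(r"^(?P<host>\S+):(?P<port>\d+)\s+(?P<stamp>\d+)\s+(?P<metrics>.+)$")


def parse_status(text):
    """Return {host: {metric: int}} for every line that looks like a server entry."""
    servers = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skips the "Total:" summary line and blank lines
        metrics = {}
        for pair in match.group("metrics").split(", "):
            key, value = pair.split("=")
            metrics[key] = int(value)
        servers[match.group("host")] = metrics
    return servers


def request_share(servers):
    """Return each host's fraction of the total request count."""
    total = sum(m["requests"] for m in servers.values())
    return {host: (m["requests"] / total if total else 0.0) for host, m in servers.items()}


if __name__ == "__main__":
    shares = request_share(parse_status(STATUS))
    for host, share in sorted(shares.items(), key=lambda item: -item[1]):
        print(f"{host}: {share:.0%} of requests")
```

With the numbers reported here, two of the five servers hold all three regions and receive all of the requests, while the other three sit idle.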
Jeff Whiting 2011-07-29, 17:04
Check the region server logs. If they are blocking on something, it should show up there. For CDH3 the logs are in /var/log/hbase/. Also, you may want to turn on debug level for your logging (either in log4j or in the web interface). Finally, all of your requests are going to just one region server (npin-172-16-12-204.np.local), so it may be stuck trying to split a region or something. You could try pre-splitting the regions, which may help.
~Jeff
-- Jeff Whiting Qualtrics Senior Software Engineer [EMAIL PROTECTED]
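Jeff's pre-splitting suggestion comes down to choosing region boundary keys before the load starts, so that inserts are spread across servers from the first write. The sketch below only computes candidate boundaries. It assumes the YCSB row keys look like `user` followed by decimal digits with no leading zeros, which is an assumption about the key format rather than something stated in this thread, and the helper names `split_keys` and `region_for` are hypothetical. The leading digits of real keys may not be evenly distributed, so the resulting regions may not be equally loaded.

```python
import bisect


def split_keys(prefix, num_regions):
    """Return num_regions - 1 boundary keys spaced evenly over two-digit prefixes 10..99.

    Assumption: row keys are `prefix` followed by decimal digits without leading zeros.
    """
    if num_regions < 2:
        return []
    return [f"{prefix}{10 + (i * 90) // num_regions}" for i in range(1, num_regions)]


def region_for(key, splits):
    """Return the index of the region a key falls into, using lexicographic order.

    A key equal to a boundary belongs to the region that starts at that boundary.
    """
    return bisect.bisect_right(splits, key)


if __name__ == "__main__":
    splits = split_keys("user", 5)  # one region per server on a 5-node cluster
    print(splits)
    for key in ("user1000", "user5", "user99"):
        print(key, "-> region", region_for(key, splits))
```

The boundary keys would then be handed to whatever creates the table; in the Java client that is typically a `createTable` call that accepts an array of split keys, though the exact call depends on the HBase version in use.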
Gary Helmling 2011-07-29, 17:07
Is it possible that you have mismatched versions of either the hbase jar or hadoop jar on the ycsb client versus the servers? In almost all cases where I've run into mysterious rpc hangs right off the bat it's been attributable to forgetting to update a jar file or an older version still being present in the classpath.
If all of that checks out ok, you can enable rpc logging by adding the following to log4j.properties on both the client and the server:
log4j.logger.org.apache.hadoop.ipc=DEBUG
This will produce a lot of output, but should make it easier to track what's going on.
--gh
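One way to check for the jar mismatch described above is to compare the jar file names on the client classpath with those on a server. The sketch below works purely on file names and assumes they follow the usual `name-version.jar` pattern. The example lists are invented for illustration and are not taken from this cluster, and `parse_jar` and `mismatched` are hypothetical helper names.

```python
import re

# Artifact name, a hyphen, then a version that starts with a digit.
JAR_RE = re.compile(r"^(?P<name>.+?)-(?P<version>\d[\w.\-]*)\.jar$")


def parse_jar(filename):
    """Split 'hbase-0.90.3-cdh3u1.jar' into ('hbase', '0.90.3-cdh3u1'); None if no match."""
    match = JAR_RE.match(filename)
    return (match.group("name"), match.group("version")) if match else None


def mismatched(client_jars, server_jars):
    """Return {artifact: (client_version, server_version)} where both sides differ."""
    client = dict(filter(None, map(parse_jar, client_jars)))
    server = dict(filter(None, map(parse_jar, server_jars)))
    return {
        name: (client[name], server[name])
        for name in sorted(client.keys() & server.keys())
        if client[name] != server[name]
    }


if __name__ == "__main__":
    # Hypothetical listings, e.g. the output of `ls` in each lib directory.
    client_jars = ["hbase-0.90.1-cdh3u0.jar", "hadoop-core-0.20.2-cdh3u1.jar", "log4j-1.2.16.jar"]
    server_jars = ["hbase-0.90.3-cdh3u1.jar", "hadoop-core-0.20.2-cdh3u1.jar", "log4j-1.2.16.jar"]
    for name, (client_v, server_v) in mismatched(client_jars, server_jars).items():
        print(f"{name}: client has {client_v}, server has {server_v}")
```

This catches only differing version strings. A stale jar sitting alongside the correct one on the same classpath would still need a manual look.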
registration@... 2012-07-09, 19:17
Now that I have a stable cluster, I would like to use YCSB to test its performance; however, I am a bit confused after reading postings about YCSB on several different websites.
1) By default, will YCSB read my hbase-site.xml file, or do I have to copy it into the YCSB conf directory? I plan on using one of my nodes with no Hadoop/HBase processes running on it, but it has all the environment setup in place.
2) Does the hbase.master property have to be set in the hbase-site.xml file for YCSB to work?
3) After working through all the workloads, is there a script/tool that will clean up my HBase?
Thank You
---
Jay Wilson