HBase, mail # user - HBase / YCSB


Re: HBase / YCSB
Gary Helmling 2011-07-29, 17:07
Is it possible that you have mismatched versions of either the HBase jar or
Hadoop jar on the YCSB client versus the servers?  In almost all cases where
I've run into mysterious RPC hangs right off the bat, it's been attributable
to forgetting to update a jar file, or to an older version still being
present in the classpath.
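One quick way to rule that out is to list and checksum the HBase/Hadoop jars on the YCSB client and on each server, then diff the outputs.  A minimal sketch (the lib paths in the usage comment are assumptions; point them at your actual install):

```shell
# List and checksum any hbase*/hadoop* jars under the given directories so
# the output can be diffed between the YCSB client and the servers.
list_jars() {
  for dir in "$@"; do
    find "$dir" -name 'hbase*.jar' -o -name 'hadoop*.jar' 2>/dev/null
  done | sort | xargs -r md5sum
}

# e.g. run on each host and diff the results:
#   list_jars /usr/lib/hbase /usr/lib/hbase/lib /usr/lib/hadoop
```

Identical jar sets should produce identical output on every host; any differing checksum or extra jar is a candidate for the mismatch.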

If all of that checks out OK, you can enable RPC logging by adding the
following to log4j.properties on both the client and the server:

log4j.logger.org.apache.hadoop.ipc=DEBUG

This will produce a lot of output, but should make it easier to track what's
going on.
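For context, that line sits alongside the existing entries in conf/log4j.properties.  A minimal sketch (the root logger/appender names are illustrative and may differ from your file):

```properties
# Typical HBase log4j.properties already defines a root logger and appender;
# the names here are assumptions -- keep whatever your file already has.
log4j.rootLogger=INFO,console

# Turn on Hadoop/HBase RPC tracing (very verbose -- remove when done).
log4j.logger.org.apache.hadoop.ipc=DEBUG
```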

--gh

On Fri, Jul 29, 2011 at 9:57 AM, Eric Hauser <[EMAIL PROTECTED]> wrote:

> Hi,
> I've been doing different experiments with a 5-node cluster with YCSB.
> We have been testing a number of different configurations, so I have
> been constantly wiping our cluster and setting it up again, since we
> configure everything via Chef.  At one point, I was able to get the
> following stats from our cluster, which I was pretty happy with:
> YCSB Client 0.1
>
> Command line: -load -db com.yahoo.ycsb.db.HBaseClient
> -Pworkloads/workloada -p columnfamily=family -p recordcount=10000000
> -s
>
> [OVERALL], RunTime(ms), 1057645.0
> [OVERALL], Throughput(ops/sec), 9454.96834949345
> [INSERT], Operations, 10000000
> [INSERT], AverageLatency(ms), 0.0915235
> [INSERT], MinLatency(ms), 0
> [INSERT], MaxLatency(ms), 6925
> [INSERT], 95thPercentileLatency(ms), 0
> [INSERT], 99thPercentileLatency(ms), 0
> [INSERT], Return=0, 10000000
>
> However, in our most recent server builds, I seem to very quickly
> deadlock something in HBase.  I've backed through all our old
> revisions and reverted a number of different configuration settings,
> and I can't seem to figure out why the cluster is so slow.  Our
> terasort M/R tests are returning the same values as before, so I do
> not believe that there is anything wrong external to HBase.
>
> The behavior that I see when I kick off the tests is this:
>
> [UPDATE], 0, 4765
> [UPDATE], 1, 248
> [UPDATE], 2, 0
> [UPDATE], 3, 0
> [UPDATE], 4, 0
>
> Basically, it kicks off a large number of inserts and HBase grinds to
> a halt.  Some number of the writes end up getting inserted (usually
> around ~50), but then everything stops.  Here's the behavior I see
> with the region servers:
>
> npin-172-16-12-203.np.local:60030  1311956094792  requests=50, regions=1, usedHeap=151, maxHeap=16358
> npin-172-16-12-204.np.local:60030  1311956094776  requests=5, regions=2, usedHeap=157, maxHeap=16358
> npin-172-16-12-205.np.local:60030  1311956093804  requests=0, regions=0, usedHeap=134, maxHeap=16358
> npin-172-16-12-206.np.local:60030  1311956093809  requests=0, regions=0, usedHeap=134, maxHeap=16358
> npin-172-16-12-207.np.local:60030  1311956094799  requests=0, regions=0, usedHeap=134, maxHeap=16358
> Total:  servers: 5  requests=55, regions=3
>
> I did thread dumps on both the masters and region servers during this
> time and did not see anything interesting. I'm using 0.90.3-CDH3U1.
> Anyone have a suggestion on where to look next?
>
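For anyone following along: thread dumps like the ones mentioned above can be captured with the JDK's jps/jstack tools.  A sketch, assuming the JDK tools are on PATH on each region server host:

```shell
# Parse jps output ("<pid> <MainClass>", one process per line) from stdin
# and print the HRegionServer pid, if any.
pick_rs_pid() {
  awk '/HRegionServer/ {print $1; exit}'
}

# Dump the region server's threads to a timestamped file under /tmp.
dump_rs_threads() {
  pid=$(jps | pick_rs_pid)
  if [ -n "$pid" ]; then
    jstack "$pid" > "/tmp/rs-threads-$(date +%s).txt"
  else
    echo "no HRegionServer found" >&2
    return 1
  fi
}
```

Comparing a few dumps taken seconds apart makes it easier to spot handler threads that are all parked in the same place.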