I have run into a problem related to ACCUMULO-1833, which appears to have addressed the issue for MultiTableBatchWriter; however, I am seeing the same issue on the scanner side as well:
"http-/192.168.220.196:8080-35" daemon prio=10 tid=0x00007f3108038000 nid=0x538a waiting for monitor entry [0x00007f31287d1000]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at org.apache.accumulo.fate.zookeeper.ZooCache.getInstance(ZooCache.java:301)
        - waiting to lock <0x00000000fa64f5b8> (a java.lang.Class for org.apache.accumulo.fate.zookeeper.ZooCache)
        at org.apache.accumulo.core.client.impl.Tables.getZooCache(Tables.java:40)
        at org.apache.accumulo.core.client.impl.Tables.getMap(Tables.java:44)
        at org.apache.accumulo.core.client.impl.Tables.getNameToIdMap(Tables.java:78)
        at org.apache.accumulo.core.client.impl.Tables.getTableId(Tables.java:64)
        at org.apache.accumulo.core.client.impl.ConnectorImpl.getTableId(ConnectorImpl.java:75)
        at org.apache.accumulo.core.client.impl.ConnectorImpl.createScanner(ConnectorImpl.java:137)
I have not spent enough time reasoning about the code to understand all of its nuances, but I am interested in knowing whether there are any mitigating strategies for dealing with this thread contention. For example, would creating a cache entry for each member of the ZooKeeper ensemble help relieve the strain? Would using multiple classloaders? Or is my only option to spawn multiple JVMs?
Yep, you'll likely also block on BatchScanner, anything in TableOperations, and a host of other things.
For scanners, the standing recommendation is to amortize the use of those objects (if you want to look up 5 ranges, don't make 5 scanners).
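A minimal sketch of that amortization (the instance name, quorum, credentials, and table name below are all hypothetical): one BatchScanner services all five ranges, so the table-ID lookup that synchronizes on ZooCache happens once rather than once per scanner.

import java.util.Arrays;
import java.util.Map.Entry;

import org.apache.accumulo.core.client.BatchScanner;
import org.apache.accumulo.core.client.Connector;
import org.apache.accumulo.core.client.ZooKeeperInstance;
import org.apache.accumulo.core.client.security.tokens.PasswordToken;
import org.apache.accumulo.core.data.Key;
import org.apache.accumulo.core.data.Range;
import org.apache.accumulo.core.data.Value;
import org.apache.accumulo.core.security.Authorizations;

public class AmortizedScanSketch {
  public static void main(String[] args) throws Exception {
    Connector conn = new ZooKeeperInstance("myInstance", "zk1:2181")
        .getConnector("user", new PasswordToken("secret"));

    // One BatchScanner for all five ranges: the ZooCache-guarded
    // lookup inside createBatchScanner runs once, not five times.
    BatchScanner bs = conn.createBatchScanner("mytable", new Authorizations(), 4);
    try {
      bs.setRanges(Arrays.asList(new Range("row1"), new Range("row2"),
          new Range("row3"), new Range("row4"), new Range("row5")));
      for (Entry<Key,Value> e : bs) {
        System.out.println(e.getKey() + " -> " + e.getValue());
      }
    } finally {
      bs.close();
    }
  }
}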
Creating a cache per member of the ensemble would likely require some kind of Paxos implementation to provide consistency, which is highly undesirable.
One thing I'm curious about is the impact of removing ZooCache altogether from things like the client API. I don't have a good way to measure that impact off the top of my head, though.
Anyway, is this causing you problems in your usage of the API? Could you elaborate a bit more on the specifics?
The ZooCache instance that's used *typically* comes from the Instance object that your Connector was created from. In other words, if you create multiple Instances (ZooKeeperInstance, usually), you can get multiple ZooCaches which means that concurrent calls to methods off of those objects should not block one another (createScanner off of connector1 from instance1 should not block createScanner off of connector2 from instance2).
That should be something quick you can play with if you so desire.
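A quick sketch of that experiment (connection details are hypothetical): per the suggestion above, each ZooKeeperInstance should carry its own ZooCache, so scanners created from the two connectors synchronize on different monitors.

import org.apache.accumulo.core.client.Connector;
import org.apache.accumulo.core.client.Instance;
import org.apache.accumulo.core.client.Scanner;
import org.apache.accumulo.core.client.ZooKeeperInstance;
import org.apache.accumulo.core.client.security.tokens.PasswordToken;
import org.apache.accumulo.core.security.Authorizations;

public class TwoInstancesSketch {
  public static void main(String[] args) throws Exception {
    // Two separate Instance objects pointed at the same cluster.
    Instance inst1 = new ZooKeeperInstance("myInstance", "zk1:2181");
    Instance inst2 = new ZooKeeperInstance("myInstance", "zk1:2181");

    Connector conn1 = inst1.getConnector("user", new PasswordToken("secret"));
    Connector conn2 = inst2.getConnector("user", new PasswordToken("secret"));

    // If each Instance carries its own ZooCache, these two calls
    // (e.g. issued from different threads) should block on different
    // monitors instead of queueing on one shared lock.
    Scanner s1 = conn1.createScanner("mytable", new Authorizations());
    Scanner s2 = conn2.createScanner("mytable", new Authorizations());
  }
}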
FWIW, you can probably avoid the scan by making your insert idempotent aside from the timestamp and letting versioning handle deduplication.
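A rough sketch of that idea (table name, key layout, and credentials are made up): the same logical record always produces the same cell, so re-inserting it just writes a newer version, and the table's VersioningIterator keeps only the latest.

import org.apache.accumulo.core.client.BatchWriter;
import org.apache.accumulo.core.client.BatchWriterConfig;
import org.apache.accumulo.core.client.Connector;
import org.apache.accumulo.core.client.ZooKeeperInstance;
import org.apache.accumulo.core.client.security.tokens.PasswordToken;
import org.apache.accumulo.core.data.Mutation;
import org.apache.accumulo.core.data.Value;
import org.apache.hadoop.io.Text;

public class IdempotentInsertSketch {
  public static void main(String[] args) throws Exception {
    Connector conn = new ZooKeeperInstance("myInstance", "zk1:2181")
        .getConnector("user", new PasswordToken("secret"));

    BatchWriter writer = conn.createBatchWriter("mytable", new BatchWriterConfig());
    try {
      // The same logical record always maps to the same row, family,
      // qualifier, and value; re-inserting it only adds a newer
      // version of the same cell.
      Mutation m = new Mutation(new Text("record-1234"));
      m.put(new Text("meta"), new Text("payload"),
          new Value("same-bytes-every-time".getBytes()));
      writer.addMutation(m);
    } finally {
      writer.close();
    }
    // With the table's default VersioningIterator (maxVersions = 1),
    // only the newest timestamp survives, so duplicates collapse
    // without a read-before-write scan.
  }
}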
The symptom is that we hit a point where a single server seems "unresponsive", but we do not see anything unusual going on on that machine; it seems idle: no heavy CPU, no I/O wait, low load average. However, when we add additional instances of the JVM, our capacity seems to increase linearly.
Based on thread dumps and profiler stats, it appears that under heavy load most of our threads are blocked trying to access ZooCache.
Also, for completeness: I filed ACCUMULO-2362 to work on concurrent accesses to the same instance in the same JVM.
Also, I misspoke earlier: much of the lock contention comes out of the Tables class, not from the Instance. ZooCache keeps a static map from instance to ZooCache, which is used by a wide breadth of API calls.
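To make the contention concrete, here is a simplified sketch of that pattern. This is not the actual Accumulo source, just the shape of a static synchronized factory that every thread in the JVM funnels through, matching the "waiting to lock a java.lang.Class" frames in the stack trace above.

import java.util.HashMap;
import java.util.Map;

public class StaticCacheSketch {
  private static final Map<String,Object> CACHES = new HashMap<String,Object>();

  // Synchronized on the Class object, so all callers in the JVM queue
  // here regardless of which table or connector they are using.
  public static synchronized Object getInstance(String zooKeepers) {
    Object cache = CACHES.get(zooKeepers);
    if (cache == null) {
      cache = new Object(); // stands in for a real ZooCache
      CACHES.put(zooKeepers, cache);
    }
    return cache;
  }
}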
We experimented with 1.5.1 today; our load test numbers seem to indicate a 10x performance improvement over 1.5.0 on a single JVM. We are running additional experiments over the next few days to see what happens when we move to multiple JVMs. Stay tuned.
Thanks, Ariel