Accumulo >> mail # user >> increase "running scans" in monitor?


increase "running scans" in monitor?
Hello,

I am running an Accumulo-based MR job using the AccumuloRowInputFormat on
1.4.1. The config is more or less default, using the native-standalone 3GB
template, but with the TServer memory raised from its default to 2GB in
accumulo-env.sh. accumulo-site.xml has tserver.memory.maps.max at 1G,
tserver.cache.data.size at 50M, and tserver.cache.index.size at 512M.

My tables are created with maxversions at 1 for all three types (scan, minc,
majc) and with the compression type set to gz.

I am finding, on an 8-node test cluster with 64 map task slots, that while a
job is running, the 'Running Scans' count in the monitor averages roughly 0-4
for each tablet server. In the table view, this puts the running scans
anywhere from 4-24 on average. I would expect/hope the number of scans to be
somewhere close to the map task count. To me, this means one of the
following:
1. There is a configuration setting preventing the number of scans from
accumulating (excuse the pun) to about the same number as my map tasks
2. My map tasks are CPU-intensive enough to introduce delays between
scans, and everything is fine
3. Some combination of 1 and 2.
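
To put rough numbers on that expectation, here is a minimal Python sketch of the arithmetic for the 8-node cluster. The function name and the even-spread assumption (map tasks hitting all tablet servers equally, and 'Running Scans' being an instantaneous count) are mine for illustration, not anything taken from Accumulo itself.

```python
def scan_utilization(map_slots, tablet_servers, scans_low, scans_high):
    """Compare observed running scans against the map slot count.

    Assumes (hypothetically) that map tasks are spread evenly across
    tablet servers and that each busy slot would hold one running scan.
    """
    # Scans per tablet server if every map slot were scanning at once.
    expected_per_server = map_slots / tablet_servers
    # Fraction of map slots that appear to be scanning at any instant.
    low_fraction = scans_low / map_slots
    high_fraction = scans_high / map_slots
    return expected_per_server, low_fraction, high_fraction


# Figures from the 8-node test cluster: 64 slots, 4-24 scans in the table view.
expected, low, high = scan_utilization(64, 8, 4, 24)
print(f"expected scans per tablet server: {expected:.1f}")
print(f"observed share of slots scanning: {low:.1%} to {high:.1%}")
```

Under possibility 2, those fractions would roughly be the share of wall-clock time each map task spends waiting on a scan rather than processing rows, so comparing them against per-task CPU time might help tell possibilities 1 and 2 apart.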

On another cluster, with 40 nodes and 320 task slots, we haven't seen
anywhere near full-capacity scanning from map tasks that have the same
performance, and the problem seems much worse.

I am experimenting with some of the readahead configuration variables for
the tablet servers in the meantime, but haven't found any smoking guns yet.

Thank you,
Marc
--
http://saucyandbossy.wordpress.com