Accumulo, mail # user - increase "running scans" in monitor?


increase "running scans" in monitor?
Marc Reichman 2013-04-02, 14:56
Hello,

I am running an Accumulo-based MR job using the AccumuloRowInputFormat on
1.4.1. The config is more or less default, based on the native-standalone 3GB
template, but with the TServer memory raised from its default to 2GB in
accumulo-env.sh. accumulo-site.xml has tserver.memory.maps.max at 1G,
tserver.cache.data.size at 50M, and tserver.cache.index.size at 512M.

My tables are created with maxVersions set to 1 for all three iterator
scopes (scan, minc, majc) and the compression type set to gz.

On an 8-node test cluster with 64 map task slots, I am finding that while a
job is running, the 'Running Scans' count in the monitor averages roughly 0-4
for each tablet server. In the table view, this puts the running scans for
the table anywhere from 4 to 24 on average. I would expect/hope the number of
scans to be somewhere close to the map task count. To me, this means one of
the following:
1. A configuration setting is keeping the number of scans from
accumulating (excuse the pun) to roughly the number of map tasks.
2. My map tasks are CPU-intensive enough to introduce delays between
scans, and everything is fine.
3. Some combination of 1 and 2.
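
To put rough numbers on the gap, here is a minimal back-of-the-envelope sketch. It assumes every map slot is busy, each map task has at most one scan open at a time, and tablets are spread evenly across tablet servers; none of that has been verified against how the monitor actually counts running scans, and the function names are made up for illustration.

```python
def expected_scans_per_tserver(map_slots: int, tablet_servers: int) -> float:
    """Upper bound on concurrent scans per tablet server, ASSUMING every map
    task keeps one scan active at all times and load is spread evenly."""
    return map_slots / tablet_servers


def implied_scan_fraction(observed_scans: float, map_slots: int) -> float:
    """Fraction of wall-clock time an average map task would spend inside an
    active scan, ASSUMING all slots are busy with one scan per task at most."""
    return observed_scans / map_slots


if __name__ == "__main__":
    slots, servers = 64, 8  # the 8-node test cluster described above

    # Ceiling per tablet server if hypothesis 2 did not apply at all.
    print(expected_scans_per_tserver(slots, servers))  # 8.0

    # Table-level running scans observed in the monitor: 4 to 24.
    for observed in (4, 24):
        print(observed, implied_scan_fraction(observed, slots))
```

Under those assumptions the ceiling would be 8 running scans per tablet server, and the observed 4 to 24 table-level scans would correspond to map tasks spending roughly 6% to 38% of their time in an active scan. That is consistent with option 2 but does not rule out option 1.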

On another cluster, with 40 nodes and 320 task slots, we haven't seen
anywhere near full-capacity scanning either, even though the map tasks
perform the same, and the problem seems much worse there.

I am experimenting with some of the readahead configuration variables for
the tablet servers in the meantime, but haven't found any smoking guns yet.

Thank you,
Marc
--
http://saucyandbossy.wordpress.com