Accumulo >> mail # user >> increase "running scans" in monitor?


Marc Reichman 2013-04-02, 14:56
Re: increase "running scans" in monitor?
Hi Marc,

How many tablets are in the table you're running MR over (see the
monitor)? Might adding some more splits to your table (`addsplits` in
the Accumulo shell) get you better parallelism?
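
To make the point about splits concrete, here is a rough sketch, not anything specific to your setup. As far as I understand it, the Accumulo input formats create roughly one input split per tablet, so the tablet count caps how many map tasks can run at once. The table name `mytable`, the assumption that row IDs start with evenly distributed lowercase letters, and all the helper names are hypothetical; real split points should come from your actual key distribution.

```python
import string


def max_parallel_maps(tablets, map_slots):
    """Upper bound on concurrent map tasks, assuming one input split per tablet."""
    return min(tablets, map_slots)


def even_split_points(num_splits, alphabet=string.ascii_lowercase):
    """Pick up to num_splits single-character split points spread evenly over alphabet.

    Assumes (hypothetically) that row IDs start with characters from `alphabet`
    and are roughly uniformly distributed over it.
    """
    if num_splits <= 0:
        return []
    # Never ask for more distinct split points than the alphabet can supply.
    num_splits = min(num_splits, len(alphabet) - 1)
    step = len(alphabet) / (num_splits + 1)
    return [alphabet[int(step * i)] for i in range(1, num_splits + 1)]


def addsplits_command(table, points):
    """Build an `addsplits` line to paste into the Accumulo shell."""
    return "addsplits -t {} {}".format(table, " ".join(points))


if __name__ == "__main__":
    # A table with 8 tablets cannot keep 64 map slots busy under this assumption.
    print(max_parallel_maps(tablets=8, map_slots=64))
    print(addsplits_command("mytable", even_split_points(3)))
```

If the monitor shows far fewer tablets than you have map slots, that alone could explain the low scan count, independent of any tablet server setting.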

What does your data look like in your table? Lots of small rows? Few
very large rows?
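
One way to sanity-check the second hypothesis in the quoted message below (CPU-heavy map tasks leaving gaps between scans) is a back-of-the-envelope estimate. This sketch assumes all 64 map slots are occupied and that the monitor's "Running Scans" figure counts only scans being actively serviced when it is sampled; both are assumptions, not confirmed behavior, and the function name is made up for illustration.

```python
def implied_scan_fraction(running_scans, map_tasks):
    """Fraction of time an average map task would spend inside a scan,
    given the observed number of concurrently running scans."""
    if map_tasks <= 0:
        raise ValueError("map_tasks must be positive")
    return running_scans / map_tasks


if __name__ == "__main__":
    # Table-level figures from the quoted message: 4-24 running scans, 64 map slots.
    low = implied_scan_fraction(4, 64)    # 0.0625
    high = implied_scan_fraction(24, 64)  # 0.375
    print(low, high)
```

Under those assumptions, each task would be scanning only about 6% to 38% of the time, which is plausible for CPU-bound mappers but would also fit a table with too few tablets.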

On 4/2/13 10:56 AM, Marc Reichman wrote:
> Hello,
>
> I am running an Accumulo-based MR job using the AccumuloRowInputFormat
> on 1.4.1. Config is more-or-less default, using the native-standalone
> 3GB template, but with the TServer memory raised to 2GB in
> accumulo-env.sh from its default. accumulo-site.xml has
> tserver.memory.maps.max at 1G, tserver.cache.data.size at 50M, and
> tserver.cache.index.size at 512M.
>
> My tables are created with maxversions for all three types (scan,
> minc, majc) at 1 and compress type as gz.
>
> I am finding, on an 8 node test cluster with 64 map task slots, that
> when a job is running, the 'Running Scans' count in the monitor is
> roughly 0-4 on average for each tablet server. When viewed at the
> table view, this puts the running scans anywhere from 4-24 on average.
> I would expect/hope the scans to be somewhere close to the map task
> count. To me, this means one of the following.
> 1. There is a configuration setting inhibiting the number of scans
> from accumulating (excuse the pun) to about the same number as my map
> tasks
> 2. My map task job is CPU-intensive enough to introduce delays between
> scans and everything is fine
> 3. Some combination of 1/2.
>
> On an alternate cluster, 40 nodes with 320 task slots, we haven't seen
> anywhere near full capacity scanning with map tasks which have the
> same performance, and the problem seems much worse.
>
> I am experimenting with some of the readahead configuration variables
> for the tablet servers in the meantime, but haven't found any smoking
> guns yet.
>
> Thank you,
> Marc
>
>
> --
> http://saucyandbossy.wordpress.com
Marc Reichman 2013-04-02, 15:20
Marc Reichman 2013-04-02, 15:35
Keith Turner 2013-04-04, 01:15
Marc Reichman 2013-04-04, 14:47