Re: tserver side parallelism
Josh Elser 2014-02-07, 20:33
The tserver.readahead.concurrent.max property provides an upper bound on
the number of scans that will start "reading ahead". This read-ahead is
a performance tweak that tries to smooth the I/O cost of reading from
files. However, each readahead thread increases the amount of heap used,
because the data it reads is held in memory. By capping the number of
readahead threads, this property lets you bound the memory that
readahead can consume across *all* scan tasks (from a Scanner,
BatchScanner or even major compactions) on a tablet server.
The tserver.scan.files.open.max property gives you control over the
upper bound on the number of files that a tablet server (across all
tablets hosted by that tablet server) can open for scanning. Again,
since holding these files open costs memory, this property is meant to
let you place an upper bound on the memory consumed by open files.
Now, the number of threads that a batchscanner uses is what's primarily
going to control your "server side parallelism". When you provide a
value of N for the batchscanner "threads", you will get up to N "scan
tasks" running concurrently against your Accumulo instance. The two
previously described properties only act to limit the resources that
your single batchscanner (taken together with all other active
batchscanners) can consume.
In situations with multiple clients reading from an Accumulo instance,
you may run into cases where a scan task (one thread from your
BatchScanner) is blocked until the tablet server finishes a previous
read and thus frees the resources (open file slots or readahead
threads) needed to satisfy your scan request.
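
If it helps to picture the interaction, here is a toy model in plain
Java. It does not use the Accumulo API at all: the class name, the
method and the numbers are made up for illustration, and a Semaphore
merely stands in for a server-side cap such as
tserver.readahead.concurrent.max. The client thread pool plays the role
of the batchscanner "threads" value. The point is only that the
observed concurrency never exceeds the smaller of the two limits.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical model, not Accumulo code.
class ScanSlotModel {

    // Runs `tasks` simulated scan tasks on `clientThreads` client threads
    // that share `serverSlots` server-side slots. Returns the highest
    // number of tasks that held a slot at the same time.
    static int run(int clientThreads, int serverSlots, int tasks) {
        ExecutorService pool = Executors.newFixedThreadPool(clientThreads);
        Semaphore slots = new Semaphore(serverSlots);
        AtomicInteger active = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();

        for (int i = 0; i < tasks; i++) {
            pool.execute(() -> {
                try {
                    // Blocks until an earlier task releases its slot.
                    slots.acquire();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                try {
                    int now = active.incrementAndGet();
                    peak.accumulateAndGet(now, Math::max);
                    Thread.sleep(5); // pretend to read some data
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    active.decrementAndGet();
                    slots.release();
                }
            });
        }

        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("simulation timed out");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return peak.get();
    }

    public static void main(String[] args) {
        // 8 client threads, 3 server slots: the server cap is the limit.
        System.out.println("peak (8 threads, 3 slots): " + run(8, 3, 24));
        // 2 client threads, 3 server slots: the client cap is the limit.
        System.out.println("peak (2 threads, 3 slots): " + run(2, 3, 24));
    }
}
```

In the first run, five of the eight client threads spend their time
waiting on a slot, which is roughly what a blocked scan task looks like
from the client's side.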
Hope that helps.
On 2/7/14, 3:19 PM, Anthony F wrote: