Re: Batchscanner and Tablet Memory
Keith Turner 2013-03-15, 19:23
On Fri, Mar 15, 2013 at 3:08 PM, Slater, David M.
<[EMAIL PROTECTED]> wrote:
> Hi again,
> I am curious as to how Accumulo handles multiple threads in a Batchscanner,
> and what its ramifications are for memory use on a node.
> Let’s say I start a Batchscanner with 20 threads, and scan across the entire
> range of rows in a table of 80 tablets, spread across 4 nodes. Will the
> Batchscanner try to spin off 20 threads if possible, or will it try to match
> it to the number of nodes? Should I try to match the number of threads with
> the number of cores that will be working on the data?
When the batch scanner has more threads than nodes, it will run
multiple scans on each node. It will only do this for nodes where it
has multiple tablets to scan. So in your example I think it may run
20/4=5 scans on each node. Each scan would access 80/20=4 tablets.
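
The arithmetic above can be written out as a small sketch. This is only a
back-of-the-envelope estimate, not Accumulo's actual scheduling logic: the
class and method names are hypothetical, and it assumes tablets are spread
evenly across nodes and query threads are divided evenly among them.

```java
/** Rough estimate only; hypothetical helper, not part of the Accumulo API. */
final class BatchScanEstimate {
    private BatchScanEstimate() {}

    /**
     * Estimated concurrent scans per node, assuming tablets are evenly
     * distributed and query threads are split evenly across nodes.
     */
    static int scansPerNode(int queryThreads, int nodes, int tablets) {
        int tabletsPerNode = tablets / nodes;
        int threadsPerNode = Math.max(1, queryThreads / nodes);
        // A node only gets multiple scans if it has multiple tablets to scan,
        // so the scan count is capped by the tablets hosted on that node.
        return Math.min(threadsPerNode, tabletsPerNode);
    }

    /** Estimated tablets handled by each scan (ceiling division). */
    static int tabletsPerScan(int queryThreads, int tablets) {
        return (tablets + queryThreads - 1) / queryThreads;
    }
}
```

For the example in this thread, scansPerNode(20, 4, 80) gives 5 and
tabletsPerScan(20, 80) gives 4. With only 8 tablets in the table, the same
20 threads would yield at most 2 scans per node under these assumptions.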
> When a thread is spun off, my thinking is that the tablet that the thread is
> spun off on will move the entire tablet to memory, and then the tablet will
> be iterated through. Is this how it typically happens (or is there possibly
> multiple threads on the same tablet)? If so, do I have to worry about memory
> issues if, say, one of the nodes tries to move 10 tablets into memory, but
> doesn’t have 20 GB of RAM left to store it?
Entire tablets are not loaded into memory when you scan a tablet.
Tablets are composed of RFiles. RFiles are composed of blocks of key
values. So only a few of these key/value blocks from the RFiles are
loaded at any given time. It is possible that these RFile blocks may be
cached in the tablet server process, depending on your configuration.
Multiple threads can scan a tablet concurrently.
> Sorry for the vagueness of the questions, but I’m trying to understand how
> the general process works under the covers, in order to diagnose some
> performance issues I have been having.