I am curious how Accumulo handles multiple threads in a BatchScanner, and what the ramifications are for memory use on a node.
Let's say I start a BatchScanner with 20 threads and scan the entire row range of a table with 80 tablets spread across 4 nodes. Will the BatchScanner try to spin off all 20 threads if possible, or will it try to match the thread count to the number of nodes? Should I try to match the number of threads to the number of cores that will be working on the data?
When a thread is spun off, my thinking is that the entire tablet it works on is loaded into memory, and then the tablet is iterated through. Is this how it typically happens, or can multiple threads work on the same tablet? If so, do I have to worry about memory issues if, say, one of the nodes tries to load 10 tablets into memory but doesn't have 20 GB of RAM left to store them?
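To make the numbers in my scenario concrete, here is the back-of-envelope arithmetic I have in mind, as a minimal sketch. It assumes tablets and threads are divided evenly across nodes and that each tablet is about 2 GB (the figure implied by 10 tablets needing 20 GB). None of this is confirmed Accumulo behavior, and the class and method names are just my own illustration.

```java
// Hypothetical helper for reasoning about the scenario; not part of Accumulo.
class ScanScenario {

    // Tablets hosted per node, ASSUMING the table is spread evenly.
    static int tabletsPerNode(int tablets, int nodes) {
        return tablets / nodes;
    }

    // Query threads per node, ASSUMING the client divides them evenly.
    // Whether the BatchScanner really does this is part of my question.
    static int threadsPerNode(int threads, int nodes) {
        return threads / nodes;
    }

    // Worst-case memory in GB IF every scanned tablet were fully loaded
    // into RAM at once (my guess at the behavior, possibly wrong).
    static long worstCaseGb(int tabletsInMemory, long gbPerTablet) {
        return tabletsInMemory * gbPerTablet;
    }

    public static void main(String[] args) {
        System.out.println("tablets per node: " + tabletsPerNode(80, 4));
        System.out.println("threads per node: " + threadsPerNode(20, 4));
        System.out.println("worst-case GB on one node: " + worstCaseGb(10, 2));
    }
}
```

If the even-split assumption holds, each node would host 20 tablets and serve 5 of my threads, which is why I'm wondering whether the thread count or the tablet count drives memory use.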
Sorry for the vagueness of the questions, but I'm trying to understand how the general process works under the covers, in order to diagnose some performance issues I have been having.