Accumulo >> mail # user >> number of query threads for batch scanner

Re: number of query threads for batch scanner
I should also state the not-so-obvious: my Range spans the entire range
of the four tablets in question.


On Tue, Sep 25, 2012 at 3:17 PM, ameet kini <[EMAIL PROTECTED]> wrote:

> Thanks William.
> The issue here is that without knowing how numQueryThreads translates
> to the number of concurrent scans, I cannot effectively tune that parameter
> to maximize resource usage on the tablet server. What I'm seeing is that
> even though there are four tablets on the tablet server, my number of
> concurrent scans never exceeds 3. This is despite setting numQueryThreads
> to a very high number and having 8 cores on the tablet server. I suspect
> with 3 concurrent scans and no garbage collection happening at that moment,
> most of the cores are sitting idle.
> Ameet
> On Tue, Sep 25, 2012 at 3:08 PM, William Slacum <[EMAIL PROTECTED]> wrote:
>> It should really be dependent upon the resources available to the client.
>> You can set an arbitrarily high number of threads, but you're still bound
>> by the number of parallel operations the CPU can make. I would assume the
>> sweet spot is somewhere around that number; try doing a small benchmark
>> with 2, 4, 8, 16, etc. threads and see where your performance starts to
>> level off.
>> On Tue, Sep 25, 2012 at 11:45 AM, ameet kini <[EMAIL PROTECTED]> wrote:
>>> Probably worth adding that the table mentioned below has a bunch of
>>> tablets on other tablet servers as well, which is why I'm using
>>> BatchScanner. I'm just not sure how numQueryThreads relates to the
>>> number of concurrent scans on a given tablet server.
>>> Thanks
>>> On Tue, Sep 25, 2012 at 2:22 PM, ameet kini <[EMAIL PROTECTED]> wrote:
>>>> I have a table with 4 tablets on a given tablet server. Depending on
>>>> the numQueryThreads parameter below, I see a varying number of maximum
>>>> concurrent scans on that table. This maximum number varies from 1 to 3
>>>> (i.e., some values for numQueryThreads result in a maximum of 1
>>>> concurrent scan, some values result in 2 concurrent scans, etc.). Can
>>>> someone shed light on the relationship between numQueryThreads and the
>>>> number of concurrent scans?
>>>> public BatchScanner createBatchScanner(String tableName,
>>>>                                        Authorizations authorizations,
>>>>                                        int numQueryThreads)
>>>> A follow-on question would be what the general rule of thumb is for setting
>>>> numQueryThreads. Should it be set to the # of hosted tablets expected to
>>>> be consumed by that BatchScanner? Should it be the # of tablet servers
>>>> expected to be hit by that BatchScanner? Something else?
>>>> Thanks,
>>>> Ameet
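
The small benchmark William suggests (trying 2, 4, 8, 16 threads and watching where performance levels off) could be sketched roughly as below. This is only an illustration under assumptions: the class name ThreadCountBenchmark, the run helper, and the sleep-based stand-in workload are all hypothetical and not part of Accumulo. In a real test the workload would create a BatchScanner with the given numQueryThreads via the createBatchScanner signature quoted above, read all entries, and close the scanner.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.IntConsumer;

public class ThreadCountBenchmark {

    /**
     * Runs the workload once per candidate thread count and records the
     * elapsed wall-clock time in milliseconds, in the order given.
     * (Hypothetical helper, not an Accumulo API.)
     */
    public static Map<Integer, Long> run(int[] threadCounts, IntConsumer workload) {
        Map<Integer, Long> elapsedMillis = new LinkedHashMap<>();
        for (int numQueryThreads : threadCounts) {
            long start = System.nanoTime();
            workload.accept(numQueryThreads);
            elapsedMillis.put(numQueryThreads, (System.nanoTime() - start) / 1_000_000);
        }
        return elapsedMillis;
    }

    public static void main(String[] args) {
        // Stand-in workload (assumption): sleeps for a time that shrinks as the
        // thread count grows. Replace the body with a real scan, i.e. create a
        // BatchScanner with numQueryThreads, iterate over every entry, close it.
        IntConsumer fakeScan = numQueryThreads -> {
            try {
                Thread.sleep(80 / numQueryThreads);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };

        Map<Integer, Long> results = run(new int[] {2, 4, 8, 16}, fakeScan);
        results.forEach((n, ms) -> System.out.println(n + " threads: " + ms + " ms"));
    }
}
```

A single timed run per setting is noisy; repeating each setting several times and comparing medians would give a steadier picture of where the curve flattens.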