Accumulo, mail # user - number of query threads for batch scanner


Re: number of query threads for batch scanner
ameet kini 2012-09-25, 19:23
I should also state the not-so-obvious: my Range spans the entire range
of the four tablets in question.

Ameet

On Tue, Sep 25, 2012 at 3:17 PM, ameet kini <[EMAIL PROTECTED]> wrote:

> Thanks William.
>
> The issue here is that without knowing how numQueryThreads translates
> to the number of concurrent scans, I cannot effectively tune that parameter
> to maximize resource usage on the tablet server. What I'm seeing is that
> even though there are four tablets on the tablet server, my number of
> concurrent scans never exceeds 3. This is despite setting numQueryThreads
> to a very high number and having 8 cores on the tablet server. I suspect
> with 3 concurrent scans and no garbage collection happening at that moment,
> most of the cores are sitting idle.
>
> Ameet
>
> On Tue, Sep 25, 2012 at 3:08 PM, William Slacum <
> [EMAIL PROTECTED]> wrote:
>
>> It should really be dependent upon the resources available to the client.
>> You can set an arbitrarily high number of threads, but you're still bound
>> by the number of parallel operations the CPU can perform. I would assume
>> the sweet spot is somewhere around that number. Try running a small
>> benchmark with 2, 4, 8, 16, etc. threads and see where your performance
>> starts to level off.
>>
>>
>> On Tue, Sep 25, 2012 at 11:45 AM, ameet kini <[EMAIL PROTECTED]> wrote:
>>
>>> Probably worth adding that the table mentioned below has a bunch of
>>> tablets on other tablet servers as well, which is why I'm using
>>> BatchScanner. I'm just not sure how numQueryThreads relates to the
>>> number of concurrent scans on a given tablet server.
>>>
>>> Thanks
>>>
>>>
>>> On Tue, Sep 25, 2012 at 2:22 PM, ameet kini <[EMAIL PROTECTED]> wrote:
>>>
>>>>
>>>> I have a table with 4 tablets on a given tablet server. Depending on
>>>> the numQueryThreads parameter below, I see a varying number of maximum
>>>> concurrent scans on that table. This maximum number varies from 1 to 3
>>>> (i.e., some values for numQueryThreads result in a maximum of 1
>>>> concurrent scan, some values result in 2 concurrent scans, etc.). Can
>>>> someone shed light on the relationship between numQueryThreads and the
>>>> number of concurrent scans?
>>>>
>>>> public BatchScanner createBatchScanner(String tableName,
>>>>                                        Authorizations authorizations,
>>>>                                        int numQueryThreads)
>>>>
>>>> A follow-on question: what is the general rule of thumb for setting
>>>> numQueryThreads? Should it be set to the # of hosted tablets expected to
>>>> be consumed by that BatchScanner? Should it be the # of tablet servers
>>>> expected to be hit by that BatchScanner? Something else?
>>>>
>>>> Thanks,
>>>> Ameet
>>>>
>>>>
>>>>
>>>
>>
>
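
The benchmark William suggests (try 2, 4, 8, 16 threads and see where
performance levels off) could be sketched roughly as below. This is only a
minimal sketch, not code from the thread: the class name ThreadSweep, the
sweep method, and the placeholder workload are all hypothetical, and the
workload simulates work with sleeps so that the example runs without an
Accumulo cluster. In a real test the workload body would presumably create a
BatchScanner with createBatchScanner(tableName, authorizations, n), set its
ranges, and iterate over every result.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.IntConsumer;

// Hypothetical helper: times one run of a workload per candidate thread count.
final class ThreadSweep {

    // Runs the workload once for each thread count, in order, and returns
    // the elapsed wall-clock time in milliseconds keyed by thread count.
    static Map<Integer, Long> sweep(int[] threadCounts, IntConsumer scanWithThreads) {
        Map<Integer, Long> elapsedMs = new LinkedHashMap<>();
        for (int n : threadCounts) {
            long start = System.nanoTime();
            scanWithThreads.accept(n);
            elapsedMs.put(n, (System.nanoTime() - start) / 1_000_000L);
        }
        return elapsedMs;
    }

    public static void main(String[] args) {
        // Placeholder workload (an assumption, not a real scan): 16 tasks that
        // each sleep 20 ms, executed on a pool of n threads. Replace this body
        // with a full BatchScanner read that uses n as numQueryThreads.
        IntConsumer workload = n -> {
            ExecutorService pool = Executors.newFixedThreadPool(n);
            for (int i = 0; i < 16; i++) {
                pool.execute(() -> {
                    try {
                        Thread.sleep(20);
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            pool.shutdown();
            try {
                pool.awaitTermination(1, TimeUnit.MINUTES);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };

        sweep(new int[] {2, 4, 8, 16}, workload)
            .forEach((n, ms) -> System.out.println(n + " threads: " + ms + " ms"));
    }
}
```

A single run per thread count is noisy, so repeating each measurement a few
times and watching the tablet server's concurrent scan count alongside the
timings would likely give a clearer picture of where throughput levels off.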