Accumulo, mail # user - number of query threads for batch scanner


ameet kini 2012-09-25, 18:22
ameet kini 2012-09-25, 18:45
William Slacum 2012-09-25, 19:08
ameet kini 2012-09-25, 19:17
ameet kini 2012-09-25, 19:23
ameet kini 2012-09-26, 13:19
Eric Newton 2012-09-28, 02:39
Re: number of query threads for batch scanner
Keith Turner 2012-09-28, 12:04
On Tue, Sep 25, 2012 at 3:17 PM, ameet kini <[EMAIL PROTECTED]> wrote:
> Thanks William.
>
> The issue here is that without knowing how the numQueryThreads translates to
> the number of concurrent scans, I cannot effectively tune that parameter to
> maximize resource usage on the tablet server. What I'm seeing is that even
> though there are four tablets on the tablet server, my number of concurrent
> scans never exceeds 3. This is despite setting numQueryThreads to a very
> high number and having 8 cores on the tablet server. I suspect with 3
> concurrent scans and no garbage collection happening at that moment, most of
> the cores are sitting idle.
>
> Ameet

The amount of parallelism is determined by how your ranges map to
tablets. Below are some examples.

 * For one range that maps to 10 tablets on 10 tablet servers, it will
execute 10 concurrent scans if numQueryThreads is >= 10.
 * For 1000 ranges that map to 10 tablets on 10 tablet servers, it
will execute 10 concurrent scans if numQueryThreads is >= 10.
 * For 1000 ranges that map to 10 tablets on 10 tablet servers, it
will execute 5 concurrent scans if numQueryThreads is 5.
 * For 1000 ranges that map to 1 tablet, it will execute 1 concurrent scan.
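
As a rough way to read the four examples above, the number of concurrent
scans behaves like the smaller of the number of tablets the ranges map to
and numQueryThreads. The sketch below is only a simplified model of those
examples, not the client's actual logic; the class and method names
(ScanEstimate, concurrentScans) are hypothetical.

```java
// Simplified model of the examples above. This is NOT Accumulo code;
// ScanEstimate and concurrentScans are hypothetical names.
final class ScanEstimate {

    private ScanEstimate() {}

    // tabletsTouched:  number of distinct tablets the ranges map to.
    // numQueryThreads: value passed to createBatchScanner.
    // The number of ranges is deliberately not a parameter: in the
    // examples, 1 range and 1000 ranges behave the same once they map
    // to the same set of tablets.
    static int concurrentScans(int tabletsTouched, int numQueryThreads) {
        if (tabletsTouched < 0 || numQueryThreads < 1) {
            throw new IllegalArgumentException("need tabletsTouched >= 0 and numQueryThreads >= 1");
        }
        return Math.min(tabletsTouched, numQueryThreads);
    }

    public static void main(String[] args) {
        System.out.println(concurrentScans(10, 10)); // 10 tablets, 10 threads
        System.out.println(concurrentScans(10, 5));  // 10 tablets, 5 threads
        System.out.println(concurrentScans(1, 64));  // 1 tablet, many threads
    }
}
```

This model ignores how tablets are grouped per tablet server, so treat it
as an upper bound to compare against what you observe rather than a
prediction.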

If you have more query threads than tablet servers, the client code
will try to execute concurrent scans on a single tablet server.

You can look at TabletServerBatchReaderIterator.doLookups() for the
details. In this method it creates QueryTask objects and places them
on a thread pool. The size of the thread pool is the user-specified
numQueryThreads.
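
To illustrate why the pool size caps concurrency, here is a minimal
stand-alone sketch of tasks placed on a fixed-size thread pool. It uses
only java.util.concurrent and does not reproduce doLookups() or QueryTask;
the class QueryPoolSketch and the method peakConcurrency are hypothetical
names, and each task stands in for one scan.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Stand-alone illustration, NOT Accumulo code: hypothetical names throughout.
final class QueryPoolSketch {

    private QueryPoolSketch() {}

    // Submits `tasks` placeholder tasks to a pool of `numQueryThreads`
    // threads and returns the highest number that ran at the same time.
    static int peakConcurrency(int tasks, int numQueryThreads) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(numQueryThreads);
        AtomicInteger active = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        // Hold the first wave of tasks until all of them have started,
        // so the measured peak does not depend on thread timing.
        CountDownLatch firstWave = new CountDownLatch(Math.min(tasks, numQueryThreads));

        for (int i = 0; i < tasks; i++) {
            pool.execute(() -> {
                int now = active.incrementAndGet();
                peak.accumulateAndGet(now, Math::max);
                firstWave.countDown();
                try {
                    firstWave.await(); // stands in for the work of one scan
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    active.decrementAndGet();
                }
            });
        }

        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        return peak.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(peakConcurrency(10, 5));  // prints 5
        System.out.println(peakConcurrency(10, 10)); // prints 10
        System.out.println(peakConcurrency(1, 8));   // prints 1
    }
}
```

With fewer tasks than threads the extra threads sit idle, which matches
the 1-tablet example: raising numQueryThreads cannot help when there is
only one unit of work to hand out.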

>
> On Tue, Sep 25, 2012 at 3:08 PM, William Slacum
> <[EMAIL PROTECTED]> wrote:
>>
>> It should really be dependent upon the resources available to the client.
>> You can set an arbitrarily high number of threads, but you're still bound by
>> the number of parallel operations the CPU can perform. I would assume the sweet
>> spot is somewhere around that number. Try doing a small benchmark with 2,
>> 4, 8, 16, etc. threads and see where your performance starts to level off.
>>
>>
>> On Tue, Sep 25, 2012 at 11:45 AM, ameet kini <[EMAIL PROTECTED]> wrote:
>>>
>>> Probably worth adding that the table mentioned below has a bunch of
>>> tablets on other tablet servers as well, which is why I'm using
>>> BatchScanner. I'm just not sure how numQueryThreads relates to the
>>> number of concurrent scans on a given tablet server.
>>>
>>> Thanks
>>>
>>>
>>> On Tue, Sep 25, 2012 at 2:22 PM, ameet kini <[EMAIL PROTECTED]> wrote:
>>>>
>>>>
>>>> I have a table with 4 tablets on a given tablet server. Depending on the
>>>> numQueryThreads parameter below, I see a varying number of maximum
>>>> concurrent scans on that table. This maximum number varies from 1 to 3
>>>> (i.e., some values for numQueryThreads result in maximum concurrent scan of
>>>> 1, some values result in 2 concurrent scans, etc.). Can someone shed light
>>>> on what is the relationship between numQueryThreads and number of concurrent
>>>> scans?
>>>>
>>>> public BatchScanner createBatchScanner(String tableName,
>>>>                                        Authorizations authorizations,
>>>>                                        int numQueryThreads)
>>>>
>>>> A follow-on question would be what is the general rule of thumb for setting
>>>> numQueryThreads? Should it be set to the # of hosted tablets expected to be
>>>> consumed by that BatchScanner? Should it be the # of tablet servers expected
>>>> to be hit by that BatchScanner? Something else?
>>>>
>>>> Thanks,
>>>> Ameet
>>>>
>>>>
>>>
>>
>
ameet kini 2012-09-28, 13:35
Keith Turner 2012-09-28, 16:10