Accumulo >> mail # user >> number of query threads for batch scanner


ameet kini 2012-09-25, 18:22
ameet kini 2012-09-25, 18:45
William Slacum 2012-09-25, 19:08
ameet kini 2012-09-25, 19:17
ameet kini 2012-09-25, 19:23
ameet kini 2012-09-26, 13:19
Eric Newton 2012-09-28, 02:39
Re: number of query threads for batch scanner
On Tue, Sep 25, 2012 at 3:17 PM, ameet kini <[EMAIL PROTECTED]> wrote:
> Thanks William.
>
> The issue here is that without knowing how the numQueryThreads translates to
> the number of concurrent scans, I cannot effectively tune that parameter to
> maximize resource usage on the tablet server. What I'm seeing is that even
> though there are four tablets on the tablet server, my number of concurrent
> scans never exceeds 3. This is despite setting numQueryThreads to a very
> high number and having 8 cores on the tablet server. I suspect with 3
> concurrent scans and no garbage collection happening at that moment, most of
> the cores are sitting idle.
>
> Ameet

The amount of parallelism is determined by how your ranges map to
tablets. Below are some examples.

 * For one range that maps to 10 tablets on 10 tablet servers, it will
execute 10 concurrent scans if numQueryThreads is >= 10.
 * For 1000 ranges that map to 10 tablets on 10 tablet servers, it
will execute 10 concurrent scans if numQueryThreads is >= 10.
 * For 1000 ranges that map to 10 tablets on 10 tablet servers, it
will execute 5 concurrent scans if numQueryThreads is 5.
 * For 1000 ranges that map to 1 tablet, it will execute 1 concurrent scan.
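
The four examples above are consistent with a simple upper bound: at most
one scan per tablet, and never more scans than query threads. Here is a
minimal sketch of that arithmetic. The class and method names
(ScanEstimate, maxConcurrentScans) are made up for illustration and are
not part of Accumulo, and this is a simplified model, not the client's
actual scheduling logic, so the concurrency you observe may be lower.

```java
// Simplified model of the examples above; NOT Accumulo's scheduling code.
// ScanEstimate and maxConcurrentScans are hypothetical names.
final class ScanEstimate {

    /**
     * Upper bound on concurrent scans: at most one scan per tablet hit,
     * and never more scans than query threads. The number of ranges does
     * not appear, only the number of tablets those ranges map to.
     */
    static int maxConcurrentScans(int tabletsHit, int numQueryThreads) {
        if (tabletsHit < 0 || numQueryThreads < 1) {
            throw new IllegalArgumentException(
                "tabletsHit must be >= 0 and numQueryThreads >= 1");
        }
        return Math.min(tabletsHit, numQueryThreads);
    }

    public static void main(String[] args) {
        System.out.println(maxConcurrentScans(10, 10)); // 10 tablets, 10 threads
        System.out.println(maxConcurrentScans(10, 5));  // 10 tablets, 5 threads
        System.out.println(maxConcurrentScans(1, 64));  // 1 tablet, many threads
    }
}
```

In this model, one range and 1000 ranges give the same answer as long as
they map to the same set of tablets.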

If you have more query threads than tablet servers, the client code
will try to execute concurrent scans on a single tablet server.

You can look at TabletServerBatchReaderIterator.doLookups() for the
details.  In this method it creates QueryTask objects and submits them
to a thread pool.  The size of the thread pool is the user-specified
numQueryThreads.
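
To illustrate why the pool size caps concurrency, here is a small
standalone sketch using only java.util.concurrent. It does not use any
Accumulo classes: the lambda stands in for a QueryTask, and
QueryPoolSketch and peakConcurrency are hypothetical names. It assumes
one task per tablet, which is a simplification of what doLookups() does.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Standalone illustration; the tasks below only stand in for QueryTask.
final class QueryPoolSketch {

    /** Runs numTasks tasks on a fixed pool and returns the peak number running at once. */
    static int peakConcurrency(int numTasks, int numQueryThreads) {
        ExecutorService pool = Executors.newFixedThreadPool(numQueryThreads);
        AtomicInteger running = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        // Hold the first wave of tasks until all of them have started,
        // so the measured peak does not depend on timing.
        CountDownLatch firstWave =
            new CountDownLatch(Math.min(numTasks, numQueryThreads));

        for (int i = 0; i < numTasks; i++) {
            pool.execute(() -> {
                int now = running.incrementAndGet();
                peak.accumulateAndGet(now, Math::max);
                firstWave.countDown();
                try {
                    firstWave.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    running.decrementAndGet();
                }
            });
        }

        pool.shutdown();
        try {
            if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("tasks did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return peak.get();
    }

    public static void main(String[] args) {
        System.out.println(peakConcurrency(10, 10)); // pool as large as the task count
        System.out.println(peakConcurrency(10, 5));  // pool smaller than the task count
        System.out.println(peakConcurrency(1, 8));   // a single task
    }
}
```

The peak never exceeds the pool size, and it never exceeds the number of
tasks submitted, which matches the examples earlier in this message.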

>
> On Tue, Sep 25, 2012 at 3:08 PM, William Slacum
> <[EMAIL PROTECTED]> wrote:
>>
>> It should really be dependent upon the resources available to the client.
>> You can set an arbitrarily high number of threads, but you're still bound by
>> the number of parallel operations the CPU can make. I would assume the sweet
>> spot is somewhere around that number; try doing a small benchmark with 2,
>> 4, 8, 16, etc. threads and see where your performance starts to level off.
>>
>>
>> On Tue, Sep 25, 2012 at 11:45 AM, ameet kini <[EMAIL PROTECTED]> wrote:
>>>
>>> Probably worth adding that the table mentioned below has a bunch of
>>> tablets on other tablet servers as well, which is why I'm using
>>> BatchScanner. I'm just not sure how numQueryThreads relates to the
>>> number of concurrent scans on a given tablet server.
>>>
>>> Thanks
>>>
>>>
>>> On Tue, Sep 25, 2012 at 2:22 PM, ameet kini <[EMAIL PROTECTED]> wrote:
>>>>
>>>>
>>>> I have a table with 4 tablets on a given tablet server. Depending on the
>>>> numQueryThreads parameter below, I see a varying number of maximum
>>>> concurrent scans on that table. This maximum number varies from 1 to 3
>>>> (i.e., some values for numQueryThreads result in a maximum of 1 concurrent
>>>> scan, some values result in 2 concurrent scans, etc.). Can someone shed light
>>>> on the relationship between numQueryThreads and the number of concurrent
>>>> scans?
>>>>
>>>> public BatchScanner createBatchScanner(String tableName,
>>>>                                        Authorizations authorizations,
>>>>                                        int numQueryThreads)
>>>>
>>>> A follow-on question would be: what is a general rule of thumb for setting
>>>> numQueryThreads? Should it be set to the # of hosted tablets expected to be
>>>> consumed by that BatchScanner? Should it be the # of tablet servers expected
>>>> to be hit by that BatchScanner? Something else?
>>>>
>>>> Thanks,
>>>> Ameet
>>>>
>>>>
>>>
>>
>
ameet kini 2012-09-28, 13:35
Keith Turner 2012-09-28, 16:10