Accumulo >> mail # user >> Accumulo Utilities


+
roshanp@... 2013-03-28, 15:00
+
Keith Turner 2013-03-28, 15:55
+
roshanp@... 2013-03-28, 16:15
+
Keith Turner 2013-03-28, 17:15
Re: Accumulo Utilities
Yeah, that is why I never wanted the ThreadPoolConnector to block. If the pool is exhausted, it just makes a different kind of BatchScanner that doesn't spawn new threads. Once a BatchScanner is closed, its threads are released. I can probably write a ThreadPool implementation that does that: it returns only 1 thread if the pool is exhausted and never blocks.

I did not want to spin up a new thread at all once the pool is exhausted, but from what you are saying, having one new thread is really OK. Instead of increasing the threads used by 10+ with each batch scanner, I would just be increasing by 1, which isn't so bad.
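To make the "never block, fall back to 1 thread" idea concrete, here is a minimal sketch. It is not the ThreadPoolConnector code and not an Accumulo API: NonBlockingThreadBudget, Lease, and their methods are hypothetical names, and the sketch only tracks how many threads a scanner may use, it does not run any work itself.

```java
import java.util.concurrent.Semaphore;

/** Hypothetical helper (not part of Accumulo): hands out thread permits without ever blocking. */
final class NonBlockingThreadBudget {
    private final Semaphore permits;

    NonBlockingThreadBudget(int maxThreads) {
        this.permits = new Semaphore(maxThreads);
    }

    /**
     * Tries to reserve 'wanted' threads from the shared budget. If the budget
     * cannot cover the request, returns immediately with a lease for a single
     * extra thread that is not counted against the budget.
     */
    Lease acquire(int wanted) {
        if (permits.tryAcquire(wanted)) { // tryAcquire never waits
            return new Lease(wanted, true);
        }
        return new Lease(1, false);
    }

    /** Number of threads still available in the shared budget. */
    int available() {
        return permits.availablePermits();
    }

    /** Held for the lifetime of one scanner; closing it gives pooled threads back. */
    final class Lease implements AutoCloseable {
        private final int threads;
        private final boolean fromPool;
        private boolean released;

        private Lease(int threads, boolean fromPool) {
            this.threads = threads;
            this.fromPool = fromPool;
        }

        int threads() { return threads; }

        boolean fromPool() { return fromPool; }

        @Override
        public synchronized void close() {
            // Only pooled leases took permits, and each lease returns them once.
            if (fromPool && !released) {
                permits.release(threads);
            }
            released = true;
        }
    }
}
```

A scanner wrapper would call acquire(10) when it is created, size its work by lease.threads(), and close the lease when the scanner is closed.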

For binning of ranges, would it make more sense to add a server-side iterator to make sure the gaps do not come back? So it might go like this:

ranges = 1-2, 5-6, 7-8
Tablet server ranges: T1: 1-4, T2: 5-10

The ranges actually searched will be T1: 1-2, and T2: 5-8 (with a server side iterator removing the ranges not included)
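A rough sketch of the example above, assuming row ids can be modeled as inclusive integer intervals (real Accumulo ranges are over keys, so there can be rows between two neighboring ranges). RangeBinning, Span, bin, and merge are hypothetical names for illustration, not Accumulo classes. bin groups the requested ranges per tablet without combining them, and merge shows the single covering range that would need something like a server-side filter to keep the gaps out.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration of binning ranges by tablet; not Accumulo code. */
final class RangeBinning {

    /** Inclusive [start, end] interval over integer row ids (a stand-in for a real key range). */
    static final class Span {
        final int start;
        final int end;

        Span(int start, int end) {
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return start + "-" + end;
        }
    }

    /** Groups each requested range under every tablet it overlaps, clipped to that tablet's extent. */
    static Map<String, List<Span>> bin(Map<String, Span> tablets, List<Span> ranges) {
        Map<String, List<Span>> binned = new LinkedHashMap<>();
        for (Map.Entry<String, Span> tablet : tablets.entrySet()) {
            Span extent = tablet.getValue();
            for (Span r : ranges) {
                int lo = Math.max(r.start, extent.start);
                int hi = Math.min(r.end, extent.end);
                if (lo <= hi) { // the range overlaps this tablet
                    binned.computeIfAbsent(tablet.getKey(), k -> new ArrayList<>())
                          .add(new Span(lo, hi));
                }
            }
        }
        return binned;
    }

    /** Collapses a tablet's ranges into one covering span; anything in the gaps is now included. */
    static Span merge(List<Span> spans) {
        int lo = Integer.MAX_VALUE;
        int hi = Integer.MIN_VALUE;
        for (Span s : spans) {
            lo = Math.min(lo, s.start);
            hi = Math.max(hi, s.end);
        }
        return new Span(lo, hi);
    }
}
```

With tablets T1: 1-4 and T2: 5-10 and ranges 1-2, 5-6, 7-8, bin keeps 5-6 and 7-8 as two separate ranges under T2, while merge on T2's list yields the single range 5-8 from the example.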

What about the BatchScanner? Doesn't it also bin the ranges (binRanges) and then tell each tablet server that it only cares about a subset of them? That way the number of range requests is capped at the number of tablet servers that hold the ranges you asked for, and each tablet server knows exactly which ranges to return.

Feel free to ignore the myriad questions; it is interesting learning the inner workings of the BatchScanner and Scanner.

Roshan

On Mar 28, 2013, at 1:15 PM, Keith Turner <[EMAIL PROTECTED]> wrote:

> On Thu, Mar 28, 2013 at 12:15 PM,  <[EMAIL PROTECTED]> wrote:
>> Thanks! I like the idea of sending my own thread pool to the batch scanner, that would definitely be the better solution.
>
> Would you like to open a ticket about this issue?
>
> I just remembered, there is an issue w/ this approach to be aware
> of.  I have seen this when multiple threads share a batch scanner (more
> on this below).  Consider the following situation.
>
> 1. Thread A gives a lot of work to BatchScanner1 using Threadpool1,
> creating BatchScannerIterator1
> 2. BatchScannerIterator1's internal queue fills up as a result of work
> given by Thread A
> 3. All threads in ThreadPool1 block trying to add to
> BatchScannerIterator1 queue
> 4. Thread B gives a lot of work to BatchScanner2 using Threadpool1,
> creating BatchScannerIterator2
> 5. Thread B attempts to iterate over BatchScannerIterator2, but
> blocks forever because no threads service it
>
> This problem occurs because Thread A never reads from BatchScannerIterator1
>
> In the current code, multiple threads can use a BatchScanner.  You
> just need to make configuring the BatchScanner and getting an iterator
> an atomic operation.   When an iterator is created by a batch scanner,
> it copies the config that exists at that point in time.  Changes to the
> BatchScanner config after an iterator is created will not affect the
> iterator.
>
>
>
>>
>> Yeah, I thought about creating a batch scanner with only one thread, but I was not sure if that makes a separate thread (outside of the current one) or uses the current one. At the time I did not want a new thread to be created at all. Though I didn't realize the Scanner was also spinning up a thread; I thought that was in process.
>
> The batch scanner will create a new thread pool w/ one thread.
>
>>
>> To mitigate the separate RPC call per range, would it make more sense to do a "binRanges" based on the ranges at the tablets to reduce the number of ranges?
>
> Probably do not want to combine ranges, that could bring back data in
> the gaps between ranges.
>
>>
>> On Mar 28, 2013, at 11:55 AM, Keith Turner <[EMAIL PROTECTED]> wrote:
>>
>>> I took a quick look at the code. Excluding the threading issue, a
>>> major conceptual difference is that BatchScannerWithScanners seems to
>>> do a RPC round trip for each range.   The TabletServerBatchReader
>>> sends all of the ranges that a tablet server needs to lookup in one
>>> RPC.
>>>
>>> Instead of creating a BatchScannerWithScanners, maybe you could create
+
Keith Turner 2013-03-28, 18:32
+
roshanp@... 2013-03-28, 18:46