Accumulo >> mail # user >> Accumulo Utilities


Re: Accumulo Utilities
Thanks! I like the idea of passing my own thread pool to the batch scanner; that would definitely be the better solution.

Yeah, I thought about creating a batch scanner with only one thread, but I was not sure whether that creates a separate thread (outside of the current one) or uses the current one. At the time I did not want any new thread to be created. I also didn't realize the Scanner spins up a thread of its own; I thought it ran entirely in the calling thread.
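To illustrate the "separate thread or current one" question in general terms, here is a minimal sketch using only java.util.concurrent. It is not Accumulo's internal code; the class and method names (ExecutorThreadDemo, runOn) are made up for illustration. It only shows that a single-thread pool runs work on its own background thread, while a direct executor runs it on the caller.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical demo class; not part of Accumulo.
class ExecutorThreadDemo {

    // Runs a trivial task on the given executor and returns the name
    // of the thread that actually executed it.
    static String runOn(Executor executor) {
        CompletableFuture<String> threadName = new CompletableFuture<>();
        executor.execute(() -> threadName.complete(Thread.currentThread().getName()));
        return threadName.join(); // the caller blocks here until the task has run
    }

    public static void main(String[] args) {
        ExecutorService single = Executors.newSingleThreadExecutor();
        try {
            System.out.println("caller thread:      " + Thread.currentThread().getName());
            // A pool of size one still owns a separate worker thread.
            System.out.println("single-thread pool: " + runOn(single));
            // Runnable::run as an Executor executes the task on the calling thread.
            System.out.println("direct executor:    " + runOn(Runnable::run));
        } finally {
            single.shutdown();
        }
    }
}
```

If a batch scanner with one thread behaves like the pool case, as Keith's description below suggests, the client thread only waits while the background thread fetches.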

To mitigate the separate RPC call per range, would it make more sense to bin the ranges by tablet (a "binRanges" step) so that fewer lookups are needed?
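A rough sketch of what such binning could look like, in plain Java. This is not Accumulo's own range-binning code: RangeBinner, RowRange, and the server names are hypothetical, and tablets are modeled simply as a sorted map from each tablet's inclusive end row to the server hosting it. The point is that ranges get grouped per server, so the number of lookups is bounded by the number of servers rather than the number of ranges.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch; not Accumulo's implementation.
class RangeBinner {

    /** Inclusive row range; a stand-in for a real range type. */
    static final class RowRange {
        final String startRow;
        final String endRow;
        RowRange(String startRow, String endRow) {
            this.startRow = startRow;
            this.endRow = endRow;
        }
    }

    // Tablet end row (inclusive) -> hosting server. The final tablet has
    // no end row, so its server is kept separately in lastServer.
    private final TreeMap<String, String> endRowToServer = new TreeMap<>();
    private final String lastServer;

    RangeBinner(Map<String, String> endRowToServer, String lastServer) {
        this.endRowToServer.putAll(endRowToServer);
        this.lastServer = lastServer;
    }

    /** Groups each range under every server hosting a tablet it overlaps. */
    Map<String, List<RowRange>> bin(List<RowRange> ranges) {
        Map<String, List<RowRange>> bins = new LinkedHashMap<>();
        for (RowRange r : ranges) {
            // First tablet whose end row is >= the range's start row.
            Map.Entry<String, String> tablet = endRowToServer.ceilingEntry(r.startRow);
            while (true) {
                String server = (tablet == null) ? lastServer : tablet.getValue();
                List<RowRange> bin = bins.computeIfAbsent(server, k -> new ArrayList<>());
                // Avoid listing a range twice when it spans two tablets on one server.
                if (bin.isEmpty() || bin.get(bin.size() - 1) != r) {
                    bin.add(r);
                }
                // Stop at the final tablet or once this tablet covers the range's end.
                if (tablet == null || tablet.getKey().compareTo(r.endRow) >= 0) {
                    break;
                }
                tablet = endRowToServer.higherEntry(tablet.getKey());
            }
        }
        return bins;
    }

    public static void main(String[] args) {
        Map<String, String> tablets = new TreeMap<>();
        tablets.put("g", "server1");
        tablets.put("p", "server2");
        RangeBinner binner = new RangeBinner(tablets, "server1");

        List<RowRange> ranges = List.of(
            new RowRange("a", "c"), new RowRange("d", "e"),
            new RowRange("h", "k"), new RowRange("n", "z"));

        // One lookup per server instead of one per range.
        binner.bin(ranges).forEach((server, rs) ->
            System.out.println(server + " handles " + rs.size() + " range(s)"));
    }
}
```

With the four sample ranges and two servers, the sketch produces two bins, so a batched approach would need two round trips where the per-range approach needs four.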

On Mar 28, 2013, at 11:55 AM, Keith Turner <[EMAIL PROTECTED]> wrote:

> I took a quick look at the code. Excluding the threading issue, a
> major conceptual difference is that BatchScannerWithScanners seems to
> do an RPC round trip for each range. The TabletServerBatchReader
> sends all of the ranges that a tablet server needs to look up in one
> RPC.
>
> Instead of creating a BatchScannerWithScanners, maybe you could create
> a batch scanner with just one thread when resources are exceeded?
> This will be similar to what you are doing now, just one thread will
> be doing work fetching data. The client thread would just be waiting
> on this background thread. Although this does allow the processing
> of results to happen concurrently with fetching of data.  Using
> BatchScannerWithScanners would not allow this.
>
> Something to be aware of, the regular scanner will spin up a read
> ahead thread if you read a lot of data through it.  It does not do
> this immediately, only after fetching a few batches of key value pairs
> from the tablet server.  If this happens you could have one thread
> fetching data while the client thread processes results.
>
> Do you think we should open a ticket about giving users control over
> threads created by client code? Maybe users could pass in their own
> thread pool to a batch scanner?
>
>
> Keith
>
> On Thu, Mar 28, 2013 at 11:00 AM,  <[EMAIL PROTECTED]> wrote:
>> In some of my projects, we needed to control the number of threads spun up when using multiple batch scanners. We created a utility that caps the number of threads and, once the maximum has been reached, returns a batch scanner that is actually backed by Scanners. I wanted to get feedback on the code. It seems like such a simple thing to do that I bet someone already has this. Thanks!
>>
>> https://github.com/calrissian/mango/tree/master/accumulo
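The thread-capping idea described in the original message can be sketched with a Semaphore. This is an illustration only and not the code in the linked repository: ThreadBudget, Lease, and Mode are hypothetical names, and the sketch only decides which kind of scanner a caller should get rather than creating any Accumulo objects.

```java
import java.util.concurrent.Semaphore;

// Hypothetical sketch of a shared thread budget; not the linked utility.
class ThreadBudget {

    enum Mode { BATCH, SEQUENTIAL }

    /** Handle returned to the caller; closing it gives the threads back. */
    static final class Lease implements AutoCloseable {
        final Mode mode;
        private final Semaphore permits;
        private final int held;
        private boolean closed;

        Lease(Mode mode, Semaphore permits, int held) {
            this.mode = mode;
            this.permits = permits;
            this.held = held;
        }

        @Override
        public synchronized void close() {
            if (!closed) {          // idempotent: release the permits only once
                closed = true;
                permits.release(held);
            }
        }
    }

    private final Semaphore permits;

    ThreadBudget(int maxThreads) {
        this.permits = new Semaphore(maxThreads);
    }

    /**
     * Tries to reserve the requested number of threads without blocking.
     * On success the caller may build a multi-threaded batch scanner; otherwise
     * it should fall back to a scanner-backed, sequential implementation.
     */
    Lease request(int threads) {
        if (permits.tryAcquire(threads)) {
            return new Lease(Mode.BATCH, permits, threads);
        }
        return new Lease(Mode.SEQUENTIAL, permits, 0);
    }

    int available() {
        return permits.availablePermits();
    }

    public static void main(String[] args) {
        ThreadBudget budget = new ThreadBudget(4);
        try (Lease first = budget.request(3); Lease second = budget.request(3)) {
            System.out.println("first:  " + first.mode);
            System.out.println("second: " + second.mode);
        }
        System.out.println("available after close: " + budget.available());
    }
}
```

Tying the lease to try-with-resources means the reserved threads return to the budget when the scanner is closed, which is the same point at which a real batch scanner would release its own threads.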