HBase, mail # user - independent scans to same region processed serially


Re: independent scans to same region processed serially
James Taylor 2013-02-09, 17:28
Ok, thanks. Are you able to repro easily, or would you like me to put something together?

James

On Feb 9, 2013, at 9:02 AM, "lars hofhansl" <[EMAIL PROTECTED]> wrote:

> I looked through the code. Nothing obvious jumps out.
> We can sit together on Monday and run it through a profiler.
>
> -- Lars
>
>
>
> ----- Original Message -----
> From: James Taylor <[EMAIL PROTECTED]>
> To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>; lars hofhansl <[EMAIL PROTECTED]>
> Cc:
> Sent: Friday, February 8, 2013 9:52 PM
> Subject: Re: independent scans to same region processed serially
>
> All data is in the blockcache and there are plenty of handlers. To repro,
> you could:
> - create a table pre-split into, for example, three regions
> - execute serially a scan on the middle region
> - execute two parallel scans each on half of the middle region
> - you'd expect the parallel scans to execute nearly twice as fast, but
> we're seeing them execute slower than the serial scan.
> We're using the same HConnection with different HTable instances for
> each scan.
>
>      James
>
> On 02/08/2013 06:51 PM, lars hofhansl wrote:
>> Is your data all in the blockcache? Otherwise you might have run into HBASE-7336 (https://issues.apache.org/jira/browse/HBASE-7336), fixed in 0.94.4.
>> I assume you have enough handlers, etc. (i.e., does the same happen if you issue multiple scan requests across different regions of the same region server?)
>>
>>
>> -- Lars
>>
>>
>>
>> ________________________________
>> From: James Taylor <[EMAIL PROTECTED]>
>> To: HBase User <[EMAIL PROTECTED]>
>> Sent: Friday, February 8, 2013 5:49 PM
>> Subject: independent scans to same region processed serially
>>  
>> Wanted to check with folks and see if they've seen an issue around this before digging in deeper. I'm on 0.94.2. If I execute in parallel multiple scans to different parts of the same region, they appear to be processed serially. It's actually faster from the client side to execute a single serial scan than it is to execute multiple parallel scans to different segments of the region. I do have region observer coprocessors for the table I'm scanning, but my code is not doing any synchronization.
>>
>> Is there a known limitation in this area? Anyone else see anything similar?
>>
>>       James
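
The repro steps quoted above (one serial scan of the middle region, then two parallel scans over its halves) could be driven by a small timing harness along these lines. This is only a sketch: `ScanRepro`, `RangeScan`, `timeSerial`, and `timeParallel` are hypothetical names, row keys are simplified to ints, and the HBase call itself is left behind an interface rather than spelled out. In a real test, each `RangeScan` invocation would presumably open its own table handle and scan the given start/stop rows, as described in the thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class ScanRepro {
    /**
     * Hypothetical stand-in for one client-side scan over [startRow, stopRow).
     * Returns the number of rows seen. Not part of any HBase API.
     */
    interface RangeScan {
        long scan(int startRow, int stopRow) throws Exception;
    }

    /** Runs a single scan over the whole range; returns {rowCount, elapsedNanos}. */
    static long[] timeSerial(RangeScan s, int start, int stop) throws Exception {
        long t0 = System.nanoTime();
        long rows = s.scan(start, stop);
        return new long[] { rows, System.nanoTime() - t0 };
    }

    /**
     * Splits [start, stop) into `parts` contiguous sub-ranges and scans them
     * concurrently, one thread per sub-range; returns {rowCount, elapsedNanos}.
     */
    static long[] timeParallel(RangeScan s, int start, int stop, int parts) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(parts);
        try {
            List<Callable<Long>> tasks = new ArrayList<>();
            long span = (long) stop - start;
            for (int i = 0; i < parts; i++) {
                final int lo = start + (int) (span * i / parts);
                final int hi = start + (int) (span * (i + 1) / parts);
                tasks.add(() -> s.scan(lo, hi));
            }
            long t0 = System.nanoTime();
            long rows = 0;
            // invokeAll blocks until every sub-range scan has finished.
            for (Future<Long> f : pool.invokeAll(tasks)) {
                rows += f.get();
            }
            return new long[] { rows, System.nanoTime() - t0 };
        } finally {
            pool.shutdown();
        }
    }
}
```

Comparing the elapsed time from `timeSerial` with that from `timeParallel(..., 2)` over the same range gives the serial-versus-parallel comparison from the thread; the two row counts should match if the sub-ranges cover the region exactly.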