Re: independent scans to same region processed serially
I looked through the code. Nothing obvious jumps out.
We can sit together on Monday and run it through a profiler.

-- Lars

----- Original Message -----
From: James Taylor <[EMAIL PROTECTED]>
To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>; lars hofhansl <[EMAIL PROTECTED]>
Cc:
Sent: Friday, February 8, 2013 9:52 PM
Subject: Re: independent scans to same region processed serially

All data is in the blockcache and there are plenty of handlers. To repro,
you could:
- create a table pre-split into, for example, three regions
- execute serially a scan on the middle region
- execute two parallel scans each on half of the middle region
- you'd expect the parallel scans to execute nearly twice as fast, but
we're seeing them execute slower than the serial scan.
We're using the same HConnection with different HTable instances for
each scan.
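
The timing comparison in the steps above can be sketched as a small harness. This is only an illustration, not the code that was actually run: `ScanTimingHarness`, `RangeScan`, `timeSerial`, and `timeParallel` are hypothetical names, row keys are modeled as plain integers, and the scan itself is a stand-in callback. In a real repro the callback would open its own HTable on the shared HConnection and iterate a scanner over the given key range.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ScanTimingHarness {

    /** Hypothetical stand-in: scan rows in [startRow, stopRow) and return the row count. */
    public interface RangeScan {
        long scan(int startRow, int stopRow) throws Exception;
    }

    /** Row count and elapsed wall-clock time of one run. */
    public static final class Timing {
        public final long rows;
        public final long nanos;
        Timing(long rows, long nanos) { this.rows = rows; this.nanos = nanos; }
    }

    /** Runs one scan over the whole range on the calling thread. */
    public static Timing timeSerial(RangeScan scan, int start, int stop) throws Exception {
        long t0 = System.nanoTime();
        long rows = scan.scan(start, stop);
        return new Timing(rows, System.nanoTime() - t0);
    }

    /** Splits the range into `parts` contiguous pieces and scans them concurrently. */
    public static Timing timeParallel(RangeScan scan, int start, int stop, int parts)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(parts);
        try {
            List<Callable<Long>> tasks = new ArrayList<>();
            int span = stop - start;
            for (int i = 0; i < parts; i++) {
                final int lo = start + (int) ((long) span * i / parts);
                final int hi = start + (int) ((long) span * (i + 1) / parts);
                tasks.add(() -> scan.scan(lo, hi));
            }
            long t0 = System.nanoTime();
            long rows = 0;
            // invokeAll blocks until every sub-scan has finished.
            for (Future<Long> f : pool.invokeAll(tasks)) {
                rows += f.get();
            }
            return new Timing(rows, System.nanoTime() - t0);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // Placeholder scan that just sleeps in proportion to the range size.
        // Replace with a real region scan to reproduce the behavior described above.
        RangeScan fake = (lo, hi) -> { Thread.sleep((hi - lo) / 10); return hi - lo; };
        Timing serial = timeSerial(fake, 0, 2000);
        Timing parallel = timeParallel(fake, 0, 2000, 2);
        System.out.println("serial:   " + serial.rows + " rows, " + serial.nanos / 1_000_000 + " ms");
        System.out.println("parallel: " + parallel.rows + " rows, " + parallel.nanos / 1_000_000 + " ms");
    }
}
```

With a scan that parallelizes cleanly, the two-way split should finish in roughly half the serial time; the report here is that against a single region it comes out slower instead.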

     James

On 02/08/2013 06:51 PM, lars hofhansl wrote:
> Is your data all in the blockcache? Otherwise you might have run into HBASE-7336 (https://issues.apache.org/jira/browse/HBASE-7336), which was fixed in 0.94.4.
> I assume you have enough handlers, etc. (i.e. does the same happen if you issue multiple scan requests across different regions of the same region server?)
>
>
> -- Lars
>
>
>
> ________________________________
>   From: James Taylor <[EMAIL PROTECTED]>
> To: HBase User <[EMAIL PROTECTED]>
> Sent: Friday, February 8, 2013 5:49 PM
> Subject: independent scans to same region processed serially

> Wanted to check with folks and see if they've seen an issue around this before digging in deeper. I'm on 0.94.2. If I execute in parallel multiple scans to different parts of the same region, they appear to be processed serially. It's actually faster from the client side to execute a single serial scan than it is to execute multiple parallel scans to different segments of the region. I do have region observer coprocessors for the table I'm scanning, but my code is not doing any synchronization.
>
> Is there a known limitation in this area? Anyone else see anything similar?
>
>      James