-
independent scans to same region processed serially
James Taylor 2013-02-09, 01:49
Wanted to check with folks to see if they've seen an issue around this before I dig in deeper. I'm on 0.94.2. If I execute multiple scans in parallel against different parts of the same region, they appear to be processed serially. It's actually faster from the client side to execute a single serial scan than to execute multiple parallel scans over different segments of the region. I do have region observer coprocessors for the table I'm scanning, but my code is not doing any synchronization.

Is there a known limitation in this area? Has anyone else seen anything similar?

James
-
Re: independent scans to same region processed serially
lars hofhansl 2013-02-09, 02:51
Is your data all in the blockcache? Otherwise you might have run into HBASE-7336 (https://issues.apache.org/jira/browse/HBASE-7336), fixed in 0.94.4.
I assume you have enough handlers, etc. (i.e., does the same happen if you issue multiple scan requests across different regions of the same region server?)

-- Lars
-
Re: independent scans to same region processed serially
Marcos Ortiz 2013-02-09, 03:18
Regards, James,
Hari Kumar, from Ericsson Labs, talked about these issues on the Data && Knowledge blog: http://labs.ericsson.com/blog/hbase-performance-tuners
It would be nice to talk with him and convince him to share his knowledge here on the list, or at the next HBaseCon.

-- Marcos Ortiz Valmaseda, Product Manager && Data Scientist at UCI
Blog: http://marcosluis2186.posterous.com
Twitter: @marcosluis2186 <http://twitter.com/marcosluis2186>
-
Re: independent scans to same region processed serially
James Taylor 2013-02-09, 05:52
All data is in the blockcache and there are plenty of handlers. To repro, you could:
- create a table pre-split into, for example, three regions
- execute a serial scan on the middle region
- execute two parallel scans, each on half of the middle region
- you'd expect the parallel scans to execute nearly twice as fast, but we're seeing them execute slower than the serial scan.
We're using the same HConnection with different HTable instances for each scan.

James
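The shape of the repro steps above can be sketched as a small timing harness. This is only an illustration under stated assumptions: `ParallelScanHarness`, `RangeScan`, `split` and `scanInParallel` are hypothetical names, and the scan itself is replaced by an in-memory `TreeMap` stand-in rather than a real HTable scan, so the timings it prints say nothing about HBase. In a real repro the `RangeScan` body would run a scan over the given row range against the middle region.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical harness: compares one serial scan with N parallel scans
// over contiguous segments of the same key range.
class ParallelScanHarness {
    /** Stand-in for a scan over [startRow, stopRow); returns the rows seen. */
    interface RangeScan {
        long scan(int startRow, int stopRow) throws Exception;
    }

    /** Splits [startRow, stopRow) into `parts` contiguous segments. */
    static List<int[]> split(int startRow, int stopRow, int parts) {
        List<int[]> segments = new ArrayList<>();
        long span = (long) stopRow - startRow;
        for (int i = 0; i < parts; i++) {
            int from = startRow + (int) (span * i / parts);
            int to = startRow + (int) (span * (i + 1) / parts);
            segments.add(new int[] {from, to});
        }
        return segments;
    }

    /** Runs one scan per segment concurrently and returns the total row count. */
    static long scanInParallel(RangeScan scan, int startRow, int stopRow, int parts)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(parts);
        try {
            List<Callable<Long>> tasks = new ArrayList<>();
            for (int[] seg : split(startRow, stopRow, parts)) {
                tasks.add(() -> scan.scan(seg[0], seg[1]));
            }
            long rows = 0;
            for (Future<Long> f : pool.invokeAll(tasks)) {
                rows += f.get(); // rethrows any failure from the scan task
            }
            return rows;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // In-memory stand-in for the "middle region"; not HBase.
        NavigableMap<Integer, String> region = new TreeMap<>();
        for (int i = 0; i < 1_000_000; i++) {
            region.put(i, "v" + i);
        }
        RangeScan scan = (from, to) -> {
            long n = 0;
            for (Integer ignored : region.subMap(from, true, to, false).keySet()) {
                n++;
            }
            return n;
        };

        long t0 = System.nanoTime();
        long serialRows = scan.scan(0, 1_000_000);
        long serialNanos = System.nanoTime() - t0;

        t0 = System.nanoTime();
        long parallelRows = scanInParallel(scan, 0, 1_000_000, 2);
        long parallelNanos = System.nanoTime() - t0;

        System.out.println("serial:   " + serialRows + " rows, " + serialNanos / 1_000_000 + " ms");
        System.out.println("parallel: " + parallelRows + " rows, " + parallelNanos / 1_000_000 + " ms");
    }
}
```

Checking that both runs return the same row count before comparing times guards against a split that drops or double-counts rows at the segment boundary.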
-
Re: independent scans to same region processed serially
ramkrishna vasudevan 2013-02-09, 10:48
What do you see in the thread dump? Maybe HBASE-7336 deals with scans hitting the same block of data. But I see from your mail that the scans are independent of each other and scan different data, though in the same region.

Regards
Ram
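When reading a thread dump for this kind of problem, the thing to look for is several handler threads blocked on the same monitor. A dump of a running JVM is usually captured from outside with the JDK's `jstack` tool; the sketch below shows the same idea from inside a JVM using the standard `ThreadMXBean`. The class name `BlockedThreadReport` and the `waiter` thread are made up for the demonstration, and nothing here is specific to HBase.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: lists threads that are BLOCKED on a monitor,
// together with the lock they want and the thread that holds it.
class BlockedThreadReport {
    static List<String> blockedThreads() {
        List<String> lines = new ArrayList<>();
        ThreadInfo[] infos = ManagementFactory.getThreadMXBean().dumpAllThreads(true, true);
        for (ThreadInfo info : infos) {
            if (info.getThreadState() == Thread.State.BLOCKED) {
                lines.add(info.getThreadName() + " waits for " + info.getLockName()
                        + " held by " + info.getLockOwnerName());
            }
        }
        return lines;
    }

    public static void main(String[] args) throws Exception {
        Object monitor = new Object();
        // The waiter needs the monitor that main is holding, so it blocks.
        Thread waiter = new Thread(() -> {
            synchronized (monitor) {
                // nothing to do once the monitor is acquired
            }
        }, "waiter");
        synchronized (monitor) {
            waiter.start();
            while (waiter.getState() != Thread.State.BLOCKED) {
                Thread.sleep(10);
            }
            blockedThreads().forEach(System.out::println);
        }
        waiter.join();
    }
}
```

Note that threads waiting on a `java.util.concurrent` lock show up as WAITING rather than BLOCKED, so a filter like this one would miss them.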
-
Re: independent scans to same region processed serially
lars hofhansl 2013-02-09, 16:49
HBASE-7336 only deals with parallel reads on the same HFile, since each HFile has only a single reader.
For scans you want to do seek+read (as opposed to positional reads); the problem with seek+read is that it can only be done by one thread at a time. So HBASE-7336 just switches the read to a positional read if the reader is already locked (somewhat of a hack).

-- Lars
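The difference between seek+read and a positional read, and the fallback described above, can be illustrated with plain `java.nio`. This is a sketch of the general technique only, not the HBASE-7336 patch and not HBase or HDFS code: `SharedFileReader`, `readAt`, `seekAndRead`, `positionalRead` and `streamLock` are hypothetical names. Seek+read moves the channel's shared position and so needs a lock, while `FileChannel.read(ByteBuffer, long)` reads at an explicit offset and leaves the shared position alone.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.locks.ReentrantLock;

// Illustration only: a java.nio stand-in for a single shared file reader.
class SharedFileReader implements AutoCloseable {
    private final FileChannel channel;
    // Guards the channel's shared position, which seek+read has to move.
    private final ReentrantLock streamLock = new ReentrantLock();

    SharedFileReader(Path file) throws IOException {
        channel = FileChannel.open(file, StandardOpenOption.READ);
    }

    /** Seek+read if the stream is free, otherwise fall back to a positional read. */
    byte[] readAt(long offset, int len) throws IOException {
        if (streamLock.tryLock()) {
            try {
                return seekAndRead(offset, len);
            } finally {
                streamLock.unlock();
            }
        }
        // Another thread is mid seek+read: do not wait for it.
        return positionalRead(offset, len);
    }

    /** Moves the shared position, then reads; concurrent callers must hold streamLock. */
    byte[] seekAndRead(long offset, int len) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(len);
        channel.position(offset);
        while (buf.hasRemaining() && channel.read(buf) != -1) {
            // keep reading until the buffer is full or end of file
        }
        return toBytes(buf);
    }

    /** Reads at an explicit offset without touching the shared position. */
    byte[] positionalRead(long offset, int len) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(len);
        long pos = offset;
        while (buf.hasRemaining()) {
            int n = channel.read(buf, pos);
            if (n == -1) {
                break; // end of file
            }
            pos += n;
        }
        return toBytes(buf);
    }

    private static byte[] toBytes(ByteBuffer buf) {
        buf.flip();
        byte[] out = new byte[buf.remaining()];
        buf.get(out);
        return out;
    }

    @Override
    public void close() throws IOException {
        channel.close();
    }

    public static void main(String[] args) throws Exception {
        Path file = Files.createTempFile("shared-reader", ".bin");
        Files.write(file, "0123456789".getBytes(StandardCharsets.US_ASCII));
        try (SharedFileReader reader = new SharedFileReader(file)) {
            System.out.println(new String(reader.readAt(2, 3), StandardCharsets.US_ASCII));
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

Using `tryLock` rather than `lock` is the point of the sketch: a second reader never queues behind the first, it just takes the positional path.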
-
Re: independent scans to same region processed serially
lars hofhansl 2013-02-09, 17:02
I looked through the code. Nothing obvious jumps out.
We can sit together on Monday and run it through a profiler.

-- Lars
-
Re: independent scans to same region processed serially
James Taylor 2013-02-09, 17:28
Ok, thanks. Are you able to repro easily, or would you like me to put something together?

James
-
Re: independent scans to same region processed serially
lars hofhansl 2013-02-09, 19:04
If you had something, that'd be great. Preferably with a local/single region server.
(Maybe time to take this private :) )

-- Lars
-
Re: independent scans to same region processed serially
James Taylor 2013-02-10, 08:30
Filed https://issues.apache.org/jira/browse/HBASE-7805
Test case attached. It occurs only if the table has a region observer coprocessor.

James