Jean-Daniel Cryans
2011-12-15, 01:26
yuzhihong@...
2011-12-15, 01:36
Jean-Daniel Cryans
2011-12-15, 01:45
Matt Corgan
2011-12-15, 01:51
Jean-Daniel Cryans
2011-12-15, 03:20
Vladimir Rodionov
2011-12-15, 20:11
Jean-Daniel Cryans
2011-12-15, 20:18
Jean-Daniel Cryans
2011-12-20, 00:43
Lars
2011-12-15, 16:44
Lars
2011-12-15, 19:35
Matt Corgan
2011-12-15, 20:17
Jean-Daniel Cryans
2011-12-15, 20:23
Jean-Daniel Cryans
2011-12-15, 20:24
Matt Corgan
2011-12-15, 20:29
Jean-Daniel Cryans
2011-12-15, 20:53
-
Early comparisons between 0.90 and 0.92
Jean-Daniel Cryans 2011-12-15, 01:26
Hey guys,
I was doing some comparisons between 0.90.5 and 0.92.0, mainly regarding reads. The numbers are kinda irrelevant but the differences are. BTW this is on CDH3u3 with local reads.

In 0.90.5, scanning 50M rows that are in the OS cache, I go up to about 1.7M rows scanned per second.

In 0.92.0, scanning those same rows (meaning that I didn't run compactions after migrating, so it's picking the same data from the OS cache), I scan about 1.1 rows per second.

0.92 is 50% slower when scanning.

In 0.90.5, randomly reading 50M rows that are OS cached, I can do about 200k reads per second.

In 0.92.0, again with those same rows, I can go up to 260k per second.

0.92 is 30% faster when random reading.

I've been playing with that data set for a while, and the numbers in 0.92.0 are pretty much the same whether I use HFileV1 or V2, meaning that either something else changed or the code that's generic to both did.

I'd like to be able to associate those differences with code changes in order to understand what's going on. I would really appreciate it if others also took some time to test it out or to think about what could cause this.

Thx,

J-D
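The percentages in the message above depend on which version is taken as the baseline. The small helper below (hypothetical, not part of PE or HBase) recomputes them from the quoted throughputs, reading the 0.92 scan figure as 1.1M rows per second as clarified later in the thread. Measured against 0.90, the scan result works out to roughly 35% slower for 0.92, or equivalently about 55% faster for 0.90, while the random-read result is a straight 30% gain.

```python
def pct_change(baseline: float, new: float) -> float:
    """Relative change of `new` versus `baseline`, in percent (negative means lower)."""
    return (new - baseline) / baseline * 100.0


# Throughputs from the first message; the 0.92 scan figure is read as 1.1M rows/s.
scan_090, scan_092 = 1.7e6, 1.1e6   # rows scanned per second
read_090, read_092 = 200e3, 260e3   # random reads per second

scan_delta = pct_change(scan_090, scan_092)    # 0.92 measured against 0.90
scan_inverse = pct_change(scan_092, scan_090)  # 0.90 measured against 0.92
read_delta = pct_change(read_090, read_092)    # 0.92 measured against 0.90

print(f"scan, 0.92 vs 0.90:        {scan_delta:+.1f}%")    # -35.3%
print(f"scan, 0.90 vs 0.92:        {scan_inverse:+.1f}%")  # +54.5%
print(f"random read, 0.92 vs 0.90: {read_delta:+.1f}%")    # +30.0%
```

Stating both directions avoids the ambiguity: "50% slower" and "about 55% faster the other way" are not the same claim.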
-
Re: Early comparisons between 0.90 and 0.92
yuzhihong@... 2011-12-15, 01:36
Thanks for the info, J-D.
I guess the 1.1 below is in millions.

Can you tell us more about your tables - bloom filters, etc.?
-
Re: Early comparisons between 0.90 and 0.92
Jean-Daniel Cryans 2011-12-15, 01:45
Yes, sorry, 1.1M.
This is PE; the table is set to a block size of 4KB and block caching is disabled. Nothing else special in there.

J-D
-
Re: Early comparisons between 0.90 and 0.92
Matt Corgan 2011-12-15, 01:51
Regions are major compacted and have empty memstores, so no merging of
stores when reading?
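Matt's question is about the read-time merge. When a store has several files, or a non-empty memstore, every scanner has to interleave several sorted sources; HBase does this with a heap of scanners (KeyValueHeap). The following is a toy Python model, not HBase code: the `scan` helper and the store contents are hypothetical, and real cells also carry families, qualifiers, timestamps and delete markers. It shows why a major-compacted region with an empty memstore can take the cheap path.

```python
import heapq
from typing import Iterator, List, Tuple

Cell = Tuple[str, str]  # (row key, value): a toy stand-in for a KeyValue


def scan(stores: List[List[Cell]]) -> Iterator[Cell]:
    """Yield cells in row order from several individually sorted sources."""
    live = [s for s in stores if s]  # skip empty sources, e.g. a flushed memstore
    if len(live) <= 1:
        # Major-compacted case: a single sorted file can be streamed as-is.
        return iter(live[0] if live else [])
    # Several files: k-way merge, paying a heap operation for every cell returned.
    return heapq.merge(*live, key=lambda cell: cell[0])


compacted = [[("r1", "a"), ("r2", "b"), ("r3", "c")], []]  # one file, empty memstore
fragmented = [[("r1", "a"), ("r3", "c")], [("r2", "b")]]   # two files before compaction

print(list(scan(compacted)))
print(list(scan(fragmented)))
```

Both calls return the same rows in the same order; the difference is only the per-cell merge work, which is why ruling it out narrows the search.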
-
Re: Early comparisons between 0.90 and 0.92
Jean-Daniel Cryans 2011-12-15, 03:20
Yes and yes.
J-D
-
RE: Early comparisons between 0.90 and 0.92
Vladimir Rodionov 2011-12-15, 20:11
200K random reads per second is way, way above what we see in production.

Same for the 1.1M rows/s scan: we see 10-20K rows per second max when running 'count' from the HBase shell.

Is there any magic recipe I am not aware of yet?

Best regards,
Vladimir Rodionov
Principal Platform Engineer
Carrier IQ, www.carrieriq.com
e-mail: [EMAIL PROTECTED]
-
Re: Early comparisons between 0.90 and 0.92
Jean-Daniel Cryans 2011-12-15, 20:18
Like I said, I'm using local reads (HDFS-2246) and the data is
_already in the OS cache_.

J-D
-
Re: Early comparisons between 0.90 and 0.92
Jean-Daniel Cryans 2011-12-20, 00:43
I redid the tests with a 64KB block size and went all the way to testing HFileV2 (which was badly missing).

Scans (rows per second):
0.90: bursts up to 1.7M
0.92 V1: bursts up to 1.4M
0.92 V2: bursts up to 1.8M

Random reads (reads per second):
0.90: steady at 165k
0.92 V1: steady at 195k
0.92 V2: steady at 245k

Basically, as long as people migrate to HFileV2 there won't be any problems.

I attached a profiler to see what was going on with scans, and it seems that 0.92 V1 uses a different code path than both 0.90 and 0.92 V2. After testing V2, I don't think it's necessary to dig further into the issue, although it seems to be related to the usage of BufferedInputStream in the faster paths.

Sorry for the original scare; hopefully this is useful for someone.

J-D
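The profiler hint points at java.io.BufferedInputStream, which answers small reads from an in-memory buffer and only goes to the underlying stream when that buffer runs dry. The Python sketch below uses io.BufferedReader as an analog; it is not HBase or HDFS code, and the 16-byte read size and 8KB buffer are arbitrary assumptions. It counts how often the raw layer is hit with and without a buffer in front of it. Whether this fully explains the 0.92 V1 scan path is not established in the thread.

```python
import io


class CountingRaw(io.RawIOBase):
    """In-memory raw stream that counts the calls reaching the layer below."""

    def __init__(self, data: bytes):
        self._data = data
        self._pos = 0
        self.calls = 0

    def readable(self) -> bool:
        return True

    def readinto(self, buf) -> int:
        self.calls += 1
        n = min(len(buf), len(self._data) - self._pos)
        buf[:n] = self._data[self._pos:self._pos + n]
        self._pos += n
        return n


def read_in_small_pieces(stream, piece: int) -> int:
    """Consume the stream `piece` bytes at a time; return the byte count."""
    total = 0
    while True:
        chunk = stream.read(piece)
        if not chunk:
            return total
        total += len(chunk)


data = bytes(64 * 1024)  # one 64KB "block" of zeros

raw = CountingRaw(data)
unbuffered_total = read_in_small_pieces(raw, 16)

raw_behind_buffer = CountingRaw(data)
buffered = io.BufferedReader(raw_behind_buffer, buffer_size=8192)
buffered_total = read_in_small_pieces(buffered, 16)

print(f"unbuffered: {raw.calls} raw reads")
print(f"buffered:   {raw_behind_buffer.calls} raw reads")
```

Without the buffer every 16-byte read goes to the raw layer (4,097 calls including the final end-of-stream read); with it, only a handful of 8KB refills do.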
-
Re: Early comparisons between 0.90 and 0.92
Lars 2011-12-15, 16:44
I'll be busy today... I'll double-check my scanning-related changes as soon as I can.
-
Re: Early comparisons between 0.90 and 0.92
Lars 2011-12-15, 19:35
Do you see the same slowdown with the default 64k block size?
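Lars's question matters because, for a sequential scan, the block size sets how many blocks have to be located, read and parsed. Below is a rough Python estimate using the ~1.5KB average KeyValue size J-D reports from the HFile tool later in the thread (taken here as 1536 bytes), assuming one cell per row. As a simplification it also assumes the writer closes a block as soon as it reaches the configured size, so a block holds ceil(block size / KV size) cells, and it ignores block headers, indexes and compression.

```python
ROWS = 50_000_000      # rows loaded by PE in these tests
AVG_KV_BYTES = 1_536   # ~1.5KB per KeyValue (assumes 1KB = 1024 bytes)


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division without floating point."""
    return (a + b - 1) // b


def blocks_needed(rows: int, kv_bytes: int, block_size: int) -> tuple:
    """Return (cells per block, total data blocks) under the simplified model above."""
    cells_per_block = ceil_div(block_size, kv_bytes)
    return cells_per_block, ceil_div(rows, cells_per_block)


for size in (4 * 1024, 64 * 1024):
    per_block, blocks = blocks_needed(ROWS, AVG_KV_BYTES, size)
    print(f"{size // 1024:>2}KB blocks: {per_block} cells/block, {blocks:,} blocks")
```

Under these assumptions a 4KB block holds only 3 cells, so a full scan touches about 16.7M blocks versus about 1.16M at 64KB, roughly 14 times as many per-block operations.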
-
Re: Early comparisons between 0.90 and 0.92
Matt Corgan 2011-12-15, 20:17
260k random reads per second is a lot... is that on one node? How many client threads? And is the client going over the network, is it on the datanode, or are you using a specialized test where they're in the same process?
-
Re: Early comparisons between 0.90 and 0.92
Jean-Daniel Cryans 2011-12-15, 20:23
The numbers are irrelevant for this discussion, as I'm trying to compare two almost-equal things to find out why there's a difference. But since you're asking nicely:

14 slave nodes, 2x E5520, 24GB of RAM (only 1GB given to HBase), 4 SATA 7200rpm disks.

These are the command lines I'm using:

To load: hbase org.apache.hadoop.hbase.PerformanceEvaluation sequentialWrite 50
To scan: hbase org.apache.hadoop.hbase.PerformanceEvaluation scan 50
To read: hbase org.apache.hadoop.hbase.PerformanceEvaluation randomRead 50

I'm using 35 mappers per machine. For the random read test, I assume the clients have to go over the network 13/14 of the time. For the scan, locality should be good, but we don't have a top-of-the-rack bottleneck.

After the initial loading I major compact. For all the tests the regions remain on the same machines, even across 0.90 and 0.92.

BTW, PerformanceEvaluation (which we call PE) uses 1KB values. The size of a KV is 1.5KB on average according to the HFile tool.

Hope this helps,

J-D
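A few of these figures can be cross-checked with back-of-the-envelope arithmetic, which also answers the "one node?" question: 260k reads per second spread over 14 region servers is under 19k reads per second per node, and 13/14 is about 93% remote reads. The sketch assumes decimal units, evenly spread regions, and that each server only needs its own local replica cached; HDFS replication, HFile index overhead, and the memory used by the DataNode, TaskTracker and 35 mapper JVMs are not modelled.

```python
NODES = 14
ROWS = 50_000_000
AVG_KV_BYTES = 1.5e3          # ~1.5KB per KeyValue (decimal units assumed)
RAM_PER_NODE_GB = 24.0
HBASE_HEAP_GB = 1.0
RANDOM_READS_PER_SEC = 260e3  # best 0.92.0 random-read figure from the first message


def remote_fraction(nodes: int) -> float:
    """Chance that a uniformly random row is served by a node other than the client's."""
    return (nodes - 1) / nodes


def data_per_node_gb(rows: int, kv_bytes: float, nodes: int) -> float:
    """Raw KeyValue bytes per node in GB, ignoring replication and index overhead."""
    return rows * kv_bytes / nodes / 1e9


remote = remote_fraction(NODES)
per_node_gb = data_per_node_gb(ROWS, AVG_KV_BYTES, NODES)
reads_per_node = RANDOM_READS_PER_SEC / NODES
# RAM left for everything else (page cache, DataNode, mappers) after heap and data.
headroom_gb = RAM_PER_NODE_GB - HBASE_HEAP_GB - per_node_gb

print(f"remote reads:  {remote:.1%}")
print(f"data per node: {per_node_gb:.2f} GB, headroom {headroom_gb:.1f} GB")
print(f"reads/s/node:  {reads_per_node:,.0f}")
```

About 5.4GB of raw data per node against 23GB of RAM outside the HBase heap is at least consistent with the "already in the OS cache" condition, subject to the unmodelled processes above.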
-
Re: Early comparisons between 0.90 and 0.92
Jean-Daniel Cryans 2011-12-15, 20:24
Trying this now.
J-D

On Thu, Dec 15, 2011 at 11:35 AM, Lars <[EMAIL PROTECTED]> wrote:
> Do you see the same slowdown with the default 64k block size?
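Lars's question is about block size. With a 4 KB block size, a full scan has to read and decode many more HFile blocks than with the 64 KB default, so any per-block cost added in 0.92 would be multiplied. The rough count below assumes about 1,000 bytes per row on disk. That figure is hypothetical, chosen only for illustration, and was not measured in this thread. `blocks_needed` is likewise just an illustrative helper, not part of HBase.

```python
# Rough count of HFile blocks touched by a full scan at different block sizes.
# Assumption: ~1,000 bytes per row on disk (hypothetical, not measured here).
ROW_BYTES = 1_000
ROWS = 50_000_000  # the 50M-row data set from the opening message


def blocks_needed(rows: int, row_bytes: int, block_size: int) -> int:
    """Approximate number of blocks a full scan reads, ignoring block
    headers, indexes and compression. Uses integer ceiling division."""
    total_bytes = rows * row_bytes
    return (total_bytes + block_size - 1) // block_size


if __name__ == "__main__":
    for size in (4 * 1024, 64 * 1024):
        # Prints " 4 KB blocks: 12,207,032" and "64 KB blocks: 762,940"
        print(f"{size // 1024:>2} KB blocks: {blocks_needed(ROWS, ROW_BYTES, size):,}")
```

Under these assumptions the 4 KB table has about 16 times as many blocks, so a per-block overhead that is negligible at 64 KB could become visible at 4 KB. Whether that is what is happening here is what the 64 KB rerun would show.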
Re: Early comparisons between 0.90 and 0.92
Matt Corgan 2011-12-15, 20:29
I was hoping to rule out changes in IPC handlers and other upper layers and
narrow it down to the difference between HFileV1 and HFileV2, but it sounds like you have a lot of moving pieces.

On Thu, Dec 15, 2011 at 12:24 PM, Jean-Daniel Cryans <[EMAIL PROTECTED]> wrote:
> Trying this now.
Re: Early comparisons between 0.90 and 0.92
Jean-Daniel Cryans 2011-12-15, 20:53
Well, it's not so bad: I'm physically reading the same bytes in the two
tests, and I don't migrate to HFileV2 when going to 0.92.

For sure, to dig deeper we'll have to remove all the moving parts we can, but right now, if what I tested is true (I ran the whole thing 3 times and kept getting consistent results), it should give us some idea of what people can expect when moving to 0.92 as it is right now.

J-D

On Thu, Dec 15, 2011 at 12:29 PM, Matt Corgan <[EMAIL PROTECTED]> wrote:
> I was hoping to rule out changes in IPC handlers and other upper layers and
> narrow it down to the difference between HFileV1 and HFileV2, but it sounds
> like you have a lot of moving pieces.
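The "what people can expect" part comes down to the relative changes, which can be checked against the raw figures in the opening message of this thread (1.7M vs 1.1M rows scanned per second, 200k vs 260k random reads per second). The quick calculation below uses `pct_change`, a purely illustrative helper. By this arithmetic the scan regression works out to roughly 35% fewer rows per second in 0.92 (equivalently, 0.90 scans about 55% faster) rather than 50%, while the 30% random-read gain checks out.

```python
def pct_change(old: float, new: float) -> float:
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100.0


# Throughput figures as reported in the opening message of the thread.
SCAN_090, SCAN_092 = 1_700_000, 1_100_000  # rows scanned per second
GET_090, GET_092 = 200_000, 260_000        # random reads per second

if __name__ == "__main__":
    print(f"scan, 0.92 relative to 0.90:        {pct_change(SCAN_090, SCAN_092):+.1f}%")  # -35.3%
    print(f"scan, 0.90 relative to 0.92:        {pct_change(SCAN_092, SCAN_090):+.1f}%")  # +54.5%
    print(f"random read, 0.92 relative to 0.90: {pct_change(GET_090, GET_092):+.1f}%")    # +30.0%
```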