|
|
-
performance improvment on regionserver.MemStore/updateColumnValue
N Keywal 2011-07-20, 14:49
Hello,
Some words on the context: We're thinking about using HBase for a product we're developping. For this reason, I am currently looking at HBase source code to understand how to debug & modify HBase. To start with something simple but useful, I am looking for performance improvement by profiling hbase during the execution of the unit tests. I expect that many of the hotspots found on the unit tests are also hotspots in real production. I plan to spend around 10 m.d on this until september. The method regionserver.MemStore/updateColumnValue seems quite used, and is ultimatly responsible of 30% of the time in the test subsets I am using. There is bit of it that can be optimized easily by changing the conditions order:
if (firstKv.matchingQualifier(kv)) { if (kv.getType() == KeyValue.Type.Put.getCode()) { now = Math.max(now, kv.getTimestamp()); } }
becomes: if (kv.getType() == KeyValue.Type.Put.getCode() && kv.getTimestamp() > now && firstKv.matchingQualifier(kv)) { now = kv.getTimestamp(); }
As comparing the qualifier is much more expensive, we put it at the end. It improve the performances by 3% (i.e: total execution time lowered by 3%). So first question: would you be interested by a patch for this kind of stuff?
Second question (more technical...): in this method (regionserver.MemStore/updateColumnValue), I see:
KeyValue firstKv = KeyValue.createFirstOnRow( row, family, qualifier);
[...] while (it.hasNext()) { KeyValue kv = it.next();
// if this isnt the row we are interested in, then bail: if (!firstKv.matchingColumn(family, qualifier) || !firstKv.matchingRow(kv)) { break; // rows dont match, bail. }
[...] }
For the test "firstKv.matchingColumn(family, qualifier)", I don't see: 1) Why it is tested in the loop, as firstKv is not modified, the result won't change. 2) How the result can be 'false', as firstKv is inialized with the family and the parameters.
Or is it shared for update a way or another?
If we can remove it, we gain another 2%... N.
+
N Keywal 2011-07-20, 14:49
-
Re: performance improvment on regionserver.MemStore/updateColumnValue
Joey Echeverria 2011-07-20, 15:02
I would compare YCSB between the patched and unpatched version for a more realistic workload than the unit tests provide.
-Joey
On Wed, Jul 20, 2011 at 10:49 AM, N Keywal <[EMAIL PROTECTED]> wrote:
> Hello, > > Some words on the context: We're thinking about using HBase for a product > we're developping. For this reason, I am currently looking at HBase source > code to understand how to debug & modify HBase. To start with something > simple but useful, I am looking for performance improvement by profiling > hbase during the execution of the unit tests. I expect that many of the > hotspots found on the unit tests are also hotspots in real production. I > plan to spend around 10 m.d on this until september. > > > The method regionserver.MemStore/updateColumnValue seems quite used, and is > ultimatly responsible of 30% of the time in the test subsets I am using. > > > There is bit of it that can be optimized easily by changing the conditions > order: > > if (firstKv.matchingQualifier(kv)) { > if (kv.getType() == KeyValue.Type.Put.getCode()) { > now = Math.max(now, kv.getTimestamp()); > } > } > > becomes: > if (kv.getType() == KeyValue.Type.Put.getCode() && > kv.getTimestamp() > now && > firstKv.matchingQualifier(kv)) { > now = kv.getTimestamp(); > } > > As comparing the qualifier is much more expensive, we put it at the end. > It improve the performances by 3% (i.e: total execution time lowered by > 3%). > > > So first question: would you be interested by a patch for this kind of > stuff? > > > > Second question (more technical...): in this method > (regionserver.MemStore/updateColumnValue), I see: > > KeyValue firstKv = KeyValue.createFirstOnRow( > row, family, qualifier); > > [...] > while (it.hasNext()) { > KeyValue kv = it.next(); > > // if this isnt the row we are interested in, then bail: > if (!firstKv.matchingColumn(family, qualifier) || > !firstKv.matchingRow(kv)) { > break; // rows dont match, bail. > } > > [...] > } > > For the test "firstKv.matchingColumn(family, qualifier)", I don't see: > 1) Why it is tested in the loop, as firstKv is not modified, the result > won't change. > 2) How the result can be 'false', as firstKv is inialized with the family > and the parameters. > > Or is it shared for update a way or another? > > If we can remove it, we gain another 2%... > > > N. >
-- Joseph Echeverria Cloudera, Inc. 443.305.9434
+
Joey Echeverria 2011-07-20, 15:02
-
Re: performance improvment on regionserver.MemStore/updateColumnValue
N Keywal 2011-07-20, 15:28
Aggreed. But there is a big advantage when you work on the issues found on the unit tesst: the code you're modifying is already covered by the unit tests... :-)
On Wed, Jul 20, 2011 at 5:02 PM, Joey Echeverria <[EMAIL PROTECTED]> wrote:
> I would compare YCSB between the patched and unpatched version for a more > realistic workload than the unit tests provide. > > -Joey > > On Wed, Jul 20, 2011 at 10:49 AM, N Keywal <[EMAIL PROTECTED]> wrote: > > > Hello, > > > > Some words on the context: We're thinking about using HBase for a product > > we're developping. For this reason, I am currently looking at HBase > source > > code to understand how to debug & modify HBase. To start with something > > simple but useful, I am looking for performance improvement by profiling > > hbase during the execution of the unit tests. I expect that many of the > > hotspots found on the unit tests are also hotspots in real production. I > > plan to spend around 10 m.d on this until september. > > > > > > The method regionserver.MemStore/updateColumnValue seems quite used, and > is > > ultimatly responsible of 30% of the time in the test subsets I am using. > > > > > > There is bit of it that can be optimized easily by changing the > conditions > > order: > > > > if (firstKv.matchingQualifier(kv)) { > > if (kv.getType() == KeyValue.Type.Put.getCode()) { > > now = Math.max(now, kv.getTimestamp()); > > } > > } > > > > becomes: > > if (kv.getType() == KeyValue.Type.Put.getCode() && > > kv.getTimestamp() > now && > > firstKv.matchingQualifier(kv)) { > > now = kv.getTimestamp(); > > } > > > > As comparing the qualifier is much more expensive, we put it at the end. > > It improve the performances by 3% (i.e: total execution time lowered by > > 3%). > > > > > > So first question: would you be interested by a patch for this kind of > > stuff? > > > > > > > > Second question (more technical...): in this method > > (regionserver.MemStore/updateColumnValue), I see: > > > > KeyValue firstKv = KeyValue.createFirstOnRow( > > row, family, qualifier); > > > > [...] > > while (it.hasNext()) { > > KeyValue kv = it.next(); > > > > // if this isnt the row we are interested in, then bail: > > if (!firstKv.matchingColumn(family, qualifier) || > > !firstKv.matchingRow(kv)) { > > break; // rows dont match, bail. > > } > > > > [...] > > } > > > > For the test "firstKv.matchingColumn(family, qualifier)", I don't see: > > 1) Why it is tested in the loop, as firstKv is not modified, the result > > won't change. > > 2) How the result can be 'false', as firstKv is inialized with the > family > > and the parameters. > > > > Or is it shared for update a way or another? > > > > If we can remove it, we gain another 2%... > > > > > > N. > > > > > > -- > Joseph Echeverria > Cloudera, Inc. > 443.305.9434 >
+
N Keywal 2011-07-20, 15:28
-
Re: performance improvment on regionserver.MemStore/updateColumnValue
Joey Echeverria 2011-07-20, 15:32
My suggestion was just to verify the performance gains. Do the profiling on unit tests and then do scale up tests with YCSB.
-Joey
On Wed, Jul 20, 2011 at 11:28 AM, N Keywal <[EMAIL PROTECTED]> wrote:
> Aggreed. But there is a big advantage when you work on the issues found on > the unit tesst: the code you're modifying is already covered by the unit > tests... :-) > > On Wed, Jul 20, 2011 at 5:02 PM, Joey Echeverria <[EMAIL PROTECTED]> > wrote: > > > I would compare YCSB between the patched and unpatched version for a more > > realistic workload than the unit tests provide. > > > > -Joey > > > > On Wed, Jul 20, 2011 at 10:49 AM, N Keywal <[EMAIL PROTECTED]> wrote: > > > > > Hello, > > > > > > Some words on the context: We're thinking about using HBase for a > product > > > we're developping. For this reason, I am currently looking at HBase > > source > > > code to understand how to debug & modify HBase. To start with > something > > > simple but useful, I am looking for performance improvement by > profiling > > > hbase during the execution of the unit tests. I expect that many of the > > > hotspots found on the unit tests are also hotspots in real production. > I > > > plan to spend around 10 m.d on this until september. > > > > > > > > > The method regionserver.MemStore/updateColumnValue seems quite used, > and > > is > > > ultimatly responsible of 30% of the time in the test subsets I am > using. > > > > > > > > > There is bit of it that can be optimized easily by changing the > > conditions > > > order: > > > > > > if (firstKv.matchingQualifier(kv)) { > > > if (kv.getType() == KeyValue.Type.Put.getCode()) { > > > now = Math.max(now, kv.getTimestamp()); > > > } > > > } > > > > > > becomes: > > > if (kv.getType() == KeyValue.Type.Put.getCode() && > > > kv.getTimestamp() > now && > > > firstKv.matchingQualifier(kv)) { > > > now = kv.getTimestamp(); > > > } > > > > > > As comparing the qualifier is much more expensive, we put it at the > end. > > > It improve the performances by 3% (i.e: total execution time lowered > by > > > 3%). > > > > > > > > > So first question: would you be interested by a patch for this kind of > > > stuff? > > > > > > > > > > > > Second question (more technical...): in this method > > > (regionserver.MemStore/updateColumnValue), I see: > > > > > > KeyValue firstKv = KeyValue.createFirstOnRow( > > > row, family, qualifier); > > > > > > [...] > > > while (it.hasNext()) { > > > KeyValue kv = it.next(); > > > > > > // if this isnt the row we are interested in, then bail: > > > if (!firstKv.matchingColumn(family, qualifier) || > > > !firstKv.matchingRow(kv)) { > > > break; // rows dont match, bail. > > > } > > > > > > [...] > > > } > > > > > > For the test "firstKv.matchingColumn(family, qualifier)", I don't > see: > > > 1) Why it is tested in the loop, as firstKv is not modified, the > result > > > won't change. > > > 2) How the result can be 'false', as firstKv is inialized with the > > family > > > and the parameters. > > > > > > Or is it shared for update a way or another? > > > > > > If we can remove it, we gain another 2%... > > > > > > > > > N. > > > > > > > > > > > -- > > Joseph Echeverria > > Cloudera, Inc. > > 443.305.9434 > > >
-- Joseph Echeverria Cloudera, Inc. 443.305.9434
+
Joey Echeverria 2011-07-20, 15:32
-
Re: performance improvment on regionserver.MemStore/updateColumnValue
Ted Yu 2011-07-20, 15:08
Thanks for your findings.
The second finding is more interesting. According to the javadoc: // if this isnt the row we are interested in, then bail: firstKv should be replaced by kv.
Please file a JIRA.
On Wed, Jul 20, 2011 at 7:49 AM, N Keywal <[EMAIL PROTECTED]> wrote:
> Hello, > > Some words on the context: We're thinking about using HBase for a product > we're developping. For this reason, I am currently looking at HBase source > code to understand how to debug & modify HBase. To start with something > simple but useful, I am looking for performance improvement by profiling > hbase during the execution of the unit tests. I expect that many of the > hotspots found on the unit tests are also hotspots in real production. I > plan to spend around 10 m.d on this until september. > > > The method regionserver.MemStore/updateColumnValue seems quite used, and is > ultimatly responsible of 30% of the time in the test subsets I am using. > > > There is bit of it that can be optimized easily by changing the conditions > order: > > if (firstKv.matchingQualifier(kv)) { > if (kv.getType() == KeyValue.Type.Put.getCode()) { > now = Math.max(now, kv.getTimestamp()); > } > } > > becomes: > if (kv.getType() == KeyValue.Type.Put.getCode() && > kv.getTimestamp() > now && > firstKv.matchingQualifier(kv)) { > now = kv.getTimestamp(); > } > > As comparing the qualifier is much more expensive, we put it at the end. > It improve the performances by 3% (i.e: total execution time lowered by > 3%). > > > So first question: would you be interested by a patch for this kind of > stuff? > > > > Second question (more technical...): in this method > (regionserver.MemStore/updateColumnValue), I see: > > KeyValue firstKv = KeyValue.createFirstOnRow( > row, family, qualifier); > > [...] > while (it.hasNext()) { > KeyValue kv = it.next(); > > // if this isnt the row we are interested in, then bail: > if (!firstKv.matchingColumn(family, qualifier) || > !firstKv.matchingRow(kv)) { > break; // rows dont match, bail. > } > > [...] > } > > For the test "firstKv.matchingColumn(family, qualifier)", I don't see: > 1) Why it is tested in the loop, as firstKv is not modified, the result > won't change. > 2) How the result can be 'false', as firstKv is inialized with the family > and the parameters. > > Or is it shared for update a way or another? > > If we can remove it, we gain another 2%... > > > N. >
+
Ted Yu 2011-07-20, 15:08
-
Re: performance improvment on regionserver.MemStore/updateColumnValue
N Keywal 2011-07-20, 15:20
>firstKv should be replaced by kv. >Please file a JIRA. Done: HBASE-4118 < https://issues.apache.org/jira/browse/HBASE-4118>On Wed, Jul 20, 2011 at 5:08 PM, Ted Yu <[EMAIL PROTECTED]> wrote: > Thanks for your findings. > > The second finding is more interesting. According to the javadoc: > // if this isnt the row we are interested in, then bail: > firstKv should be replaced by kv. > > Please file a JIRA. > > On Wed, Jul 20, 2011 at 7:49 AM, N Keywal <[EMAIL PROTECTED]> wrote: > > > Hello, > > > > Some words on the context: We're thinking about using HBase for a product > > we're developping. For this reason, I am currently looking at HBase > source > > code to understand how to debug & modify HBase. To start with something > > simple but useful, I am looking for performance improvement by profiling > > hbase during the execution of the unit tests. I expect that many of the > > hotspots found on the unit tests are also hotspots in real production. I > > plan to spend around 10 m.d on this until september. > > > > > > The method regionserver.MemStore/updateColumnValue seems quite used, and > is > > ultimatly responsible of 30% of the time in the test subsets I am using. > > > > > > There is bit of it that can be optimized easily by changing the > conditions > > order: > > > > if (firstKv.matchingQualifier(kv)) { > > if (kv.getType() == KeyValue.Type.Put.getCode()) { > > now = Math.max(now, kv.getTimestamp()); > > } > > } > > > > becomes: > > if (kv.getType() == KeyValue.Type.Put.getCode() && > > kv.getTimestamp() > now && > > firstKv.matchingQualifier(kv)) { > > now = kv.getTimestamp(); > > } > > > > As comparing the qualifier is much more expensive, we put it at the end. > > It improve the performances by 3% (i.e: total execution time lowered by > > 3%). > > > > > > So first question: would you be interested by a patch for this kind of > > stuff? > > > > > > > > Second question (more technical...): in this method > > (regionserver.MemStore/updateColumnValue), I see: > > > > KeyValue firstKv = KeyValue.createFirstOnRow( > > row, family, qualifier); > > > > [...] > > while (it.hasNext()) { > > KeyValue kv = it.next(); > > > > // if this isnt the row we are interested in, then bail: > > if (!firstKv.matchingColumn(family, qualifier) || > > !firstKv.matchingRow(kv)) { > > break; // rows dont match, bail. > > } > > > > [...] > > } > > > > For the test "firstKv.matchingColumn(family, qualifier)", I don't see: > > 1) Why it is tested in the loop, as firstKv is not modified, the result > > won't change. > > 2) How the result can be 'false', as firstKv is inialized with the > family > > and the parameters. > > > > Or is it shared for update a way or another? > > > > If we can remove it, we gain another 2%... > > > > > > N. > > >
+
N Keywal 2011-07-20, 15:20
-
Re: performance improvment on regionserver.MemStore/updateColumnValue
Stack 2011-07-20, 15:15
On Wed, Jul 20, 2011 at 7:49 AM, N Keywal <[EMAIL PROTECTED]> wrote: > So first question: would you be interested by a patch for this kind of > stuff? >
Yes. St.Ack
+
Stack 2011-07-20, 15:15
|
|