-
Slow row deletion performance in comparison to insertion
Jeff Whiting 2012-06-27, 21:03

I'm struggling to understand why my deletes are taking longer than my inserts. My understanding is that a delete is just an insertion of a tombstone. And I'm deleting the entire row.

I do a simple loop (pseudo code) and insert the 100 byte rows:

    for (int i=0; i < 50000; i++)
    {
        puts.append(new Put(rowkey[i], oneHundredBytes[i]));

        if (puts.size() % 1000 == 0)
        {
            Benchmark.start();
            table.batch(puts);
            Benchmark.stop();
        }
    }

The above takes about 8282ms total.

However the delete takes more than twice as long:

    Iterator it = table.getScannerScan(rowkey[0], rowkey[50000-1]).iterator();
    while(it.hasNext())
    {
        r = it.next();
        deletes.append(new Delete(r.getRow()));
        if (deletes.size() % 1000 == 0)
        {
            Benchmark.start();
            table.batch(deletes);
            Benchmark.stop();
        }
    }

The above takes 17369ms total.

I'm only benchmarking the deletion time and not the scan time. Additionally if I batch the deletes into one big one at the end (rather than while I'm scanning) it takes about the same amount of time. I am deleting the entire row so I wouldn't think it would be doing a read before the delete (http://mail-archives.apache.org/mod_mbox/hbase-user/201206.mbox/%[EMAIL PROTECTED]l%3E).

Any thoughts on why it is slower and how I can speed it up?

Thanks,
~Jeff

--
Jeff Whiting
Qualtrics Senior Software Engineer
[EMAIL PROTECTED]
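A minimal sketch of the batching pattern in the pseudo code above, written so that it runs without an HBase cluster. `TimedBatcher` and its `sink` callback are hypothetical names introduced here; the sink stands in for a call such as `table.batch(...)`. One assumption goes beyond the pseudo code: the buffer is cleared after every flush, so each batch holds at most `batchSize` items. Only the time spent inside the sink is accumulated, mirroring where `Benchmark.start()` and `Benchmark.stop()` are placed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Buffers items and hands them to a sink in fixed-size batches, timing only the sink calls. */
final class TimedBatcher<T> {
    private final int batchSize;
    private final Consumer<List<T>> sink;      // hypothetical stand-in for table.batch(...)
    private final List<T> buffer = new ArrayList<>();
    private long nanosInSink = 0;
    private int batchesSent = 0;

    TimedBatcher(int batchSize, Consumer<List<T>> sink) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.batchSize = batchSize;
        this.sink = sink;
    }

    /** Adds one item and flushes as soon as the buffer reaches batchSize. */
    void add(T item) {
        buffer.add(item);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    /** Sends whatever is buffered; call once more after the loop for the remainder. */
    void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        List<T> batch = new ArrayList<>(buffer); // copy, so the sink may keep the list
        buffer.clear();                          // assumption: start each batch empty
        long start = System.nanoTime();
        sink.accept(batch);
        nanosInSink += System.nanoTime() - start;
        batchesSent++;
    }

    long millisInSink() { return nanosInSink / 1_000_000; }

    int batchesSent() { return batchesSent; }
}
```

With the same harness used for both the puts and the deletes, the two totals are comparable because the scan and the buffering are excluded from the measurement in both cases.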
-
Re: Slow row deletion performance in comparison to insertion
Ted Yu 2012-06-27, 21:15

bq. if I batch the deletes into one big one at the end (rather than while I'm scanning)

That's what you should do.

See also HBASE-6284 where an optimization, HRegion#doMiniBatchDelete(), is under development.
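Collecting the deletes during the scan and submitting them only afterwards can be sketched as follows. `DeleteChunks.partition` is a hypothetical helper, not part of the HBase client. Splitting the collected list into bounded chunks is an assumption added here so that a single request does not grow without limit; with a large enough `maxPerBatch` it degenerates into the one big batch discussed in the thread.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: splits work collected during a scan into bounded batches. */
final class DeleteChunks {
    private DeleteChunks() {}

    /** Returns consecutive sublists of at most maxPerBatch elements, in original order. */
    static <T> List<List<T>> partition(List<T> items, int maxPerBatch) {
        if (maxPerBatch <= 0) {
            throw new IllegalArgumentException("maxPerBatch must be positive");
        }
        List<List<T>> chunks = new ArrayList<>();
        for (int from = 0; from < items.size(); from += maxPerBatch) {
            int to = Math.min(from + maxPerBatch, items.size());
            // Copy the view so each chunk stays valid if the source list changes later.
            chunks.add(new ArrayList<>(items.subList(from, to)));
        }
        return chunks;
    }
}
```

Each returned chunk would then be passed to the batch call in turn, after the scanner has been closed.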
-
Re: Slow row deletion performance in comparison to insertion
Amitanand Aiyer 2012-06-27, 22:05

There was some difference in the way locks are taken for batched deletes and puts. This was fixed for 89.

I wonder if the same could be the issue here.
-
Re: Slow row deletion performance in comparison to insertion
Ted Yu 2012-06-27, 22:11

Amit:
Can you point us to the JIRA or changelist in 0.89-fb?

Thanks
-
Re: Slow row deletion performance in comparison to insertion
Ted Yu 2012-06-27, 22:44

The JIRA was HBASE-5941
-
Re: Slow row deletion performance in comparison to insertion
Jeff Whiting 2012-06-27, 23:15

Looking at HBASE-6284 it seems that deletes are not batched at the regionserver level, so that is the reason for the performance degradation. Additionally, the lock handling described in HBASE-5941 is also contributing to the performance degradation.

So until those changes get into an HBase release I just have to live with the slower performance. Is there anything I need to do on my end?

Just as a sanity check, I tried setting a timestamp in the delete object but it made no difference. I'll batch my deletes at the end as you suggested (as memory allows).

Thanks,
~Jeff

--
Jeff Whiting
Qualtrics Senior Software Engineer
[EMAIL PROTECTED]
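A toy model of the "delete is an insertion of a tombstone" idea from this thread, including a delete that carries a timestamp. This is not HBase code: `ToyRow` and its methods are hypothetical, and the model assumes that a row delete at timestamp T masks every version with a timestamp at or before T. It is meant only to illustrate that recording a tombstone needs no prior read, whether or not a timestamp is supplied, which is consistent with the observation above that setting one made no difference.

```java
import java.util.Map;
import java.util.TreeMap;

/** Toy model of one row: versions keyed by timestamp, plus an optional row tombstone. */
final class ToyRow {
    private final TreeMap<Long, String> versions = new TreeMap<>();
    private long tombstoneTs = Long.MIN_VALUE; // sentinel: no tombstone written yet

    /** Stores a value at the given timestamp. */
    void put(long ts, String value) {
        versions.put(ts, value);
    }

    /** A delete only records a marker; nothing is read or physically removed here. */
    void delete(long ts) {
        tombstoneTs = Math.max(tombstoneTs, ts);
    }

    /** Returns the newest value not masked by the tombstone, or null if none is visible. */
    String get() {
        Map.Entry<Long, String> newest = versions.lastEntry();
        if (newest == null || newest.getKey() <= tombstoneTs) {
            return null;
        }
        return newest.getValue();
    }

    /** Number of versions still physically stored, masked or not. */
    int storedVersions() {
        return versions.size();
    }
}
```

In this model the cost of masking is paid at read time, not at delete time, so a slow delete path would have to come from somewhere else, such as the locking and batching behaviour discussed in the replies.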
-
Re: Slow row deletion performance in comparison to insertion
Ted Yu 2012-06-27, 23:50

I created HBASE-6287 <https://issues.apache.org/jira/browse/HBASE-6287> for porting HBASE-5941 to trunk.

Jeff:
What version of HBase are you using?

Since HBASE-5941 is an improvement, a vote may be raised for porting it to other branches.
-
Re: Slow row deletion performance in comparison to insertion
Jeff Whiting 2012-06-28, 14:37

0.90.4-cdh3u3 is the version I'm running.

~Jeff

--
Jeff Whiting
Qualtrics Senior Software Engineer
[EMAIL PROTECTED]