HBase >> mail # user >> hbase delete operation is very slow

Haijia Zhou 2012-02-21, 20:52
Doug Meil 2012-02-21, 22:45
Stack 2012-02-22, 00:39
Doug Meil 2012-02-22, 01:54
Stack 2012-02-22, 02:13
RE: hbase delete operation is very slow
Thanks for the suggestion. I did use List<Delete> with a batch size of 1000, but the performance was not much different from deleting one row at a time.
I looked into the HRegion.delete() method. My understanding is that when you call delete() on a row, it deletes every column family of that row, meaning it puts a tombstone in each column family.
In my case each row has 5 column families, so each delete results in 5 tombstones for that row. I think that could be the reason why delete is so slow.

I am just wondering whether there is any way, or any tool, to profile an HBase application and measure the time taken by each individual method.
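
Short of a full profiler, one crude way to see where the time goes is to wrap the suspect calls with System.nanoTime() and accumulate totals per label. The sketch below is only an illustration: CallTimer and its methods are hypothetical names, not part of HBase, and it measures client-side wall-clock time only, so it will not show what happens inside the region server.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical helper: accumulates call counts and elapsed time per label. */
final class CallTimer {
    // label -> {number of calls, total elapsed nanoseconds}
    private final Map<String, long[]> stats = new LinkedHashMap<>();

    /** Runs the call, records its wall-clock duration under the label, returns its result. */
    <R> R time(String label, Supplier<R> call) {
        long start = System.nanoTime();
        try {
            return call.get();
        } finally {
            long elapsed = System.nanoTime() - start;
            long[] s = stats.computeIfAbsent(label, k -> new long[2]);
            s[0]++;
            s[1] += elapsed;
        }
    }

    long calls(String label) {
        long[] s = stats.get(label);
        return s == null ? 0 : s[0];
    }

    long totalNanos(String label) {
        long[] s = stats.get(label);
        return s == null ? 0 : s[1];
    }

    /** Prints one line per label with call count and total milliseconds. */
    void report() {
        for (Map.Entry<String, long[]> e : stats.entrySet()) {
            long[] s = e.getValue();
            System.out.printf("%s: %d calls, %.3f ms total%n",
                    e.getKey(), s[0], s[1] / 1_000_000.0);
        }
    }
}

class CallTimerDemo {
    public static void main(String[] args) {
        CallTimer timer = new CallTimer();
        // Stand-in workload; in a real client this would be the delete call being measured.
        for (int i = 0; i < 3; i++) {
            timer.time("build-string", () -> String.join(",", "a", "b", "c"));
        }
        timer.report();
    }
}
```

Timing the single-row delete and the batched delete under separate labels would at least show whether the client call itself accounts for the slowness.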


-----Original Message-----
From: Doug Meil [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, February 21, 2012 8:54 PM
Subject: Re: hbase delete operation is very slow
I don't think write-buffering is an option because that's Put-only the last time I looked, but the advice I put in the book is to use the delete(List<Delete>).  He'll have to keep track of the List<Delete> himself and determine when the batch should be sent, but it's a lot better than one at a time.
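
The bookkeeping described above (collect deletes in a list, decide when the batch should be sent) can be sketched roughly as follows. BatchBuffer is a hypothetical helper, not an HBase class, and it is written against a generic sink so that it runs without a cluster. With HBase, the sink would wrap the delete(List<Delete>) call on the table, including handling of the IOException that call can throw.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical helper: collects items and hands them to a sink in fixed-size batches. */
final class BatchBuffer<T> {
    private final int batchSize;
    private final Consumer<List<T>> sink;
    private final List<T> pending = new ArrayList<>();

    BatchBuffer(int batchSize, Consumer<List<T>> sink) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.batchSize = batchSize;
        this.sink = sink;
    }

    /** Queues one item and sends the batch once it reaches batchSize. */
    void add(T item) {
        pending.add(item);
        if (pending.size() >= batchSize) {
            flush();
        }
    }

    /** Sends whatever is queued; call once at the end to push the final partial batch. */
    void flush() {
        if (pending.isEmpty()) {
            return;
        }
        // Hand over a copy so the sink may keep or modify its list independently.
        sink.accept(new ArrayList<>(pending));
        pending.clear();
    }
}

class BatchBufferDemo {
    public static void main(String[] args) {
        List<Integer> batchSizes = new ArrayList<>();
        // Assumption: with HBase, T would be Delete and the sink would call
        // table.delete(batch). Here the sink only records each batch size.
        BatchBuffer<String> buffer =
                new BatchBuffer<>(1000, batch -> batchSizes.add(batch.size()));
        for (int i = 0; i < 2500; i++) {
            buffer.add("row-" + i);
        }
        buffer.flush();
        System.out.println(batchSizes); // prints [1000, 1000, 500]
    }
}
```

The final flush() matters: without it, the last partial batch (500 rows in this run) would never be sent.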
On 2/21/12 7:39 PM, "Stack" <[EMAIL PROTECTED]> wrote:

>On Tue, Feb 21, 2012 at 2:45 PM, Doug Meil
>> Hi there-
>> You probably want to see this...
>> http://hbase.apache.org/book.html#perf.deleting
>> .. that particular method doesn't use the write-buffer and is
>> submitting deletes one-by-one to the RS's.
>Do what Doug suggests.  Sounds like you are setting up a Map per row
>and then per row, figuring whether to Delete.  If a Delete, you do an
>invocation per.  Where are you getting your table instance from?  Is it
>created each time?  And as per Doug, are you write buffering your
Ioan Eugen Stan 2012-02-23, 14:57
Daniel Iancu 2012-02-23, 18:36