Thanks for the suggestion. I did use List<Delete> with a batch size of 1000, but the performance was not much different from deleting one row at a time.
I looked into the HRegion.delete() method. My understanding is that when you call delete() on a row, it first deletes all the column families for that row, meaning it puts a tombstone on each column family.
In my case each row has 5 column families, so each delete puts 5 tombstones on the row. I think that could be the reason why delete is so slow.
I am also wondering whether there is any way, or any tool, to profile an HBase application and measure the time spent in each individual method.
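Short of a full profiler, one rough way to see where the time goes on the client side is to wrap each call in a timer. The sketch below is only an illustration: CallTimer and its methods are hypothetical names, not part of HBase, and it measures the wall-clock round trip seen by the client, not the time spent inside HRegion.delete() on the region server.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.Callable;

/** Hypothetical helper (not an HBase class): accumulates call counts and wall-clock time per label. */
class CallTimer {
    // label -> {number of calls, total elapsed nanoseconds}
    private final Map<String, long[]> stats = new LinkedHashMap<>();

    /** Runs the work, records how long it took under the given label, and returns its result. */
    public <T> T time(String label, Callable<T> work) throws Exception {
        long start = System.nanoTime();
        try {
            return work.call();
        } finally {
            long elapsed = System.nanoTime() - start;
            long[] s = stats.computeIfAbsent(label, k -> new long[2]);
            s[0]++;
            s[1] += elapsed;
        }
    }

    public long calls(String label) {
        long[] s = stats.get(label);
        return s == null ? 0 : s[0];
    }

    public long totalNanos(String label) {
        long[] s = stats.get(label);
        return s == null ? 0 : s[1];
    }

    /** One line per label: call count, total and average time in milliseconds. */
    public String report() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, long[]> e : stats.entrySet()) {
            long calls = e.getValue()[0];
            double totalMs = e.getValue()[1] / 1_000_000.0;
            sb.append(String.format("%s: calls=%d total=%.3f ms avg=%.3f ms%n",
                    e.getKey(), calls, totalMs, totalMs / calls));
        }
        return sb.toString();
    }
}
```

Wrapping the single-row delete and the batched delete under different labels and comparing the two lines of report() would at least show whether batching changes the client-side cost at all.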
From: Doug Meil [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, February 21, 2012 8:54 PM
To: [EMAIL PROTECTED]
Subject: Re: hbase delete operation is very slow
I don't think write-buffering is an option because that's Put-only the last time I looked, but the advice I put in the book is to use the delete(List<Delete>). He'll have to keep track of the List<Delete> himself and determine when the batch should be sent, but it's a lot better than one at a time.
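The bookkeeping described above (keep a list, decide when to send it) could look roughly like the following. This is a minimal sketch, not HBase code: BatchBuffer is a hypothetical name, and the sink stands in for whatever actually sends the batch, for example a small wrapper around delete(List<Delete>).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical helper (not an HBase class): collects items and hands them to a sink in batches. */
class BatchBuffer<T> {
    private final int batchSize;
    private final Consumer<List<T>> sink; // stand-in for the code that sends one batch
    private final List<T> pending = new ArrayList<>();

    public BatchBuffer(int batchSize, Consumer<List<T>> sink) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.batchSize = batchSize;
        this.sink = sink;
    }

    /** Queues one item and sends the batch once it reaches batchSize. */
    public void add(T item) {
        pending.add(item);
        if (pending.size() >= batchSize) {
            flush();
        }
    }

    /** Sends whatever is queued; call once more at the end for the final partial batch. */
    public void flush() {
        if (pending.isEmpty()) {
            return;
        }
        // Pass a copy so the sink can reorder or modify the list without affecting the buffer.
        sink.accept(new ArrayList<>(pending));
        pending.clear();
    }
}
```

With HBase the sink would call delete(batch) on the table and deal with its IOException, since a Consumer cannot throw checked exceptions. The final flush() matters: without it the last partial batch is never sent.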
On 2/21/12 7:39 PM, "Stack" <[EMAIL PROTECTED]> wrote:
>On Tue, Feb 21, 2012 at 2:45 PM, Doug Meil
><[EMAIL PROTECTED]> wrote:
>> Hi there-
>> You probably want to see this...
>> .. that particular method doesn't use the write-buffer and is
>> submitting deletes one-by-one to the RS's.
>Do what Doug suggests. Sounds like you are setting up a Map per row
>and then per row, figuring whether to Delete. If a Delete, you do an
>invocation per. Where are you getting your table instance from? Is it
>created each time? And as per Doug, are you write buffering your