Paul Mackles 2012-10-05, 18:17
lars hofhansl 2012-10-05, 19:39
Anoop Sam John 2012-10-08, 03:55
Re: bulk deletes
Paul Mackles 2012-10-08, 11:45
Very cool Anoop. I can definitely see how that would be useful.
Lars - the bulk deletes do appear to work. I just wasn't sure if there was
something I might be missing since I haven't seen this documented.
Coprocessors do seem a better fit for this in the long term.
On 10/7/12 11:55 PM, "Anoop Sam John" <[EMAIL PROTECTED]> wrote:
>We have also done an implementation using compaction-time deletes (filtering
>out KVs). This works very well for us.
>As this delays the deletes until the next major compaction, we also have an
>implementation that does real-time bulk deletes. [We have such a use case.]
>Here I am using an endpoint implementation to do the scan and delete on
>the server side only. Just raised an IA for this [HBASE-6942]. I will
>post a patch based on the 0.94 model there. Please have a look. I have
>noticed a big performance improvement over the normal way of scan() +
>delete(List<Delete>), as this avoids several network calls and traffic.
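To put rough numbers on the "several network calls" point, here is a back-of-envelope sketch in plain Java. It is not HBase code and does not measure anything. The class and method names are hypothetical, and the cost model is an assumption: one round trip per scanner fetch, one per delete batch, and one endpoint call per region. A real batch may fan out to several region servers, so treat the results as orders of magnitude only.

```java
/** Back-of-envelope round-trip model. All names and numbers are illustrative assumptions. */
class RoundTripSketch {

    /** Ceiling division for positive values. */
    static long ceilDiv(long a, long b) {
        return (a + b - 1) / b;
    }

    /** One RPC per delete, the behaviour Paul reports seeing on 0.90.4. */
    static long perDeleteRpcs(long cells) {
        return cells;
    }

    /**
     * Client-side scan() followed by delete(List<Delete>).
     * Assumes one round trip per scanner fetch and one per delete batch.
     */
    static long scanThenBatchDelete(long rows, long scanCaching, long batchSize) {
        return ceilDiv(rows, scanCaching) + ceilDiv(rows, batchSize);
    }

    /** Server-side endpoint: assumes one client call per region. */
    static long endpointCalls(long regions) {
        return regions;
    }

    public static void main(String[] args) {
        long rows = 1_000_000L;   // hypothetical number of rows to delete
        long regions = 50L;       // hypothetical region count

        System.out.println("per-delete RPCs:     " + perDeleteRpcs(rows));
        System.out.println("scan + batch delete: " + scanThenBatchDelete(rows, 1000L, 1000L));
        System.out.println("endpoint calls:      " + endpointCalls(regions));
    }
}
```

Under these assumptions the endpoint approach scales with the number of regions rather than the number of rows, which would be consistent with the improvement Anoop describes.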
>From: lars hofhansl [[EMAIL PROTECTED]]
>Sent: Saturday, October 06, 2012 1:09 AM
>To: [EMAIL PROTECTED]
>Subject: Re: bulk deletes
>Does it work? :)
>How did you do the deletes before? I assume you used the
>(Doesn't really help you, but) In 0.92+ you could hook up a coprocessor
>into the compactions and simply filter out any KVs you want to have
> From: Paul Mackles <[EMAIL PROTECTED]>
>To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
>Sent: Friday, October 5, 2012 11:17 AM
>Subject: bulk deletes
>We need to do deletes pretty regularly and sometimes we could have
>hundreds of millions of cells to delete. TTLs won't work for us because
>we have a fair amount of bizlogic around the deletes.
>Given their current implementation (we are on 0.90.4), this delete process
>can take a really long time (half a day or more with 100 or so concurrent
>threads). From everything I can tell, the performance issues come down to
>each delete being an individual RPC call (even when using the batch API).
>In other words, I don't see any thrashing on hbase while this process is
>running, just lots of waiting for the RPC calls to return.
>The alternative we came up with is to use the standard bulk load
>facilities to handle the deletes. The code turned out to be surprisingly
>simple and appears to work in the small-scale tests we have tried so far.
>Is anyone else doing deletes in this fashion? Are there drawbacks that I
>might be missing? Here is a link to the code:
>Pretty simple, eh? I haven't seen much mention of this technique which is
>why I am a tad paranoid about it.
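For anyone wondering why loading deletes as files can work at all: in HBase a delete is stored as a marker cell, and at read time that marker hides puts in the same column whose timestamp is at or below the marker's. The toy model below is plain Java, not HBase code and not the code linked above. Every name in it is hypothetical, and it only models column-level markers.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy model, not HBase code: a delete is stored as just another cell in a file. */
class TombstoneSketch {
    enum Type { PUT, DELETE_COLUMN }

    /** One cell, loosely modeled on an HBase KeyValue. */
    static final class Cell {
        final String row, column, value;
        final long timestamp;
        final Type type;

        Cell(String row, String column, long timestamp, Type type, String value) {
            this.row = row; this.column = column; this.timestamp = timestamp;
            this.type = type; this.value = value;
        }

        String key() { return row + "/" + column; }
    }

    static Cell put(String row, String column, long ts, String value) {
        return new Cell(row, column, ts, Type.PUT, value);
    }

    static Cell deleteColumn(String row, String column, long ts) {
        return new Cell(row, column, ts, Type.DELETE_COLUMN, null);
    }

    /** Merges all files at read time and returns the newest visible value per row/column. */
    static Map<String, String> read(List<List<Cell>> files) {
        // Pass 1: newest delete marker per row/column, whichever file it arrived in.
        Map<String, Long> deletedUpTo = new TreeMap<>();
        for (List<Cell> file : files)
            for (Cell c : file)
                if (c.type == Type.DELETE_COLUMN)
                    deletedUpTo.merge(c.key(), c.timestamp, Long::max);

        // Pass 2: keep the newest put that is not masked by a marker.
        Map<String, Cell> newest = new TreeMap<>();
        for (List<Cell> file : files)
            for (Cell c : file) {
                if (c.type != Type.PUT) continue;
                Long marker = deletedUpTo.get(c.key());
                if (marker != null && c.timestamp <= marker) continue; // hidden by the marker
                newest.merge(c.key(), c, (a, b) -> a.timestamp >= b.timestamp ? a : b);
            }

        Map<String, String> view = new TreeMap<>();
        for (Cell c : newest.values()) view.put(c.key(), c.value);
        return view;
    }

    public static void main(String[] args) {
        List<Cell> dataFile = List.of(
                put("r1", "cf:a", 10, "old"),
                put("r2", "cf:a", 10, "keep"),
                put("r3", "cf:a", 30, "late"));
        // A "bulk-loaded" file that contains only delete markers.
        List<Cell> deleteFile = List.of(
                deleteColumn("r1", "cf:a", 20),
                deleteColumn("r3", "cf:a", 20));

        System.out.println(read(List.of(dataFile)));             // all three rows visible
        System.out.println(read(List.of(dataFile, deleteFile))); // r1 hidden, r3 survives
    }
}
```

The r3 case is the drawback to watch for: a marker only covers cells at or below its own timestamp, so a put written with a later timestamp stays visible. The markers and the cells they hide also keep taking up space until a major compaction removes them.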
Jerry Lam 2012-10-10, 15:07
Anoop Sam John 2012-10-11, 04:04
Jerry Lam 2012-10-12, 21:41
Jacques 2012-10-05, 19:37