Paul Mackles 2012-10-05, 18:17
lars hofhansl 2012-10-05, 19:39
Anoop Sam John 2012-10-08, 03:55
Paul Mackles 2012-10-08, 11:45
Jerry Lam 2012-10-10, 15:07
RE: bulk deletes
Anoop Sam John 2012-10-11, 04:04
You are right Jerry..
In your use case, do you want to delete full rows or only some cfs/columns? Pls feel free to see the issue HBASE-6942 and give your valuable comments..
Here I am trying to delete the rows [This is our use case]

-Anoop-
________________________________________
From: Jerry Lam [[EMAIL PROTECTED]]
Sent: Wednesday, October 10, 2012 8:37 PM
To: [EMAIL PROTECTED]
Subject: Re: bulk deletes

Hi guys:

The bulk delete approaches described in this thread are helpful in my case as well. If I understood correctly, Paul's approach is useful for offline bulk deletes (a.k.a. mapreduce) whereas Anoop's approach is useful for online/real-time bulk deletes (a.k.a. co-processor)?

Best Regards,

Jerry

On Mon, Oct 8, 2012 at 7:45 AM, Paul Mackles <[EMAIL PROTECTED]> wrote:

> Very cool Anoop. I can definitely see how that would be useful.
>
> Lars - the bulk deletes do appear to work. I just wasn't sure if there was
> something I might be missing since I haven't seen this documented
> elsewhere.
>
> Coprocessors do seem a better fit for this in the long term.
>
> Thanks everyone.
>
> On 10/7/12 11:55 PM, "Anoop Sam John" <[EMAIL PROTECTED]> wrote:
>
> >We have also done an implementation using compaction time deletes (avoid KVs).
> >This works very well for us....
> >As this would delay the deletes to happen till the next major compaction,
> >we are having an implementation to do the real time bulk delete. [We have
> >such use case]
> >Here I am using an endpoint implementation to do the scan and delete at
> >the server side only. Just raised an IA for this [HBASE-6942]. I will
> >post a patch based on 0.94 model there...Pls have a look.... I have
> >noticed big performance improvement over the normal way of scan() +
> >delete(List<Delete>) as this avoids several network calls and traffic...
> >
> >-Anoop-
> >________________________________________
> >From: lars hofhansl [[EMAIL PROTECTED]]
> >Sent: Saturday, October 06, 2012 1:09 AM
> >To: [EMAIL PROTECTED]
> >Subject: Re: bulk deletes
> >
> >Does it work? :)
> >
> >How did you do the deletes before? I assume you used the
> >HTable.delete(List<Delete>) API?
> >
> >(Doesn't really help you, but) In 0.92+ you could hook up a coprocessor
> >into the compactions and simply filter out any KVs you want to have
> >removed.
> >
> >
> >-- Lars
> >
> >
> >
> >________________________________
> > From: Paul Mackles <[EMAIL PROTECTED]>
> >To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
> >Sent: Friday, October 5, 2012 11:17 AM
> >Subject: bulk deletes
> >
> >We need to do deletes pretty regularly and sometimes we could have
> >hundreds of millions of cells to delete. TTLs won't work for us because
> >we have a fair amount of bizlogic around the deletes.
> >
> >Given their current implementation (we are on 0.90.4), this delete process
> >can take a really long time (half a day or more with 100 or so concurrent
> >threads). From everything I can tell, the performance issues come down to
> >each delete being an individual RPC call (even when using the batch API).
> >In other words, I don't see any thrashing on hbase while this process is
> >running, just lots of waiting for the RPC calls to return.
> >
> >The alternative we came up with is to use the standard bulk load
> >facilities to handle the deletes. The code turned out to be surprisingly
> >simple and appears to work in the small-scale tests we have tried so far.
> >Is anyone else doing deletes in this fashion? Are there drawbacks that I
> >might be missing? Here is a link to the code:
> >
> >https://gist.github.com/3841437
> >
> >Pretty simple, eh? I haven't seen much mention of this technique which is
> >why I am a tad paranoid about it.
> >
> >Thanks,
> >Paul
> >
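For readers following the thread, here is a minimal sketch of why bulk-loaded deletes and compaction-time deletes behave the way they are described above. It is a toy in-memory model, not HBase code and not the code from Paul's gist or from HBASE-6942. The class and method names (`TombstoneModel`, `loadMarkers`, `majorCompact`) are hypothetical. The only HBase behavior it assumes is that a column delete marker hides versions with a timestamp at or below its own, and that a major compaction physically drops the masked cells along with the markers.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: illustrates delete-marker masking, not real HBase storage.
final class TombstoneModel {
    // One stored version of a column.
    static final class Cell {
        final String row; final String qualifier; final long ts;
        Cell(String row, String qualifier, long ts) {
            this.row = row; this.qualifier = qualifier; this.ts = ts;
        }
    }

    // A column delete marker: hides versions of the column with ts <= marker ts.
    static final class Marker {
        final String row; final String qualifier; final long ts;
        Marker(String row, String qualifier, long ts) {
            this.row = row; this.qualifier = qualifier; this.ts = ts;
        }
    }

    private final List<Cell> cells = new ArrayList<>();
    private final List<Marker> markers = new ArrayList<>();

    void put(String row, String qualifier, long ts) {
        cells.add(new Cell(row, qualifier, ts));
    }

    // Stands in for bulk loading a file of delete markers: nothing is removed yet.
    void loadMarkers(List<Marker> batch) {
        markers.addAll(batch);
    }

    private boolean masked(Cell c) {
        for (Marker m : markers) {
            if (m.row.equals(c.row) && m.qualifier.equals(c.qualifier) && c.ts <= m.ts) {
                return true;
            }
        }
        return false;
    }

    // Read path: masked cells are hidden even though they are still stored.
    List<Cell> scan() {
        List<Cell> visible = new ArrayList<>();
        for (Cell c : cells) {
            if (!masked(c)) {
                visible.add(c);
            }
        }
        return visible;
    }

    // Major compaction: physically drop masked cells, then the markers themselves.
    void majorCompact() {
        cells.removeIf(this::masked);
        markers.clear();
    }

    int storedCellCount() {
        return cells.size();
    }
}
```

In this model, loading a batch of markers hides the matching cells from `scan()` immediately, but `storedCellCount()` only shrinks after `majorCompact()`. That mirrors the trade-off discussed in the thread: markers can be added cheaply in bulk, while the space is reclaimed at the next major compaction.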
Jerry Lam 2012-10-12, 21:41
Jacques 2012-10-05, 19:37