HBase, mail # user - RS unresponsive after series of deletes


RE: RS unresponsive after series of deletes
Ted Tuttle 2012-06-21, 14:02
Good hint, Ted.

By calling Delete.deleteColumn(family, qual, ts) instead of deleteColumn
w/o timestamp, the time to delete row keys is reduced by 95%.
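For reference, a minimal sketch of the two call patterns against the HBase client API of that era (row key, family, qualifier, and timestamp below are hypothetical; this fragment needs the HBase client on the classpath and a live table to actually run):

```java
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.util.Bytes;

// Hypothetical names, for illustration only.
byte[] family = Bytes.toBytes("f");
byte[] qual   = Bytes.toBytes("q");
long ts       = 1340287200000L; // the cell's known timestamp

// Slow path: with no timestamp, the region server must first Get the
// latest version of the cell to discover its timestamp before deleting.
Delete slow = new Delete(Bytes.toBytes("rowkey"));
slow.deleteColumn(family, qual);

// Fast path: an explicit timestamp lets the server skip that Get entirely.
Delete fast = new Delete(Bytes.toBytes("rowkey"));
fast.deleteColumn(family, qual, ts);
```

The speedup reported above comes from avoiding that per-qualifier server-side Get, so this only applies when the client already knows each cell's timestamp.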

I am going to experiment w/ limited batches of Deletes, too.

Thanks everyone for help on this one.
-----Original Message-----
From: Ted Yu [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, June 20, 2012 10:13 PM
To: [EMAIL PROTECTED]
Subject: Re: RS unresponsive after series of deletes

As I mentioned earlier, prepareDeleteTimestamps() performs one get
operation per column qualifier:
          get.addColumn(family, qual);

          List<KeyValue> result = get(get, false);
This is too costly in your case.
I think you can group some configurable number of qualifiers in each get
and perform classification on result.
This way we can reduce the number of times HRegion$RegionScannerImpl.next() is called.
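The grouping idea can be sketched independently of the HBase internals: partition the row's qualifiers into fixed-size batches so that each Get covers many columns instead of one (class name, qualifier strings, and batch size below are illustrative, not from the HBase source):

```java
import java.util.ArrayList;
import java.util.List;

public class QualifierBatcher {
    // Split a list of column qualifiers into batches of at most batchSize,
    // so one Get can fetch timestamps for a whole batch instead of a single
    // qualifier per Get.
    static List<List<String>> batch(List<String> qualifiers, int batchSize) {
        List<List<String>> batches = new ArrayList<>();
        for (int i = 0; i < qualifiers.size(); i += batchSize) {
            int end = Math.min(i + batchSize, qualifiers.size());
            batches.add(new ArrayList<>(qualifiers.subList(i, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String> quals = new ArrayList<>();
        for (int i = 0; i < 10; i++) quals.add("q" + i);
        // 10 qualifiers in batches of 4 -> 3 Gets (sizes 4, 4, 2)
        List<List<String>> batches = batch(quals, 4);
        System.out.println(batches.size());
        System.out.println(batches.get(2).size());
    }
}
```

With a deterministic batch size the number of Gets drops from one per qualifier to ceil(n / batchSize), which is the reduction in scanner calls being suggested here.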

Cheers

On Wed, Jun 20, 2012 at 9:54 PM, Ted Tuttle
<[EMAIL PROTECTED]> wrote:

> > Do your 100s of thousands cell deletes overlap (in terms of column
> > family) across rows ?
>
> Our schema contains only one column family per table. So, each Delete
> contains cells from a single column family.  I hope this answers your
> question.