HBase, mail # user - RS unresponsive after series of deletes


+ Ted Tuttle 2012-06-13, 19:09
+ Stack 2012-06-14, 17:38
+ Ted Tuttle 2012-06-14, 19:36
+ Ted Tuttle 2012-06-18, 22:08
+ Jean-Daniel Cryans 2012-06-18, 22:17
+ Ted Tuttle 2012-06-20, 20:49
+ Jean-Daniel Cryans 2012-06-20, 21:32
+ Ted Tuttle 2012-06-21, 00:07
+ Ted Yu 2012-06-21, 03:52
+ Ted Yu 2012-06-21, 04:33
+ Ted Tuttle 2012-06-21, 04:54
+ Ted Yu 2012-06-21, 05:13
RE: RS unresponsive after series of deletes
Ted Tuttle 2012-06-21, 14:02
Good hint, Ted

By calling Delete.deleteColumn(family, qual, ts) instead of deleteColumn
w/o timestamp, the time to delete row keys is reduced by 95%.

I am going to experiment w/ limited batches of Deletes, too.
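Put together, the two changes described above might look roughly like the sketch below. This is only an illustration, not code from the thread: it assumes an HBase 0.94-era client on the classpath, and the table handle, row keys, family, and qualifier names are placeholders.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.HTable;

public class TimestampedDeletes {
    // Delete one cell version per row, passing an explicit timestamp and
    // submitting the Deletes as a single batch.
    static void deleteCells(HTable table, List<byte[]> rows,
                            byte[] family, byte[] qual, long ts)
            throws IOException {
        List<Delete> batch = new ArrayList<Delete>();
        for (byte[] row : rows) {
            Delete d = new Delete(row);
            // With the timestamp supplied, the region server does not have
            // to issue an internal Get (via prepareDeleteTimestamps()) to
            // discover the latest version of the cell.
            d.deleteColumn(family, qual, ts);
            batch.add(d);
        }
        // One client call for the whole batch rather than one RPC per row.
        table.delete(batch);
    }
}
```

Splitting `rows` into smaller chunks before calling `deleteCells` is the "limited batches" experiment mentioned above; it bounds how much work any single call pushes onto a region server.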

Thanks, everyone, for the help on this one.
-----Original Message-----
From: Ted Yu [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, June 20, 2012 10:13 PM
To: [EMAIL PROTECTED]
Subject: Re: RS unresponsive after series of deletes

As I mentioned earlier, prepareDeleteTimestamps() performs one get
operation per column qualifier:
          get.addColumn(family, qual);

          List<KeyValue> result = get(get, false);
This is too costly in your case.
I think you can group a configurable number of qualifiers into each get
and perform the classification on the combined result.
This way we can reduce the number of times
HRegion$RegionScannerImpl.next() is called.
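The grouping step suggested here can be sketched independently of HBase. The snippet below is a minimal illustration (the class name, batch size, and use of `String` qualifiers are made up): it splits a qualifier list into fixed-size groups, each of which could then be fetched with a single Get carrying several addColumn() calls instead of one Get per qualifier.

```java
import java.util.ArrayList;
import java.util.List;

public class QualifierBatcher {
    // Split qualifiers into groups of at most batchSize. Each group would
    // map to one Get with several addColumn() calls, cutting the number of
    // server-side scanner invocations roughly by a factor of batchSize.
    static List<List<String>> group(List<String> qualifiers, int batchSize) {
        List<List<String>> groups = new ArrayList<List<String>>();
        for (int i = 0; i < qualifiers.size(); i += batchSize) {
            int end = Math.min(i + batchSize, qualifiers.size());
            groups.add(new ArrayList<String>(qualifiers.subList(i, end)));
        }
        return groups;
    }
}
```

The batch size is the tunable Ted mentions: larger groups mean fewer gets but bigger results to classify per call.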

Cheers

On Wed, Jun 20, 2012 at 9:54 PM, Ted Tuttle <[EMAIL PROTECTED]> wrote:

> > Do your 100s of thousands of cell deletes overlap (in terms of column
> > family) across rows?
>
> Our schema contains only one column family per table. So, each Delete
> contains cells from a single column family. I hope this answers your
> question.
+ Ted Yu 2012-06-21, 17:32
+ Ted Tuttle 2012-06-21, 18:00
+ Ted Yu 2012-06-21, 18:10
+ Ted Yu 2012-06-21, 14:19
+ Ted Tuttle 2012-06-21, 18:16
+ lars hofhansl 2012-06-23, 02:35