HBase, mail # user - RS unresponsive after series of deletes

RE: RS unresponsive after series of deletes
Ted Tuttle 2012-06-21, 00:07
First off, J-D, thanks for helping me work through this.  You've
inspired some different angles and I think I've finally made it bleed in
a controlled way.

> - That data you are deleting needs to be read when you scan, like I
> said earlier a delete is in fact an insert in HBase and this isn't
> cleared up until a major compaction happens.
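J-D's point can be made concrete with a toy in-memory model. The hypothetical class `TombstoneToy` below is not HBase's actual storage engine. In it, a delete appends a tombstone instead of removing data, so a scan still reads every tombstone and the value it masks. Scan cost therefore stays flat after a mass delete, and it only drops when the toy's `majorCompact()` rewrites storage without those entries.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model only, NOT HBase's storage engine: a delete appends a tombstone
// instead of removing data, and only "major compaction" physically drops it.
final class TombstoneToy {
    // column -> stored entries, oldest first; a null entry is a tombstone
    private final TreeMap<String, List<String>> cells = new TreeMap<>();
    private long lastScanExamined = 0;

    void put(String column, String value) {
        cells.computeIfAbsent(column, k -> new ArrayList<>()).add(value);
    }

    // The old value stays in storage; a marker is written on top of it.
    void delete(String column) {
        cells.computeIfAbsent(column, k -> new ArrayList<>()).add(null);
    }

    // Every stored entry, tombstones included, is read to decide visibility.
    List<String> scan() {
        List<String> visible = new ArrayList<>();
        lastScanExamined = 0;
        for (Map.Entry<String, List<String>> e : cells.entrySet()) {
            List<String> entries = e.getValue();
            lastScanExamined += entries.size();
            if (entries.get(entries.size() - 1) != null) visible.add(e.getKey());
        }
        return visible;
    }

    long examinedByLastScan() { return lastScanExamined; }

    // Rewrite storage keeping only the newest live value per column;
    // tombstones and the values they mask disappear.
    void majorCompact() {
        TreeMap<String, List<String>> kept = new TreeMap<>();
        for (Map.Entry<String, List<String>> e : cells.entrySet()) {
            List<String> entries = e.getValue();
            String newest = entries.get(entries.size() - 1);
            if (newest != null) kept.put(e.getKey(), new ArrayList<>(List.of(newest)));
        }
        cells.clear();
        cells.putAll(kept);
    }

    public static void main(String[] args) {
        TombstoneToy store = new TombstoneToy();
        for (int i = 0; i < 1000; i++) store.put(String.format("col%04d", i), "v");
        for (int i = 0; i < 990; i++) store.delete(String.format("col%04d", i));

        List<String> live = store.scan();
        // prints: 10 visible, 1990 examined
        System.out.println(live.size() + " visible, " + store.examinedByLastScan() + " examined");

        store.majorCompact();
        live = store.scan();
        // prints: 10 visible, 10 examined
        System.out.println(live.size() + " visible, " + store.examinedByLastScan() + " examined");
    }
}
```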

I manually compacted (via UI) the table that I deleted from.  The scan
times are still >10min.  When reading through each node's log, I see
some messages indicating the major compactions were going to be skipped.
Is it safe to say that hitting that 'Compact' button is just a
recommendation?  Is there an operation we can perform after a big delete
to guarantee that deletes get compacted away?
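For context on the last question: the HBase client can explicitly request a major compaction with `Admin.majorCompact(TableName)`. The 0.9x-era `HBaseAdmin` offers `majorCompact(String)`. The call is asynchronous and only queues the request, so a caller that wants to wait has to poll. The helper below is a hypothetical, JDK-only polling loop. The HBase condition in its comment is an assumption about how one might plug it in. It is a heuristic rather than a guarantee, because the reported state may read NONE before a queued compaction has even started.

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

// Hypothetical helper: poll `done` until it returns true or `timeout` elapses.
// Assumed (untested here) use after admin.majorCompact(table):
//   awaitCondition(() -> admin.getCompactionState(table) == CompactionState.NONE, ...)
// getCompactionState throws IOException, so a real lambda would need to wrap it.
// The state may read NONE before the queued compaction starts: a heuristic only.
final class CompactionWait {
    static boolean awaitCondition(BooleanSupplier done, Duration timeout, Duration pollInterval) {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (!done.getAsBoolean()) {
            if (System.nanoTime() - deadline >= 0) return false; // timed out
            try {
                Thread.sleep(pollInterval.toMillis());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve interrupt status
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        int[] checks = {0};
        // Stand-in condition that becomes true on the third check.
        boolean finished = awaitCondition(() -> ++checks[0] >= 3,
                Duration.ofSeconds(5), Duration.ofMillis(10));
        System.out.println(finished + " after " + checks[0] + " checks"); // prints: true after 3 checks
    }
}
```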

> Do you have scanner caching turned on? Just to be sure set
> scan.setCaching(1) and see if it makes any difference.

A bit confused here.  Under what conditions would you recommend setting
the scan caching to 1?  My read path doesn't know whether a lot of
data was recently deleted, so I can't disable it conditionally.  I want
scan caching in general, I believe.
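Some background on the trade-off: scanner caching (`Scan.setCaching(int)` in the HBase client) sets how many rows each `next()` RPC brings back. The toy arithmetic below uses a hypothetical class `CachingMath` with assumed row and cell counts. It shows that caching changes only how a scan's work is split across calls, not the total. Larger caching means fewer round trips, but each call has to read more cells on the server, deleted ones included.

```java
// Toy arithmetic, not a measurement. Assumes every visible row costs the
// server `cellsReadPerRow` cell reads (live cells plus deleted cells and their
// delete markers) and that each next() RPC returns exactly `caching` rows.
final class CachingMath {
    // Client round trips needed to pull `rows` rows.
    static long roundTrips(long rows, int caching) {
        if (caching < 1) throw new IllegalArgumentException("caching must be >= 1");
        return (rows + caching - 1) / caching; // ceiling division
    }

    // Server-side work packed into a single next() call.
    static long cellsReadPerRpc(int caching, long cellsReadPerRow) {
        return (long) caching * cellsReadPerRow;
    }

    public static void main(String[] args) {
        long rows = 10_000;
        long cellsReadPerRow = 101; // assumed: 1 live cell plus 100 masked-cell/marker reads
        for (int caching : new int[] {1, 100, 1_000}) {
            System.out.printf("caching=%d: %d RPCs, ~%d cells read per RPC%n",
                    caching, roundTrips(rows, caching), cellsReadPerRpc(caching, cellsReadPerRow));
        }
    }
}
```

Under these assumptions, a large caching value right after a mass delete can turn each `next()` into one long server-side call. That is one possible reason to try a small value while diagnosing slow scans. It is not a case for disabling caching in general.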

> Are you saying that you have Delete objects on which you did
> deleteColumn() 1000x? If so, look no further there's your problem.

I am calling deleteColumn() thousands of times per Delete object.
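If thousands of `deleteColumn()` calls per `Delete` are the problem, one option is to cap how many columns each `Delete` carries. The JDK-only helper below (hypothetical `DeleteBatcher.partition`) splits a qualifier list into bounded chunks. Its batch size of 1,000 is an arbitrary assumption, not an HBase guideline. The comment in `main` sketches how each chunk might map onto the 0.9x-era client calls.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: split a large set of column qualifiers into bounded
// batches so that no single Delete carries thousands of columns.
// The batch size is an assumption to tune, not an HBase recommendation.
final class DeleteBatcher {
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize < 1) throw new IllegalArgumentException("batchSize must be >= 1");
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            // Copy so each batch is independent of the source list.
            batches.add(new ArrayList<>(items.subList(start, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String> qualifiers = new ArrayList<>();
        for (int i = 0; i < 20_000; i++) qualifiers.add("q" + i);
        List<List<String>> batches = partition(qualifiers, 1_000);
        System.out.println(batches.size() + " batches of up to 1000 columns"); // prints: 20 batches of up to 1000 columns
        // With an HBase client, each batch might become one Delete, e.g.
        //   Delete d = new Delete(row);
        //   for (String q : batch) d.deleteColumn(family, Bytes.toBytes(q));
        //   table.delete(d);
        // (deleteColumn is the 0.9x name; later clients use addColumn.)
    }
}
```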

I can delete a row w/ 20k keys in ~2 sec. If I issue 10 of these (they
appear to be fired off asynchronously by the client), the unresponsive RS
behavior ensues.  Here is a stack dump from an RS that is running at >90%
utilization as it processes my deletes:


Some logs around this time:


So, my takeaway is that the RSs don't like being slammed w/ 100s of
thousands of cell deletes.  I can be more measured about these deletes
going forward.  That the RSs don't handle this more gracefully sounds like
a bug.  At a minimum, there appears to be a nonlinear response.  What do
you think?
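One way to be "more measured" is to cap how many delete batches are in flight at once, rather than firing them all off asynchronously. The sketch below uses a fixed-size JDK `ExecutorService`. `ThrottledDeletes`, `runBounded` and the batch `Runnable`s are hypothetical. The right limit for a given cluster is something to measure, not a value taken from HBase.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch: run delete batches through a fixed-size pool so at most
// `maxInFlight` are outstanding at once. Each Runnable is a hypothetical
// stand-in for code that sends one bounded Delete to the cluster.
final class ThrottledDeletes {
    // Returns the highest concurrency observed, useful as a sanity check.
    static int runBounded(List<Runnable> batches, int maxInFlight) {
        AtomicInteger inFlight = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(maxInFlight);
        List<Future<?>> futures = new ArrayList<>();
        for (Runnable batch : batches) {
            futures.add(pool.submit(() -> {
                peak.accumulateAndGet(inFlight.incrementAndGet(), Math::max);
                try {
                    batch.run();
                } finally {
                    inFlight.decrementAndGet();
                }
            }));
        }
        pool.shutdown(); // accept no new work; queued batches still run
        try {
            for (Future<?> f : futures) f.get(); // wait, and surface any failure
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            pool.shutdownNow();
            throw new IllegalStateException("interrupted while waiting for deletes", e);
        } catch (ExecutionException e) {
            pool.shutdownNow();
            throw new IllegalStateException("a delete batch failed", e.getCause());
        }
        return peak.get();
    }

    public static void main(String[] args) {
        List<Runnable> batches = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            int id = i;
            batches.add(() -> System.out.println("deleting batch " + id));
        }
        System.out.println("peak in flight: " + runBounded(batches, 2));
    }
}
```

A fixed pool keeps the design simple: the queue absorbs the burst on the client side, while the server never sees more than `maxInFlight` concurrent delete requests from this caller.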