|
|
-
RS unresponsive after series of deletes
Ted Tuttle 2012-06-13, 19:09
Hi All-

I have a repeatable and troublesome HBase interaction that I would like some advice on.

I am running a 5 node cluster on v0.94 on cdh3u3 and accessing it through the Java client API. Each RS has 32G of RAM and is running w/ a 16G heap w/ 4G for block cache. Used heap of each RS is well below the 16G available.

My client code has a set of deletes to carry out. After successfully issuing 19 such deletes the client begins logging HBase errors while trying to complete the deletes. It logs ERRORs every 60s for 10 times and then gives up.

I estimate that the client successfully deleted about 270MB of data in the first 19 deletes, with each batch delete covering about 144 rows with a row size of about 100KB.

Here is the first of 10 ERRORs logged in the client: http://pastebin.com/QMJsbgkZ

Client errors are 1 per minute between 00:22:48 and 00:32:58, with the final error being: http://pastebin.com/ajaVxYUZ

Ultimately, the RS became responsive again. Looking at monitoring I see a spike in CPU utilization on the node that is unresponsive; it goes from 2% utilization to 20% and sticks there for a few minutes. None of the other nodes in the cluster appear busy at this time.

Logs from the unresponsive RS are here: http://pastebin.com/z9qxGuJS

There are no ERRORs in the log around the time of the unresponsiveness. It appears from the server log that the "responseTooSlow" operation completed about 7min after the client gave up.

So, any ideas what was making the RS unresponsive? Did it really take 17min to delete 280MB of data?

I can easily change client RPC timeouts and number of retries, but I feel there is something I am missing. Any suggestions?

Thanks,
Ted
-
Re: RS unresponsive after series of deletes
Stack 2012-06-14, 17:38
On Wed, Jun 13, 2012 at 12:09 PM, Ted Tuttle <[EMAIL PROTECTED]> wrote:
> My client code has a set of deletes to carry out. After successfully
> issuing 19 such deletes the client begins logging HBase errors while
> trying to complete the deletes. It logs ERRORs every 60s for 10 times
> and then gives up.

What kind of a delete are you doing? You are deleting individual cells? When you say 19 deletes, each of these is a batch delete? If a cell delete, we need to read the cell first to find the most recent timestamp.

Looks like we are timing out the rpc doing your batch of deletes. Could it be that a batch is doing a bunch at the one time and taking a long time to complete? Try making smaller batches? (Delete of 144 rows taking a minute seems like way too long though, or is the delete of a row made up of many individual deletes? A delete of a column family on a row is cheaper than a cell delete because it just puts a marker on the column family -- See http://hbase.apache.org/book.html#version.delete).

> Ultimately, the RS became responsive again. Looking at monitoring I
> see spike in CPU utilization on node that is unresponsive; it goes
> from 2% utilization to 20% and sticks there for a few minutes. None
> of the other nodes in the cluster appear busy at this time.

Want to try thread dumping it when it goes unresponsive? That'd help us figure out what the regionserver was doing at the time when it's burning 20% (Do you have gc logging enabled? Anything in the .out file at this time when we are using CPU?)

St.Ack
-
RE: RS unresponsive after series of deletes
Ted Tuttle 2012-06-14, 19:36
> What kind of a delete are you doing?

A mixture of row and cell deletes. Interestingly, the first 19 (successful) deletes were row deletes. The client got hung up while submitting its first batch of cell deletes. However, I think the cell/row distinction is a red herring, as we've experienced this behavior at least once with batches of exclusively row deletes.

> When you say 19 deletes, each of these is a batch delete?

Each of the 19 deletes is a call to HTable.delete(List<Delete>). I estimate there were about 144 Deletes in each batch. In the cell delete that failed, I estimate about 1000 column qualifiers per row for a total of about 144k cells per batch.

> Could it be that a batch is doing a bunch at the one time and taking
> a long time to complete?

In order to issue the cell delete we scan each row's column keys for matches to an in-memory set of domain objects. The code to construct the delete is completing quickly.

I should add that most of our deletes are very fast. But on 3 occasions thus far, they have exceeded the 10min allotted by the retry logic in the client.

> Try making smaller batches? Want to try thread dumping it when it
> goes unresponsive?

I will try to reproduce w/ a test harness.

> Do you have gc logging enabled? Anything in the .out file at this
> time when we are using CPU?

I don't see any GC related operations over 10s. Here is the log from the time of the first failure to 20min after: http://pastebin.com/AUaULHcD

-Ted
-
RE: RS unresponsive after series of deletes
Ted Tuttle 2012-06-18, 22:08
We had another of these delete-related RS hang ups. This time we are getting a different error on the client:

  java.io.IOException: Call queue is full, is ipc.server.max.callqueue.size too small?

Full stack here: http://pastebin.com/uq68Mvhm

Looking at the RS log, it appears the RS was working on the batch delete for about 1hr. There are no errors in the RS log during this time. There are several "responseTooSlow" messages. Based on processingtimems values they all lead back to our big batch delete.

Any theories on how a big batch of deletes could cause a RS to go unresponsive?
-
Re: RS unresponsive after series of deletes
Jean-Daniel Cryans 2012-06-18, 22:17
Mass deleting in HBase is equivalent to mass inserting, it's just that the former doesn't have to write values out (just keys). Almost everything that applies to batch insert tunings applies to batch deleting.

Now the error you get comes from this: https://issues.apache.org/jira/browse/HBASE-5190

What it means is that you have 1GB worth of _deletes_ sitting in the region server call queue. That's way too much, something's wrong, and it doesn't seem to be making progress.

Like Stack said in his reply, have you thread dumped the slow region servers when this happens? It would also help to see the log during that time. Try to capture a good chunk of it and post it like you did on pastebin.

Thx,

J-D
-
RE: RS unresponsive after series of deletes
Ted Tuttle 2012-06-20, 20:49
> Like Stack said in his reply, have you thread dumped the slow region
> servers when this happens?

I've been having difficulty reproducing this behavior in a controlled manner. While I haven't been able to get my client to hang up while doing deletes, I have found a query that, when issued after a big delete, takes a very long (>10m) time.

The timeline for this is:

1) insert a bunch of data
2) delete it all in 500 calls to HTable.delete(List<Delete>)
3) scan table for data that was just deleted (500 scans with various row start/end, where scan bails as soon as first key of first row is found for a particular row start/end pair)

The last part is very fast on undeleted data. For my recently deleted data this takes ~15min. When I look at RS CPU it is spiking like my unresponsive episodes in the "wild".

Looking at the busy RS I see:

  Thread 50 (IPC Server handler 1 on 60020):
    State: RUNNABLE
    Blocked count: 384389
    Waited count: 12192804
    Stack:
      org.apache.hadoop.hbase.regionserver.StoreFileScanner.peek(StoreFileScanner.java:121)
      org.apache.hadoop.hbase.regionserver.KeyValueHeap.generalizedSeek(KeyValueHeap.java:282)
      org.apache.hadoop.hbase.regionserver.KeyValueHeap.reseek(KeyValueHeap.java:244)
      org.apache.hadoop.hbase.regionserver.StoreScanner.reseek(StoreScanner.java:521)
      org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:402)
      org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:127)
      org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:3354)
      org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:3310)
      org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:3327)
      org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:2393)
      sun.reflect.GeneratedMethodAccessor31.invoke(Unknown Source)
      sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
      java.lang.reflect.Method.invoke(Method.java:597)
      org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:364)
      org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1376)

And lots of block cache churn in the logs:

  2012-06-20 13:13:55,572 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Block cache LRU eviction started; Attempting to free 409.4 MB of total=3.4 GB

> It would also help to see the log during that time

More of the logs here: http://pastebin.com/4annviTS
-
Re: RS unresponsive after series of deletes
Jean-Daniel Cryans 2012-06-20, 21:32
What you are describing here seems very different from what was shown earlier. In any case, a few remarks:

- You have major compactions running during the time of that log trace, this usually sucks up a lot of IO. See http://hbase.apache.org/book.html#managed.compactions

- That data you are deleting needs to be read when you scan, like I said earlier a delete is in fact an insert in HBase and this isn't cleared up until a major compaction happens. This is a lot of dead data to cache during a scan :)

- Do you have scanner caching turned on? Just to be sure set scan.setCaching(1) and see if it makes any difference.

Also I was re-reading your previous answers and I'd like more info about this:

> I estimate about 1000 column qualifiers per row for
> a total of about 144k cells per batch.

Are you saying that you have Delete objects on which you did deleteColumn() 1000x? If so, look no further, there's your problem. It would also explain why in your previous log we don't see any blocking but just calls that take more than 5 minutes to run.

Like Stack mentioned, a cell delete requires reading the old cell first. Let's say that takes on average 1.5ms. Then doing the insert takes another 1.5ms. The total time to process a 144k cell batch would be:

  144000 * 1.5 * 1.5 / 1000 = 324 seconds

This is not very far from the numbers I'm seeing in your original log: http://pastebin.com/z9qxGuJS

  "processingtimems":314804
  "processingtimems":356151
  "processingtimems":398570
  etc

It seems you need to review how you are using HBase because right now, unless you aggressively major compact your tables, you won't get good performance with all those delete markers.

J-D
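A minimal sketch of the back-of-envelope estimate in J-D's message. The 1.5 ms read and 1.5 ms insert figures are his illustrative averages, not measurements, and the class and method names are hypothetical. The message multiplies the two figures; if they are instead read as sequential per-cell costs (an assumption about the intent), they add. Both readings are computed here and both land in the same range as the logged processingtimems values (roughly 315 to 399 seconds).

```java
/** Back-of-envelope cost of one batch of cell deletes; all inputs are illustrative. */
final class DeleteCostEstimate {

    /** Seconds for a batch when every cell pays a read and then a write, one after the other. */
    static double additiveSeconds(long cells, double readMs, double writeMs) {
        return cells * (readMs + writeMs) / 1000.0;
    }

    /** Seconds using the expression exactly as written in the message: cells * a * b / 1000. */
    static double asWrittenSeconds(long cells, double a, double b) {
        return cells * a * b / 1000.0;
    }

    public static void main(String[] args) {
        long cells = 144_000; // about 144 rows x about 1000 qualifiers, per the thread
        System.out.println("additive reading: " + additiveSeconds(cells, 1.5, 1.5) + " s");
        System.out.println("as written:       " + asWrittenSeconds(cells, 1.5, 1.5) + " s");
    }
}
```

Either way the estimate is several minutes per batch, which is the point of the message: the time is spent on per-cell work, not on blocking.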
-
RE: RS unresponsive after series of deletes
Ted Tuttle 2012-06-21, 00:07
First off, J-D, thanks for helping me work through this. You've inspired some different angles and I think I've finally made it bleed in a controlled way.

> - That data you are deleting needs to be read when you scan, like I
> said earlier a delete is in fact an insert in HBase and this isn't
> cleared up until a major compaction happens.

I manually compacted (via UI) the table that I deleted from. The scan times are still >10min. When reading through each node's log, I see some messages indicating the major compactions were going to be skipped. Is it safe to say that hitting that 'Compact' button is just a recommendation? Is there an operation we can perform after a big delete to guarantee that deletes get compacted away?

> Do you have scanner caching turned on? Just to be sure set
> scan.setCaching(1) and see if it makes any difference.

A bit confused here. Under what conditions would you recommend setting the scan caching to 1? My read path doesn't know whether a lot of data was recently deleted, so I can't disable it conditionally. I want scan caching in general, I believe.

> Are you saying that you have Delete objects on which you did
> deleteColumn() 1000x? If so, look no further there's your problem.

I am calling deleteColumn() thousands of times per Delete object.

I can delete a row w/ 20k keys in ~2 sec. If I issue 10 of these (they appear to be fired off asynchronously by the client), the unresponsive RS behavior ensues. Here is a stack dump from a RS that is running at >90% utilization as it processes my deletes:

http://pastebin.com/8y5x4xU7

Some logs around this time:

http://pastebin.com/UpPMbsmn

So, my takeaway is that the RSs don't like being slammed w/ 100s of thousands of cell deletes. I can be more measured about these deletes going forward. That the RSs don't handle this more gracefully sounds like a bug. At a minimum, there appears to be a nonlinear response. What do you think?
-
Re: RS unresponsive after series of deletes
Ted Yu 2012-06-21, 03:52
Looking at the stack trace, I found the following hot spot:

  1. org.apache.hadoop.hbase.regionserver.StoreFileScanner.realSeekDone(StoreFileScanner.java:340)
  2. org.apache.hadoop.hbase.regionserver.KeyValueHeap.pollRealKV(KeyValueHeap.java:331)
  3. org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:105)
  4. org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:406)
  5. org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:127)
  6. org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:3354)
  7. org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:3310)
  8. org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:3327)
  9. org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:4066)
  10. org.apache.hadoop.hbase.regionserver.HRegion.prepareDeleteTimestamps(HRegion.java:1710)
  11. org.apache.hadoop.hbase.regionserver.HRegion.internalDelete(HRegion.java:1753)

From HRegion:

    for (KeyValue kv: kvs) {
      // Check if time is LATEST, change to time of most recent addition if so
      // This is expensive.
      if (kv.isLatestTimestamp() && kv.isDeleteType()) {
        ...
        List<KeyValue> result = get(get, false);

We perform get() for each kv whose time is LATEST. This explains the unresponsiveness.

FYI
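A toy model of the check Ted Yu quotes from HRegion, to make the cost visible. This is not HBase code: CellDelete, lookupsNeeded and wideRow are hypothetical stand-ins, and the model only counts how many times the quoted loop would reach its get() call. It assumes, as the quoted condition kv.isLatestTimestamp() implies, that a delete marker carrying an explicit timestamp skips the lookup.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model (not HBase code) of the per-cell lookup in the quoted loop. */
final class DeleteLookupModel {

    /** Sentinel meaning "no timestamp was given"; chosen here for illustration. */
    static final long LATEST = Long.MAX_VALUE;

    /** Hypothetical stand-in for one cell-level delete marker. */
    static final class CellDelete {
        final String qualifier;
        final long timestamp;

        CellDelete(String qualifier, long timestamp) {
            this.qualifier = qualifier;
            this.timestamp = timestamp;
        }

        boolean isLatestTimestamp() {
            return timestamp == LATEST;
        }
    }

    /** Counts the reads the loop would issue: one per marker that still carries LATEST. */
    static int lookupsNeeded(List<CellDelete> cells) {
        int gets = 0;
        for (CellDelete cell : cells) {
            if (cell.isLatestTimestamp()) {
                gets++; // corresponds to the get(get, false) call in the quoted excerpt
            }
        }
        return gets;
    }

    /** Builds delete markers for one wide row, all with the same timestamp. */
    static List<CellDelete> wideRow(int qualifiers, long timestamp) {
        List<CellDelete> cells = new ArrayList<>();
        for (int i = 0; i < qualifiers; i++) {
            cells.add(new CellDelete("q" + i, timestamp));
        }
        return cells;
    }

    public static void main(String[] args) {
        System.out.println("no timestamp:       " + lookupsNeeded(wideRow(20_000, LATEST)));
        System.out.println("explicit timestamp: " + lookupsNeeded(wideRow(20_000, 1_340_236_800_000L)));
    }
}
```

In this model a 20k-qualifier row deleted without timestamps costs 20,000 lookups, while the same row with explicit timestamps costs none, which matches the shape of the problem described in the thread.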
-
Re: RS unresponsive after series of deletes
Ted Yu 2012-06-21, 04:33
Ted T:
Do your 100s of thousands of cell deletes overlap (in terms of column family) across rows?

In HRegionServer:

    public <R> MultiResponse multi(MultiAction<R> multi) throws IOException {
      ...
      for (Action<R> a : actionsForRegion) {
        action = a.getAction();
        ...
        if (action instanceof Delete) {
          delete(regionName, (Delete) action);

I think if we group the deletes of actionsForRegion, we can utilize the following:

    public int delete(final byte[] regionName, final List<Delete> deletes)

Inside HRegion, we should be able to reduce the number of times HRegion$RegionScannerImpl.next() is called.

Cheers
-
Re: RS unresponsive after series of deletes
Ted Tuttle 2012-06-21, 04:54
> Do your 100s of thousands cell deletes overlap (in terms of column family)
> across rows ?

Our schema contains only one column family per table. So, each Delete contains cells from a single column family. I hope this answers your question.
-
Re: RS unresponsive after series of deletes
Ted Yu 2012-06-21, 05:13
As I mentioned earlier, prepareDeleteTimestamps() performs one get operation per column qualifier:

    get.addColumn(family, qual);
    List<KeyValue> result = get(get, false);

This is too costly in your case. I think you can group some configurable number of qualifiers in each get and perform classification on the result. This way we can reduce the number of times HRegion$RegionScannerImpl.next() is called.

Cheers
-
RE: RS unresponsive after series of deletes
Ted Tuttle 2012-06-21, 14:02
Good hint, Ted

By calling Delete.deleteColumn(family, qual, ts) instead of deleteColumn w/o timestamp, the time to delete row keys is reduced by 95%.

I am going to experiment w/ limited batches of Deletes, too.

Thanks everyone for help on this one.
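A minimal sketch of the "limited batches" idea mentioned above, under the assumption that the client holds one large list of deletes and wants to submit it in smaller pieces. The Batches class and its partition method are hypothetical helpers, written generically so the sketch runs without HBase on the classpath; in the client each returned batch would be handed to HTable.delete(List<Delete>) in turn.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper that splits one large list of operations into bounded batches. */
final class Batches {

    /** Returns consecutive sub-lists of at most batchSize items, preserving order. */
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            // Copy the view so each batch is an independent, mutable list.
            batches.add(new ArrayList<>(items.subList(start, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String> rowKeys = Arrays.asList("r1", "r2", "r3", "r4", "r5");
        for (List<String> batch : partition(rowKeys, 2)) {
            // In the real client this is where one delete call per batch would go.
            System.out.println("would submit batch of " + batch.size() + ": " + batch);
        }
    }
}
```

The batch size is a tuning knob, not a recommendation from the thread; smaller batches shorten each RPC at the cost of more round trips.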
-
Re: RS unresponsive after series of deletes
Ted Yu 2012-06-21, 14:19
Ted T:
Can you log a JIRA summarizing the issue?

I feel HBase should provide better handling for cell deletion of very wide rows intrinsically - without user tweaking timestamp.
-
Re: RS unresponsive after series of deletes
Ted Yu 2012-06-21, 17:32
Ted:
Can you share what ts value was passed to Delete.deleteColumn(family, qual, ts)?

Potentially, an insertion for the same (family, qual) immediately following the delete call may be masked by the above.

Cheers
-
RE: RS unresponsive after series of deletes
Ted Tuttle 2012-06-21, 18:00
Working on the JIRA ticket now, btw.

> Ted:
> Can you share what ts value was passed to Delete.deleteColumn(family, qual, ts) ?
> Potentially, an insertion for the same (family, qual) immediately following the delete call may be masked by the above.

We scan for KeyValues matching rows and columns matching the client's domain objects. For each KeyValue for a given row we call:

    long ts = kv.getTimestamp();
    delete.deleteColumn(fam, qual, ts);
-
Re: RS unresponsive after series of deletes
Ted Yu 2012-06-21, 18:10
Doug:
Can you enhance the related part of the book w.r.t. usage of Delete.deleteColumn(family, qual)?

Basically we should warn users of potentially long processing time if there are many columns involved.

Thanks
-
RE: RS unresponsive after series of deletes
Ted Tuttle 2012-06-21, 18:16
-
Re: RS unresponsive after series of deletes
lars hofhansl 2012-06-23, 02:35
Sorry for chiming in late.

Are you sure you want to use Delete.deleteColumn and not Delete.deleteColumns (note the plural form)?

deleteColumn marks a single version of a column (of a CF of a Row) for deletion.
deleteColumns marks all versions of a column as deleted (unless you specify a timestamp).

deleteColumns is what you want in most cases, unless you have to carefully control individual versions of a specific column in a specific row.

-- Lars
|