HBase >> mail # dev >> Re: Extremely long flush times


RE: Extremely long flush times
Lars,

Glad I could help. It was cool to see how you approached the problem and came to a solution. Thanks for addressing this so quickly!

Carlos

-----Original Message-----
From: lars hofhansl [mailto:[EMAIL PROTECTED]]
Sent: Thursday, August 16, 2012 2:40 PM
To: [EMAIL PROTECTED]; [EMAIL PROTECTED]; lars hofhansl
Subject: Re: Extremely long flush times

This is now committed to 0.94 (i.e. will be in 0.94.2) and 0.96. The fix turned out to be pretty simple (but in an intricate part of HBase).

Thanks for the program demonstrating the problem, Carlos; that was extremely helpful!

-- Lars

________________________________
 From: lars hofhansl <[EMAIL PROTECTED]>
To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>; "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>; lars hofhansl <[EMAIL PROTECTED]>
Sent: Sunday, August 12, 2012 2:41 PM
Subject: Re: Extremely long flush times
 
I filed HBASE-6561 for this (Jira is back).

----- Original Message -----
From: lars hofhansl <[EMAIL PROTECTED]>
To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>; "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
Cc:
Sent: Saturday, August 11, 2012 12:42 AM
Subject: Re: Extremely long flush times

A possible solution is to have the MemStoreScanner reseek eagerly (i.e. just iterate forward) for a bit (say 100 KVs or so). If that is not fruitful, then issue the expensive reseek. I'll try that tomorrow.

(In this case the tailSet created from the reseek often has 300,000 or more entries in it. That is not necessarily a problem, since it is not recreated.)
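The eager-reseek idea above can be sketched outside HBase. The following is a minimal, hypothetical illustration (the class and method names are mine, not HBase's; the real fix is in HBASE-6561), using a ConcurrentSkipListSet of longs in place of the memstore's sorted KeyValue set: iterate forward a bounded number of steps, and only fall back to the expensive seek if the target is not reached.

```java
import java.util.Iterator;
import java.util.concurrent.ConcurrentSkipListSet;

public class EagerReseek {
    // Bound on cheap forward iteration before paying for a real seek
    // ("say 100 KVs or so").
    static final int MAX_FORWARD = 100;

    // Advance toward `target`: first by iterating forward up to
    // MAX_FORWARD steps; if that is not fruitful, issue the expensive
    // skip-list lookup (ceiling) instead.
    static Long eagerReseek(ConcurrentSkipListSet<Long> set,
                            Iterator<Long> it, long target) {
        for (int i = 0; i < MAX_FORWARD && it.hasNext(); i++) {
            Long next = it.next();
            if (next >= target) {
                return next; // reached by cheap forward iteration
            }
        }
        // Not fruitful within the bound: fall back to a full reseek.
        return set.ceiling(target);
    }

    public static void main(String[] args) {
        ConcurrentSkipListSet<Long> set = new ConcurrentSkipListSet<>();
        for (long i = 0; i < 1000; i++) set.add(i);
        // Target close by: found by iteration alone.
        System.out.println(eagerReseek(set, set.iterator(), 50L));
        // Target far ahead: the bounded scan gives up and reseeks.
        System.out.println(eagerReseek(set, set.iterator(), 900L));
    }
}
```

The point of the bound is that a forward step on the underlying iterator is O(1), while each lookup into the skip list is O(log n), so iteration wins whenever the target is only a handful of entries ahead.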

-- Lars
----- Original Message -----
From: lars hofhansl <[EMAIL PROTECTED]>
To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>; hbase-dev <[EMAIL PROTECTED]>
Cc:
Sent: Saturday, August 11, 2012 12:26 AM
Subject: Re: Extremely long flush times

It turns out the problem is not the loop in MemStoreScanner, but excessive calls to StoreScanner.reseek.

As a test I changed ScanWildcardColumnTracker.checkVersion to return MatchCode.SKIP instead of MatchCode.SEEK_NEXT_COL (when the max number of versions is reached), and I do not see this behavior (the loop that previously would not go past 15 or so now happily runs until I stop the client).

Not sure what the conclusion would be. Seeking the memstore seems to be expensive, so it should only be done when many KVs can be skipped by the seek; otherwise we should just iterate.
It is not clear how to find this out ahead of time.

I'm open to suggestions.

-- Lars

----- Original Message -----
From: lars hofhansl <[EMAIL PROTECTED]>
To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>; hbase-dev <[EMAIL PROTECTED]>
Cc:
Sent: Friday, August 10, 2012 11:43 PM
Subject: Re: Extremely long flush times

Ran your test code (thanks Carlos).

Found two things:
1. Store.internalFlushCache(...) should be calling StoreScanner.next(List<KeyValue>, int limit) - currently it does not set a limit. (But this is not the problem.)

2. With jstack I found that the code is stuck in a loop in MemStore.MemStoreScanner.getNext(...).

Here's the relevant part of the jstack:
"IPC Server handler 6 on 60020" daemon prio=10 tid=0x00007f0574625000 nid=0x720c runnable [0x00007f05669e7000]
   java.lang.Thread.State: RUNNABLE
        at org.apache.hadoop.hbase.regionserver.MemStore$MemStoreScanner.getNext(MemStore.java:726)
        at org.apache.hadoop.hbase.regionserver.MemStore$MemStoreScanner.seekInSubLists(MemStore.java:761)
        - locked <0x00000000c4a8a860> (a org.apache.hadoop.hbase.regionserver.MemStore$MemStoreScanner)
        at org.apache.hadoop.hbase.regionserver.MemStore$MemStoreScanner.reseek(MemStore.java:800)
        - locked <0x00000000c4a8a860> (a org.apache.hadoop.hbase.regionserver.MemStore$MemStoreScanner)
        at org.apache.hadoop.hbase.regionserver.NonLazyKeyValueScanner.doRealSeek(NonLazyKeyValueScanner.java:54)
        at org.apache.hadoop.hbase.regionserver.KeyValueHeap.generalizedSeek(KeyValueHeap.java:299)
        at org.apache.hadoop.hbase.regionserver.KeyValueHeap.reseek(KeyValueHeap.java:244)
        at org.apache.hadoop.hbase.regionserver.StoreScanner.reseek(StoreScanner.java:522)
        - eliminated <0x00000000ccb54860> (a org.apache.hadoop.hbase.regionserver.StoreScanner)
        at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:403)
        - locked <0x00000000ccb54860> (a org.apache.hadoop.hbase.regionserver.StoreScanner)
        at org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:127)
        at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:3459)
        at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:3406)
        - locked <0x00000000c59ee610> (a org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl)
        at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:3423)
        - locked <0x00000000c59ee610> (a org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl)
        at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:4171)
        at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:4144)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.get(HRegionServer.java:1958)
        at sun.reflect.GeneratedMethodAccessor12.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:601)
        at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:364)
        at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1389)

At the same time I find that flush cannot finish:

"regionserver60020.cacheFlusher" daemon prio=10 tid=0x00007f05749ab000 nid=0x71fe waiting for monitor entry [0x00007f05677f6000]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at org.apache.hadoop.hbase.regionserver.StoreScanner.updateReaders(StoreScanner.java: