I've been meaning to look into something regarding compactions for a while
now that may be relevant here. It could be that this is already how it
works, but just to be sure I'll spell out my suspicions...
I did a lot of large uploads when we moved to .92. Our biggest dataset is
time series data (partitioned 16 ways with a row prefix). The actual
inserting and flushing went extremely quickly, and the parallel compactions
were churning away. However, when the compactions inevitably started
falling behind I noticed a potential problem. The compaction queue would
get up to, say, 40, which represented, say, an hour's worth of requests.
The problem was that by the time a compaction request started executing,
the CompactionSelection that it held was terribly out of date. It was
compacting a small selection (3-5) of the 50 files that were now there.
Then the next request would compact another (3-5), etc, etc, until the
queue was empty. It would have been much better if a CompactionRequest
decided what files to compact when it got to the head of the queue. Then
it could see that there are now 50 files needing compacting and to possibly
compact the 30 smallest ones, not just 5. When the insertions were done
after many hours, I would have preferred it to do one giant major
compaction, but it sat there and worked through it's compaction queue
compacting all sorts of different combinations of files.
Said differently, it looks like .92 picks the files to compact at
compaction request time rather than compaction execution time which is
problematic when these times grow far apart. Is that the case? Maybe
there are some other effects that are mitigating it...
On Sat, Feb 25, 2012 at 10:05 AM, Jean-Daniel Cryans <[EMAIL PROTECTED]>wrote:
> Hey guys,
> So in HBASE-4365 I ran multiple uploads and the latest one I reported
> was a 5TB import on 14 RS and it took 18h with Stack's patch. Now one
> thing we can see is that apart from some splitting, there's a lot of
> compacting going on. Stack was wondering exactly how much that IO
> costs us, so we devised a test where we could upload 5TB with 0
> compactions. Here are the results:
> The table was pre-split with 14 regions, 1 per region server.
> hbase.regionserver.maxlogs=64 (the block size is 128MB)
> export HBASE_REGIONSERVER_OPTS="$HBASE_JMX_BASE -Xmx14G
> -XX:CMSInitiatingOccupancyFraction=75 -XX:NewSize=256m
> The table had:
> MAX_FILESIZE => '549755813888', MEMSTORE_FLUSHSIZE => '549755813888'
> Basically what I'm trying to do is to never block and almost always be
> flushing. You'll probably notice the big difference between the lower
> and upper barriers and think "le hell?", it's because it takes so long
> to flush that you have to have enough room to take on more data while
> this is happening (and we are able to flush faster than we take on
> The test reports the following:
> Wall time: 34984.083 s
> Aggregate Throughput: 156893.07 queries/s
> Aggregate Throughput: 160030935.29 bytes/s
> That's 2x faster than when we wait for compactions and splits, not too
> bad but I'm pretty sure we can do better:
> - The QPS was very uneven, it seems that when it's flushing it takes
> a big toll and queries drop to ~100k/s while the rest of the time it's
> more like 200k/s. Need to figure out what's going there and if it's
> really just caused by flush-related IO.
> - The logs were rolling every 6 seconds and since this takes a global
> write lock, I can see how we could be slowing down a lot across 14
> - The load was a bit uneven, I miscalculated my split points and the
> last region always had 2-3k more queries per second.
> Stay tuned for more.