If you value write performance over read performance you can increase the number of blocking storefiles (more storefiles mean that the merging at read-time is more expensive).
Even in that case you are only increasing the buffer, though; if you maintain the high write load you will run into the same problem.
What is the IO you see on the machines on your cluster versus what the theoretical maximum should be?
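
As a rough illustration of the buffering point, here is a toy model (not HBase code). It assumes a constant write rate and a constant rate at which the IO subsystem can drain the buffer. The function name, parameters, and numbers are all made up for the example.

```python
from typing import Optional


def seconds_until_blocked(buffer_mb: float,
                          write_mb_s: float,
                          drain_mb_s: float) -> Optional[float]:
    """Return how long a buffer of buffer_mb lasts before it is full.

    Toy assumption: writers add data at write_mb_s while flushes and
    compactions remove it at drain_mb_s. Returns None when the IO
    subsystem keeps up and the buffer never fills.
    """
    surplus = write_mb_s - drain_mb_s
    if surplus <= 0:
        return None  # IO keeps up; writers are never blocked in this model
    return buffer_mb / surplus


# Hypothetical numbers: writers push 60 MB/s, the disks absorb 40 MB/s.
small = seconds_until_blocked(buffer_mb=512, write_mb_s=60, drain_mb_s=40)
large = seconds_until_blocked(buffer_mb=2048, write_mb_s=60, drain_mb_s=40)
print(small, large)
```

In this model a buffer four times larger postpones blocking by a factor of four, but only lowering the write rate to the drain rate or below avoids it entirely.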
----- Original Message -----
From: yun peng <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]; lars hofhansl <[EMAIL PROTECTED]>
Sent: Sunday, June 9, 2013 7:17 PM
Subject: Re: Hbase write stream blocking and any solutions?
Thanks, Lars, for the insights. I guess current HBase may have to block the
write stream even when the data write rate does not reach the limit of the
IO subsystems. Blocking happens because compaction is so costly and has to
be invoked synchronously (say, to keep #hfile < K), so the invocation of
compaction could block the write stream? (Correct me if I am wrong.)
On Sun, Jun 9, 2013 at 7:33 PM, lars hofhansl <[EMAIL PROTECTED]> wrote:
> One thing to keep in mind is that this typically happens when you write
> faster than your IO subsystems can support.
> For a while HBase will absorb this by buffering in the memstore, but if
> you sustain the write load something will have to slow down the writers.
> Granted, this could be done a bit more gracefully.
> -- Lars
> From: yun peng <[EMAIL PROTECTED]>
> To: [EMAIL PROTECTED]
> Sent: Sunday, June 9, 2013 6:28 AM
> Subject: Hbase write stream blocking and any solutions?
> Hi, All
> HBase can block online write operations when there is too much data in
> the memstore (to make the potential compaction incurred by this flush more
> efficient when there are many files on disk). This blocking effect is also
> observed by others (e.g.,
> The solution proposed in the above blog post is to increase the memstore
> size so that there are fewer flushes, and to tolerate a larger number of
> files on disk (by increasing blockingStoreFiles). This is a kind of HBase
> tuning toward write-intensive workloads.
> My target application has a dynamic workload which may change from
> write-intensive to read-intensive. There are also peak hours (when blocking
> is perceivable to users and should not be triggered) and off-peak hours
> (when blocking is tolerable). I am wondering if there is any more
> intelligent solution (say, a clever scheduling policy that blocks only at
> off-peak hours) in the latest HBase version that could minimize the effect
> of write stream blocking?