Re: Understanding compacting memstore/HLog before flush
Igal Shilman 2012-05-02, 05:11
Have you seen: https://issues.apache.org/jira/browse/HBASE-4241 ?
On May 2, 2012 7:01 AM, "Alex Baranau" <[EMAIL PROTECTED]> wrote:
> Could you please tell me if I correctly understand this problem...
> Example behavior 1:
> * create table
> * do 10 operations: insert a cell, then overwrite it (given that the number
> of versions is configured to 1), overwrite, ... overwrite.
> * after flushing the memstore with these edits, all of them get written to
> Ideally, in this situation only one edit should be written (the resulting
> value of the cell). I.e. only the "current visible state" of the memstore
> should be flushed, as opposed to flushing all the edits from the HLog. This
> would have a lot of benefits (e.g. less data to flush -> possibly less
> frequent flushes -> less frequent compactions, etc.), especially in
> particular use cases (like using counters, or updating some "aggregated
> The problem, as I understand it (please correct me if I'm wrong), is that
> this is not an easy thing to do, mainly because:
> 1) additional resource management burden (flushing large memstore isn't
> 2) compaction may add a lot of unnecessary overhead (so that in some cases
> there will be no actual benefit from it) and may make flushing much slower,
> which can cause a lot of issues
> 3) edits flushed from the memstore and HLog edits should be kept in sync,
> because we want the flush process to be reliable. I.e. if it fails in the
> middle, we should be able to restore the state from the HLog. Keeping the
> memstore and HLog in sync during compaction (and we would need partial
> compaction of some older data in the memstore) is difficult.
> 4) anything else?
> Especially the 3rd point: am I getting it right?
> Alex Baranau
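
To make "Example behavior 1" concrete, here is a minimal sketch of what flushing only the "current visible state" would mean. It is plain Python, not HBase code: the Edit tuple layout and the visible_state helper are hypothetical names made up for illustration, and the sketch ignores deletes, TTLs and the HLog sync concerns from point 3.

```python
from typing import Dict, List, Tuple

# Hypothetical edit record: (row, column, timestamp, value).
Edit = Tuple[str, str, int, str]


def visible_state(edits: List[Edit], max_versions: int = 1) -> List[Edit]:
    """Keep only the newest max_versions edits per (row, column)."""
    by_cell: Dict[Tuple[str, str], List[Edit]] = {}
    for edit in edits:
        by_cell.setdefault((edit[0], edit[1]), []).append(edit)
    kept: List[Edit] = []
    for cell in sorted(by_cell):
        # Newest timestamp first, then cut off at the version limit.
        newest_first = sorted(by_cell[cell], key=lambda e: e[2], reverse=True)
        kept.extend(newest_first[:max_versions])
    return kept


# Ten writes to the same cell: one insert followed by nine overwrites.
log = [("row1", "cf:q", ts, f"v{ts}") for ts in range(1, 11)]

# With versions = 1, only the last write would need to be flushed.
flushed = visible_state(log)
```

With ten edits in the log, flushed holds a single entry, the write with timestamp 10. Applying visible_state again to log + flushed gives the same result, which is roughly the property a replay from the HLog would rely on after a failed flush.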