HBase, mail # dev - Major Compaction Concerns


Re: Major Compaction Concerns
Nicolas Spiegelberg 2012-01-09, 19:42
Significant compaction JIRAs:
 - HBASE-2462 : original formulation of current compaction algorithm
 - HBASE-3209 : implementation
 - HBASE-1476 : multithreaded compactions
 - HBASE-3797 : storefile-based compaction selection
On 1/9/12 11:37 AM, "Ted Yu" <[EMAIL PROTECTED]> wrote:

>Nicolas:
>Thanks for your insight.
>
>Can you point Mikael to a few of the JIRAs where the algorithm mentioned in
>#1 was implemented?
>
>On Mon, Jan 9, 2012 at 10:55 AM, Nicolas Spiegelberg
><[EMAIL PROTECTED]> wrote:
>
>> Mikael,
>>
>> Hi, I wrote the current compaction algorithm, so I should be able to
>> answer most questions that you have about the feature.  It sounds like
>> you're creating quite a task list of work to do, but I don't understand
>> what your use case is, so a lot of that work may not be critical and you
>> can leverage existing functionality.  A better description of your system
>> requirements is a must for getting a good solution.
>>
>> 1. Major compactions are triggered by 3 methods: user issued, timed, and
>> size-based.  Since your config is disabling time-based compactions, you
>> are probably hitting size-based compactions.  Minor compactions are issued
>> on a size-based threshold.  The algorithm sees if sum(file[0:i] * ratio) >
>> file[i+1] and includes file[0:i+1] if so.  This is a reverse iteration, so
>> the highest 'i' value is used.  If all files match, then the compaction
>> covers every store file and you can remove delete markers [which is the
>> difference between a major and minor compaction].  Major compactions
>> aren't a bad or time-intensive thing; they are just delete marker removal.
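
The selection rule described in point 1 can be sketched roughly as follows. This is only an illustration of the rule as worded in the message, not the actual HBase code: the function name `select_for_compaction` is hypothetical, the slices file[0:i] and file[0:i+1] are assumed to be inclusive of their upper index, and the ratio value used below is an arbitrary example.

```python
def select_for_compaction(sizes, ratio):
    """Pick store files to compact, following the rule quoted above.

    sizes: store file sizes, indexed as in the message (file[0] first).
    ratio: the compaction ratio (the value passed by the caller is illustrative).

    Returns (selected_sizes, covers_all_files). When every file is selected,
    the compaction can drop delete markers, i.e. it behaves like a major.
    """
    n = len(sizes)
    # Reverse iteration: try the highest i first, so the largest match wins.
    for i in range(n - 2, -1, -1):
        # Assumption: file[0:i] in the message means files 0..i inclusive.
        if sum(sizes[: i + 1]) * ratio > sizes[i + 1]:
            selected = sizes[: i + 2]  # files 0..i+1 inclusive
            return selected, len(selected) == n
    # No index satisfied the rule: nothing is selected.
    return [], False
```

Under these assumptions, sizes of [10, 12, 15, 1000] with a ratio of 1.2 select only the first three files (a minor compaction), while [10, 12, 15, 30] select all four, which is the "all files match" case that allows delete markers to be removed.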
>>
>> As a note, we use timed majors in an OLTP production environment.  They
>> are less useful if you're doing bulk imports or have an OLAP environment
>> where you're either running a read-intensive test or the cluster is idle.
>> In that case, it's definitely best to disable compactions and run them
>> when you're not using the cluster very much.
>>
>> 2. See HBASE-4418 for showing all configuration options in the Web UI.
>> This is in 0.92 however.
>>
>> 4. The compaction queue shows compactions that are waiting to happen.  If
>> you invoke a compaction and the queue is empty, the thread will
>> immediately pick up your request and the queue will remain empty.
>>
>> 8. A patch for pluggable compactions was posted in the past.  It was not
>> well-tested, and the compaction algorithm was undergoing major design
>> changes at the time that clashed with the patch.  I think it's been a low
>> priority because there are many other ways to get big performance wins
>> from HBase outside of pluggable compactions.  Most people don't
>> understand how to optimize the current algorithm, which is well-known
>> (very similar to BigTable's).  I think bigger wins can come from correctly
>> laying out a good schema and understanding the config knobs currently at
>> our disposal.
>>
>>
>>
>> On 1/8/12 7:25 AM, "Mikael Sitruk" <[EMAIL PROTECTED]> wrote:
>>
>> >Hi
>> >
>> >
>> >
>> >I have some concerns regarding major compactions, listed below...
>> >
>> >
>> >   1. According to best practices from the mailing list and from the book,
>> >   automatic major compaction should be disabled. This can be done by
>> >   setting the property 'hbase.hregion.majorcompaction' to '0'.
>> >   Nevertheless, even after having done this I STILL see "major compaction"
>> >   messages in the logs, so it is unclear how I can manage major
>> >   compactions. (The system has heavy inserts, uniformly across the
>> >   cluster, and major compactions affect the performance of the system.)
>> >   If I'm not wrong, it seems from the code that even if not requested and
>> >   even if the indicator is set to '0' (no automatic major compaction),
>> >   a major compaction can be triggered by the code in case all store files
>> >   are candidates for a compaction (from Store.compact(final boolean
>> >   forceMajor)).
>> >   Shouldn't the code add a condition that automatic major compaction