HBase >> mail # dev >> Major Compaction Concerns


Re: Major Compaction Concerns
Significant compaction JIRAs:
 - HBASE-2462 : original formulation of current compaction algorithm
 - HBASE-3209 : implementation
 - HBASE-1476 : multithreaded compactions
 - HBASE-3797 : storefile-based compaction selection
On 1/9/12 11:37 AM, "Ted Yu" <[EMAIL PROTECTED]> wrote:

>Nicolas:
>Thanks for your insight.
>
>Can you point Mikael to a few of the JIRAs where the algorithm mentioned
>in #1 was implemented?
>
>On Mon, Jan 9, 2012 at 10:55 AM, Nicolas Spiegelberg <[EMAIL PROTECTED]> wrote:
>
>> Mikael,
>>
>> Hi, I wrote the current compaction algorithm, so I should be able to
>> answer most questions that you have about the feature.  It sounds like
>> you're creating quite a task list of work to do, but I don't understand
>> what your use case is, so a lot of that work may not be critical and you
>> can leverage existing functionality.  A better description of your system
>> requirements is a must for getting a good solution.
>>
>> 1. Major compactions are triggered by 3 methods: user issued, timed, and
>> size-based.  You are probably hitting size-based compactions where your
>> config is disabling time-based compactions.  Minor compactions are issued
>> on a size-based threshold.  The algorithm sees if
>> sum(file[0:i] * ratio) > file[i+1] and includes file[0:i+1] if so.  This
>> is a reverse iteration, so the highest 'i' value is used.  If all files
>> match, then you can remove delete markers [which is the difference between
>> a major and minor compaction].  Major compactions aren't a bad or
>> time-intensive thing; they're just delete marker removal.
>>
>> As a note, we use timed majors in an OLTP production environment.  They
>> are less useful if you're doing bulk imports or have an OLAP environment
>> where you're either running a read-intensive test or the cluster is idle.
>> In that case, it's definitely best to disable compactions and run them
>> when you're not using the cluster very much.
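
The minor-compaction selection rule in point 1 can be sketched in Java as
follows. This is a minimal illustration of the rule as worded in the message,
not the HBase implementation. The class and method names
(CompactionSelectionSketch, selectCount, selectsAllFiles) are hypothetical,
file[0:i] is assumed to include index i, index 0 is assumed to be the newest
store file, and the ratio of 1.2 is only an example value.

```java
class CompactionSelectionSketch {

    /**
     * Returns how many files, counted from index 0, the rule would select,
     * or 0 when no prefix of the list passes the ratio test.
     * ASSUMPTION: sizes[0] is the newest file and file[0:i] includes index i.
     */
    static int selectCount(long[] sizes, double ratio) {
        // prefix[k] holds the sum of sizes[0..k-1].
        long[] prefix = new long[sizes.length + 1];
        for (int k = 0; k < sizes.length; k++) {
            prefix[k + 1] = prefix[k] + sizes[k];
        }
        // Reverse iteration: the highest i that passes the test wins.
        for (int i = sizes.length - 2; i >= 0; i--) {
            long sumUpToI = prefix[i + 1]; // sum of sizes[0..i]
            if (sumUpToI * ratio > sizes[i + 1]) {
                return i + 2; // files 0..i+1 are included
            }
        }
        return 0;
    }

    /**
     * True when every file is selected, which the message describes as the
     * case where delete markers can be removed (a major compaction).
     */
    static boolean selectsAllFiles(long[] sizes, double ratio) {
        return sizes.length > 0 && selectCount(sizes, ratio) == sizes.length;
    }

    public static void main(String[] args) {
        double ratio = 1.2; // example value only
        long[] oneLargeOldFile = {10, 12, 15, 500};
        long[] similarSizes = {10, 12, 15, 40};

        System.out.println(selectCount(oneLargeOldFile, ratio));     // 3
        System.out.println(selectsAllFiles(oneLargeOldFile, ratio)); // false
        System.out.println(selectCount(similarSizes, ratio));        // 4
        System.out.println(selectsAllFiles(similarSizes, ratio));    // true
    }
}
```

With sizes {10, 12, 15, 500} the large old file is left out and only the
first three files are selected. With {10, 12, 15, 40} every file passes, which
in the message's terms is the point where a size-based selection turns into a
major compaction.
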
>>
>> 2. See HBASE-4418 for showing all configuration options in the Web UI.
>> This is in 0.92 however.
>>
>> 4. The compaction queue shows compactions that are waiting to happen.  If
>> you invoke a compaction and the queue is empty, the thread will
>> immediately pick up your request and the queue will remain empty.
>>
>> 8. A patch for pluggable compactions had been thrown up in the past.  It
>> was not well-tested and the compaction algorithm was undergoing major
>> design changes at the time that clashed with the patch.  I think it's been
>> a low priority because there are many other ways to get big performance
>> wins from HBase outside of pluggable compactions.  Most people don't
>> understand how to optimize the current algorithm, which is well-known
>> (very similar to BigTable's).  I think bigger wins can come from correctly
>> laying out a good schema and understanding the config knobs currently at
>> our disposal.
>>
>>
>>
>> On 1/8/12 7:25 AM, "Mikael Sitruk" <[EMAIL PROTECTED]> wrote:
>>
>> >Hi
>> >
>> >
>> >
>> >I have some concerns regarding major compactions below...
>> >
>> >
>> >   1. According to best practices from the mailing list and from the book,
>> >   automatic major compaction should be disabled. This can be done by
>> >   setting the property 'hbase.hregion.majorcompaction' to '0'.
>> >   Nevertheless, even after having done this I STILL see "major compaction"
>> >   messages in the logs, so it is unclear how I can manage major
>> >   compactions. (The system has heavy inserts, uniformly across the
>> >   cluster, and major compactions affect the performance of the system.)
>> >   If I'm not wrong, it seems from the code that even if not requested and
>> >   even if the indicator is set to '0' (no automatic major compaction),
>> >   major compaction can be triggered by the code in case all store files
>> >   are candidates for a compaction (from Store.compact(final boolean
>> >   forceMajor)).
>> >   Shouldn't the code add a condition that automatic major compaction