

Re: Tables gets Major Compacted even if they haven't changed
Thanks for the discussion, guys.

@Anil, we have turned off major compaction in the settings. This is a
script that we run manually to make sure all tables get major compacted
every so often to improve data locality. In our case, the collateral
damage is that unchanged regions get compacted as well.

I was planning to rework the script to compact individual regions rather
than whole tables: query how many store files each region has, and compact
the region only if num_store_files > 1. Is that a good solution in the
interim?
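
A minimal sketch of that per-region selection, in Python. It assumes the script has already collected a store file count for each region; the `store_file_counts` mapping, both function names, and the region names in the example are hypothetical. The output is one HBase shell `major_compact` command per selected region, on the assumption that the shell is given a region name instead of a table name.

```python
# Sketch only: `store_file_counts` is a hypothetical input mapping a region
# name to its number of store files, however the script collects that.

def regions_to_compact(store_file_counts, threshold=1):
    """Return the names of regions with more than `threshold` store files."""
    return sorted(
        name for name, count in store_file_counts.items() if count > threshold
    )


def shell_commands(region_names):
    """Render one HBase shell `major_compact` command per region name."""
    return ["major_compact '%s'" % name for name in region_names]


if __name__ == "__main__":
    # Made-up region names and counts, for illustration only.
    counts = {
        "tableX,,1378700000000.aaaa.": 1,
        "tableY,20130901,1378700000001.bbbb.": 3,
    }
    for command in shell_commands(regions_to_compact(counts)):
        print(command)
```

Regions with exactly one store file are skipped, so this alone would not re-compact a single-file region whose data has become non-local.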

On Tue, Sep 10, 2013 at 11:11 AM, Dave Latham <[EMAIL PROTECTED]> wrote:

> Major compactions can still be useful to improve locality - could we add a
> condition to check for that too?
>
>
> On Mon, Sep 9, 2013 at 10:41 PM, lars hofhansl <[EMAIL PROTECTED]> wrote:
>
> > Interesting. I guess we could add a check to avoid major compactions if
> > (1) no TTL is set or we can show that all data is newer and (2) there's
> > only one file (3) and there are no delete markers. All of these can be
> > cheaply checked with some HFile metadata (we might have all data needed
> > already).
> >
> >
> > That would take care of both of your scenarios.
> >
> > -- Lars
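
The three conditions Lars lists can be written as a single predicate. This is only a sketch of the proposed check, not HBase code: the `StoreSummary` type and its field names are hypothetical stand-ins for whatever the HFile metadata would provide, and Dave's locality condition is not modeled here.

```python
from dataclasses import dataclass


@dataclass
class StoreSummary:
    """Hypothetical per-store facts; this is not an HBase class."""
    num_files: int
    ttl_set: bool
    all_data_within_ttl: bool  # True if no cell could have expired yet
    has_delete_markers: bool


def can_skip_major_compaction(store):
    """Return True when all three proposed conditions hold."""
    # (1) no TTL is set, or all data is known to be newer than the TTL cutoff
    ttl_ok = (not store.ttl_set) or store.all_data_within_ttl
    # (2) there is only one file
    single_file = store.num_files == 1
    # (3) there are no delete markers
    no_deletes = not store.has_delete_markers
    return ttl_ok and single_file and no_deletes
```

Under this sketch, a store with one file, no TTL, and no delete markers would be skipped, which matches the unchanged-table scenario from the original message.
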
> > ________________________________
> > From: Premal Shah <[EMAIL PROTECTED]>
> > To: user <[EMAIL PROTECTED]>
> > Sent: Monday, September 9, 2013 9:02 PM
> > Subject: Tables gets Major Compacted even if they haven't changed
> >
> >
> > Hi,
> > We have a bunch of tables in our HBase cluster. We have a script which
> > makes sure all of them get Major Compacted once every 2 days. There are 2
> > things I'm observing:
> >
> > 1) Table X has not been updated in a month. We have not inserted, updated or
> > deleted data. However, it still major compacts every 2 days. All the
> > regions in this table have only 1 store file.
> >
> > 2) Table Y has a few regions where the rowkey is essentially a timestamp.
> > So, we only write to 1 region at a time. Over time, the region splits, and
> > then we write to one of the split regions. Now, whenever we major compact
> > the table, all regions get major compacted. Only 1 region has more than 1
> > store file; every other region has exactly one.
> >
> > Is there a way to avoid compaction of regions that have not changed?
> >
> > We are using HBase 0.94.11
> >
> > --
> > Regards,
> > Premal Shah.
> >
>

--
Regards,
Premal Shah.