Accumulo, mail # user - Why Would Accumulo v1.4.1 Run Major Compaction on a One Tablet Table?

Re: Why Would Accumulo v1.4.1 Run Major Compaction on a One Tablet Table?
Christopher 2013-02-02, 17:26
David-

Regarding "What is the relationship between rfiles and splits":

Tablets contain multiple files, created by minor compactions. When a
tablet splits, the references in the !METADATA table to the RFiles of the
old (pre-split) tablet are copied to the sections corresponding to the new
(post-split) tablets. So, for a time, both tablets may share some of the
same files. When a major compaction occurs for a tablet, new files are
created from the old files, dropping any data that falls outside that
tablet's range, and these new files are referenced only by that tablet
(until it splits further, anyway).
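
A toy Python model of the split behavior described above follows. It is a sketch, not Accumulo code: the RFile and Tablet classes and the split/major_compact helpers are hypothetical. It assumes a tablet covers the rows in (start, end], and a file is reduced to a name plus the rows it holds. Splitting copies only the file references. A later major compaction rewrites the data, keeping only rows inside the compacting tablet's range.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass(frozen=True)
class RFile:
    # Hypothetical stand-in: a file is just a name and the sorted rows it holds.
    name: str
    rows: Tuple[str, ...]

@dataclass
class Tablet:
    # A tablet covers rows in (start, end]; None means unbounded on that side.
    start: Optional[str]
    end: Optional[str]
    files: List[RFile] = field(default_factory=list)

    def contains(self, row: str) -> bool:
        after_start = self.start is None or row > self.start
        up_to_end = self.end is None or row <= self.end
        return after_start and up_to_end

def split(tablet: Tablet, split_row: str) -> Tuple[Tablet, Tablet]:
    # Both children copy the parent's file REFERENCES; no data is rewritten.
    left = Tablet(tablet.start, split_row, list(tablet.files))
    right = Tablet(split_row, tablet.end, list(tablet.files))
    return left, right

def major_compact(tablet: Tablet, new_name: str) -> None:
    # Merge every file the tablet references, keeping only rows in its range.
    rows = sorted({r for f in tablet.files for r in f.rows if tablet.contains(r)})
    tablet.files = [RFile(new_name, tuple(rows))]

parent = Tablet(None, None, [RFile("file1.rf", ("a", "m", "z"))])
left, right = split(parent, "m")
print(left.files[0] is right.files[0])  # True: both tablets share file1.rf
major_compact(left, "file2.rf")
print(left.files[0].rows)   # ('a', 'm'): "z" belongs to the other tablet
print(right.files[0].name)  # file1.rf: still shared until right compacts
```

Until the right-hand tablet compacts too, file1.rf is still referenced and cannot be garbage collected.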

Regarding the 12 rfiles you are seeing:

Files get created during minor compactions, and get combined during major
compactions. During compactions, the data is first written out to
temporary RFiles, which are renamed once they are complete. The references
to the old files, prior to the major compaction, are removed from the
tablet's metadata, and a delete marker is added to the !METADATA table for
each old file, scheduling it for deletion. The garbage collector will
eventually delete a file, as long as no tablet is still using it. You may
have several minor compactions before a major compaction, and a collection
cycle may not have run since the last major compaction, resulting in these
extra files. You can force a full major compaction from the shell with the
"compact" command. Once it's done, and after the next collection cycle,
you should have only one file remaining for the tablet.
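
The file lifecycle above can be sketched with a toy Python model. It is not Accumulo code: local files stand in for RFiles in HDFS, and two in-memory collections stand in for the tablet's file references and the delete markers in !METADATA. The class name ToyTabletServer and the generated file names are hypothetical. The model shows three steps. Data is written to a temporary file and then renamed. A major compaction swaps the tablet's old references for one new file. The old files remain on disk until a garbage-collection pass.

```python
import os
import tempfile

class ToyTabletServer:
    """Hypothetical toy model of one tablet server's file bookkeeping."""

    def __init__(self, directory):
        self.directory = directory
        self.refs = {}               # tablet id -> set of file paths it uses
        self.delete_markers = set()  # file paths scheduled for deletion
        self.counter = 0

    def _write(self, rows):
        # Write to a temporary file, then rename it to its final name once done.
        self.counter += 1
        final = os.path.join(self.directory, f"file{self.counter:04d}.rf")
        tmp = final + ".tmp"
        with open(tmp, "w") as fh:
            fh.write("\n".join(sorted(rows)))
        os.replace(tmp, final)
        return final

    def minor_compact(self, tablet, rows):
        # Each flush of in-memory data adds one more file to the tablet.
        self.refs.setdefault(tablet, set()).add(self._write(rows))

    def major_compact(self, tablet):
        # Combine all of the tablet's files into a single new file.
        old = self.refs.get(tablet, set())
        rows = set()
        for path in old:
            with open(path) as fh:
                rows.update(line for line in fh.read().split("\n") if line)
        self.refs[tablet] = {self._write(rows)}  # drop the old references
        self.delete_markers |= old               # schedule old files for deletion

    def gc_cycle(self):
        # Delete marked files only if no tablet still references them.
        in_use = set().union(*self.refs.values())
        for path in self.delete_markers - in_use:
            os.remove(path)
        self.delete_markers &= in_use

with tempfile.TemporaryDirectory() as d:
    ts = ToyTabletServer(d)
    for batch in (["a", "b"], ["c"], ["b", "d"]):
        ts.minor_compact("default_tablet", batch)
    print(len(os.listdir(d)))  # 3
    ts.major_compact("default_tablet")
    print(len(os.listdir(d)))  # 4: old files linger until the next GC cycle
    ts.gc_cycle()
    print(len(os.listdir(d)))  # 1
```

This is why a directory listing can show more files than a tablet actually references at that moment.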
--
Christopher L Tubbs II
http://gravatar.com/ctubbsii
On Fri, Feb 1, 2013 at 3:37 PM, David Medinets <[EMAIL PROTECTED]> wrote:

> What is the relationship between rfiles and splits? hadoop fs -ls
> /accumulo/tables/or/default_tablet is showing 12 rfiles. I see some
> .tmp rfiles as well. My table still has just one split, though. Maybe I
> am mixing up the HDFS representation of the data with Accumulo's
> representation?
>
> On Fri, Feb 1, 2013 at 3:25 PM, Keith Turner <[EMAIL PROTECTED]> wrote:
> > On Fri, Feb 1, 2013 at 3:10 PM, David Medinets <[EMAIL PROTECTED]> wrote:
> >> Why Would Accumulo v1.4.1 Run Major Compaction on a One Tablet Table?
> >> Brand new table. No splits. No deletes. Just slamming inserts as fast
> >> as possible.
> >
> > As data is inserted, new files are produced via minor compaction.
> > Eventually some of the files will be merged into one file via major
> > compaction.
>
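
Keith's answer, that a single-tablet table receiving only inserts still runs major compactions, can be illustrated with a toy simulation. This is a simplified sketch and not Accumulo's actual file-selection code. It assumes every minor compaction produces a file of the same size. It applies a single rule modeled on the documented meaning of table.compaction.major.ratio (the minimum ratio of total input size to the largest input file's size), with the ratio assumed here to be 3. The real tablet server also weighs other settings, such as table.file.max. The helper names select_files and simulate are hypothetical.

```python
def select_files(sizes, ratio=3.0):
    """Return the largest set of files whose total size is at least `ratio`
    times the biggest file in the set, or [] if no such set exists."""
    candidates = sorted(sizes, reverse=True)
    while len(candidates) > 1:
        if sum(candidates) >= ratio * candidates[0]:
            return candidates
        candidates.pop(0)  # drop the biggest file and try the rest
    return []

def simulate(flushes, flush_size=10, ratio=3.0):
    files, majors = [], 0
    for _ in range(flushes):
        files.append(flush_size)           # a minor compaction adds one file
        chosen = select_files(files, ratio)
        if chosen:
            for size in chosen:
                files.remove(size)
            files.append(sum(chosen))      # a major compaction merges them
            majors += 1
    return files, majors

print(simulate(3))  # ([30], 1)
print(simulate(9))  # ([90], 3)
```

Even with one tablet and no deletes, small files accumulate until the ratio is met and are then merged. The merged files later become inputs to larger merges, which is the pattern of major compactions David was seeing.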