Re: Why Would Accumulo v1.4.1 Run Major Compaction on a One Tablet Table?
Christopher 2013-02-02, 17:26
David-
Regarding "What is the relationship between rfiles and splits": Tablets contain multiple files, created by minor compactions. When a tablet splits, the references in the !METADATA table to the RFiles of the old (pre-split) tablet get copied to the sections corresponding to the new (post-split) tablets. So, for a time, both new tablets may share some of the same files. When a major compaction occurs for one of the new tablets, new files are created from the old files (dropping any data that was in the old files but doesn't belong to this tablet), and these new files are referenced only by that tablet (until it splits further, anyway).

Regarding the 12 rfiles you are seeing: Files get created during minor compactions, and get combined during major compactions. During a compaction, the data is written out to a temporary RFile, which gets renamed once the compaction is done. The references to the old files, from before the major compaction, are removed from the tablet's metadata, and a delete marker is added to the !METADATA table for each one, scheduling the old file for deletion. The garbage collector will eventually delete the file, so long as no tablets are still using it. You may have several minor compactions before a major compaction, and a collection cycle may not have run since the last major compaction, resulting in these extra files.

You can force a full major compaction from the shell with the command "compact". Once it's done, and after the next collection cycle, you should have only one file remaining for the tablet.

--
Christopher L Tubbs II
http://gravatar.com/ctubbsii

On Fri, Feb 1, 2013 at 3:37 PM, David Medinets <[EMAIL PROTECTED]> wrote:
> What is the relationship between rfiles and splits? hadoop fs -ls
> /accumulo/tables/or/default_tablet is showing 12 rfiles. I see some
> .tmp rfile as well. My table still has just one split though. Maybe I
> am mixing up the Accumulo representation of the data with Accumulo's
> representation?
>
> On Fri, Feb 1, 2013 at 3:25 PM, Keith Turner <[EMAIL PROTECTED]> wrote:
> > On Fri, Feb 1, 2013 at 3:10 PM, David Medinets <[EMAIL PROTECTED]> wrote:
> >> Why Would Accumulo v1.4.1 Run Major Compaction on a One Tablet Table?
> >> Brand new table. No splits. No deletes. Just slamming inserts as fast
> >> as possible.
> >
> > As data is inserted new files are produced via minor compaction.
> > Eventually some of the files will be merged into one file via major
> > compaction.
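
The file lifecycle described in the reply above (minor compactions add files, a major compaction merges them, and the garbage collector later removes the leftovers) can be sketched as a toy model. This is not Accumulo code: the class, method names, and file-name prefixes are all hypothetical, and it only illustrates why a directory listing can show more files than the tablet currently references.

```python
class ToyTabletStore:
    """Toy model of one tablet's files; all names here are hypothetical."""

    def __init__(self):
        self.on_disk = set()         # files visible in the tablet directory
        self.referenced = set()      # files the tablet's metadata points at
        self.delete_markers = set()  # files scheduled for garbage collection
        self._next_id = 0

    def _new_file(self, prefix):
        name = f"{prefix}_{self._next_id:05d}.rf"
        self._next_id += 1
        self.on_disk.add(name)
        return name

    def minor_compaction(self):
        """Write in-memory data to a new file and reference it."""
        self.referenced.add(self._new_file("minor"))

    def major_compaction(self):
        """Merge every referenced file into one new file."""
        old = set(self.referenced)
        merged = self._new_file("major")
        self.referenced = {merged}
        # Old files are only marked; they stay on disk until a GC cycle.
        self.delete_markers |= old

    def garbage_collect(self):
        """Delete marked files that the tablet no longer references."""
        removable = self.delete_markers - self.referenced
        self.on_disk -= removable
        self.delete_markers -= removable


store = ToyTabletStore()
for _ in range(5):
    store.minor_compaction()
store.major_compaction()
for _ in range(3):
    store.minor_compaction()

# No GC cycle has run yet, so old files are still listed on disk.
files_before_gc = len(store.on_disk)
referenced_before_gc = len(store.referenced)

# A full major compaction followed by a collection cycle.
store.major_compaction()
store.garbage_collect()
files_after = len(store.on_disk)
```

In this run the directory holds more files than the tablet references until the final compaction and collection cycle, after which a single file remains, which matches the behaviour described for the "compact" command.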