Accumulo, mail # user - Efficient Tablet Merging [SEC=UNOFFICIAL]


Re: Efficient Tablet Merging [SEC=UNOFFICIAL]
Eric Newton 2013-10-03, 13:51
You should have a major compaction running if your tablet has too many
files.  If you don't, something is wrong. It does take some time to
re-write 10G of data.

If many merges occurred on a single tablet server, you may have these
many-file tablets on the same server, and there are not enough major
compaction threads to re-write those files right away.  If that's true, you
may wish to restart the tablet server in order to get the tablets pushed to
other idle servers.
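
As a back-of-the-envelope illustration of the point above, the sketch below estimates how long many-file tablets wait when they all sit on one server with a limited number of major compaction threads. The function names, the thread count, and the throughput figure are hypothetical values chosen for illustration, not Accumulo defaults or measurements.

```python
import math


def compaction_waves(many_file_tablets: int, major_threads: int) -> int:
    """Rounds needed if each thread compacts one tablet at a time."""
    if major_threads < 1:
        raise ValueError("need at least one compaction thread")
    return math.ceil(many_file_tablets / major_threads)


def estimated_hours(many_file_tablets: int, major_threads: int,
                    gb_per_tablet: float, gb_per_hour_per_thread: float) -> float:
    """Rough wall-clock estimate; assumes every tablet is the same size."""
    waves = compaction_waves(many_file_tablets, major_threads)
    return waves * gb_per_tablet / gb_per_hour_per_thread


# Hypothetical numbers: 6 merged tablets of 10G each, 3 threads on one server,
# and an assumed rewrite rate of 20G per hour per thread.
on_one_server = estimated_hours(6, 3, 10, 20)
# Same tablets spread over 6 servers: each server has a single tablet to rewrite.
spread_out = estimated_hours(1, 3, 10, 20)
print(on_one_server, spread_out)
```

Under these assumed numbers the single server needs two rounds of compactions while the spread-out case needs one, which is why moving the tablets to idle servers may help.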

Again, if you don't have major compactions running, you will want to start
looking for other problems.

-Eric
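
For readers following the size-based merge discussed in the quoted messages below, here is a minimal sketch of what "merge on 4G" means conceptually: adjacent tablets are grouped until adding the next one would exceed the size goal. This is a simplified model, not Accumulo's actual merge code; the function name `plan_merges` and the tablet end rows and sizes are made up for illustration.

```python
from typing import List, Tuple

GIB = 1024 ** 3


def plan_merges(tablets: List[Tuple[str, int]], goal_bytes: int) -> List[List[str]]:
    """Group adjacent (end_row, size_bytes) tablets without exceeding goal_bytes.

    A tablet that is already at or above the goal ends up in a group by itself.
    """
    groups: List[List[str]] = []
    current: List[str] = []
    current_size = 0
    for end_row, size in tablets:
        # Close the current group if this tablet would push it past the goal.
        if current and current_size + size > goal_bytes:
            groups.append(current)
            current, current_size = [], 0
        current.append(end_row)
        current_size += size
    if current:
        groups.append(current)
    return groups


# Hypothetical tablets, ordered by end row: small old ones first, larger recent ones last.
tablets = [
    ("20130901", 1 * GIB),
    ("20130902", 1 * GIB),
    ("20130903", 1 * GIB),
    ("20130904", 2 * GIB),
    ("20130905", 3 * GIB),
    ("20130906", 4 * GIB),
]
plan = plan_merges(tablets, 4 * GIB)
print(plan)
```

In this model the three 1G tablets collapse into one group, and the larger tablets stay on their own because combining any of them with a neighbour would exceed 4G.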

On Thu, Oct 3, 2013 at 2:29 AM, Dickson, Matt MR <
[EMAIL PROTECTED]> wrote:

> *UNOFFICIAL*
> Hi Eric,
>
> We have gone with the second, more conservative option. We changed our
> split threshold to 10GB and then ran a merge over a week's worth of
> tablets, which has resulted in one tablet with a massive number of files. We
> then ran a query over that range and it is returning a message saying:
>
> Tablet has too many files (3n;20130914;20130907...) retrying...
>
> We assumed that once the merge was done, a major compaction would be
> started, which would notice that the tablet is too large and split it into
> 10GB tablets. We assumed that we would not have to start any compaction
> manually, but that one would be scheduled at some point after the merge
> finished.
>
> We have completed three separate merges of week long ranges and now have
> identified 3 tablet extents with too many files.
>
> Can you please explain what is supposed to happen? And does the compact
> command need to be run for those ranges after the merge, or will that
> happen automatically? We have not seen any compactions start.
>
> Cheers
> Matt
>
>  ------------------------------
> *From:* Eric Newton [mailto:[EMAIL PROTECTED]]
> *Sent:* Thursday, 3 October 2013 13:28
> *To:* [EMAIL PROTECTED]
> *Subject:* Re: Efficient Tablet Merging [SEC=UNOFFICIAL]
>
>  I'll use ASCII graphics to demonstrate the size of a tablet.
>
> Small: []
> Medium: [ ]
> Large: [  ]
>
> Think of it like this... if you are running age-off... you probably have
> lots of little buckets of rows at the beginning and larger buckets at the
> end:
>
> [][][][][][][][][]...[ ][ ][ ][ ][ ][  ][  ][    ][    ][    ][    ][ ][    ]
>
> What you probably want is something like this:
>
> [               ][       ][       ][       ][       ][       ][       ][    ]
>
> Some big bucket at the start, with old data, and some larger buckets for
> everything afterwards.  But... this would probably work:
>
> [       ][       ][       ][       ][       ][       ][       ][       ][      ]
>
> Just a bunch of larger tablets throughout.
>
> So you need to set your merge size to "[      ]" (4G), and you can always
> keep creating smaller tablets for future rows with manual splits:
>
> [       ][       ][       ][       ][       ][       ][       ][       ][      ][  ][  ][  ][  ][  ]
>
>
> So increase the split threshold to 4G, and merge on 4G, but continue to
> make manual splits for your current days, as necessary.  Merge them away
> later.
>
>
> -Eric
>
>
>
>
> On Wed, Oct 2, 2013 at 6:35 PM, Dickson, Matt MR <
> [EMAIL PROTECTED]> wrote:
>
>> *UNOFFICIAL*
>> Thanks Eric,
>>
>> If I do the merge with a size of 4G, does the split threshold also need
>> to be increased to 4G?
>>
>>  ------------------------------
>> *From:* Eric Newton [mailto:[EMAIL PROTECTED]]
>> *Sent:* Wednesday, 2 October 2013 23:05
>> *To:* [EMAIL PROTECTED]
>> *Subject:* Re: Efficient Tablet Merging [SEC=UNOFFICIAL]
>>
>>   The most efficient way is kind of scary.  If this is a production
>> system, I would not recommend it.
>>
>> First, find out the size of your 10x tablets.  Let's say it's 10G.  Set
>> your split threshold to 10G.  Then merge all old tablets.... all of them
>> into one tablet.  This will dump thousands of files into a single tablet,
>> but it will soon split out again into the nice 10G tablets you are looking