HBase, mail # user - Re: How to prevent major compaction when doing bulk load provisioning?


Re: How to prevent major compaction when doing bulk load provisioning?
Jean-Daniel Cryans 2013-03-22, 16:32
On Fri, Mar 22, 2013 at 12:12 AM, Nicolas Seyvet
<[EMAIL PROTECTED]> wrote:
> @J-D: Thanks, this sounds very likely.
>
> One more thing, from the logs of one slave, I can see the following:
> 2013-03-21 22:27:15,041 INFO org.apache.hadoop.hbase.regionserver.Store:
> Completed major compaction of 9 file(s) in f of
> rc_nise,$,1363860406830.5689430f7a27cc511f99dcb62001edc6. into
> 5418126f3d154ef3aca8027e04512279, size=8.3g; total size for store is 8.3g
> [...]
> 2013-03-21 23:34:31,836 INFO org.apache.hadoop.hbase.regionserver.Store:
> Completed major compaction of 5 file(s) in f of
> rc_nise,$,1363860406830.5689430f7a27cc511f99dcb62001edc6. into
> 3bdeb58c57af4ee1a92d22865e707416, size=8.3g; total size for store is 8.3g
>
> Aren't those signs that a major compaction also occurred?
> And if so, what could have triggered it?

If the compaction algorithm selects all the files in the store for
compaction, the compaction gets upgraded to a major compaction, because
rewriting every file is essentially the same thing.
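
A minimal sketch of that promotion rule, as a toy model rather than HBase's
actual code. The names `StoreFile` and `is_effectively_major` are
hypothetical, and the file sizes are made up for illustration:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StoreFile:
    """Hypothetical stand-in for one HFile in a store (column family)."""
    name: str
    size_bytes: int


def is_effectively_major(selected, all_files):
    """Return True when the selection covers every file in the store.

    Rewriting all files of a store into a single file is what a major
    compaction does, so such a selection can be treated as major.
    """
    return len(all_files) > 0 and set(selected) == set(all_files)


# Nine files in the store, as in the first log entry quoted above
# (the sizes here are invented).
store = [StoreFile(f"hfile-{i}", 1_000_000_000) for i in range(9)]

print(is_effectively_major(store[:5], store))  # only a subset selected
print(is_effectively_major(store, store))      # every file selected
```

Under this model, a "minor" selection that happens to pick all 9 (or all 5)
files of the store would be logged as a major compaction, which would match
the log lines above.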

> On Thu, Mar 21, 2013 at 8:06 PM, Nicolas Seyvet <[EMAIL PROTECTED]> wrote:
>
>> @Ram: You are entirely correct, I made the exact same mistake of mixing
>> up Large and minor compaction.  Looking closely, what I see is that at
>> around 200 HFiles per region it starts minor compacting files in groups of
>> 10 HFiles.  The "problem" seems to be that this minor compacting never
>> stops, even when there are only about 20 HFiles left.  It just keeps going
>> and takes more and more time (I guess because the files to compact are
>> getting bigger).
>>
>> Of course in parallel we keep on adding more and more data.
>>
>> @J-D: "It seems to me that it would be better if you were able to do a
>> single load for all your files." Yes, I agree, but that is not what we
>> are testing; our use case is to load 1-minute batch files.
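
The guess quoted above, that each compaction takes longer because its input
files keep getting bigger, can be illustrated with a toy simulation. This is
a sketch under loud assumptions: the function `compact_in_groups` is
hypothetical, all files start at one made-up size unit, files are merged in
fixed consecutive groups of 10, and no new data arrives while compacting.
HBase's real file selection is more involved than this.

```python
def compact_in_groups(sizes, group_size=10):
    """Merge files in consecutive groups; return (new_sizes, costs).

    The cost of one compaction is modeled as the amount of data it
    rewrites, i.e. the sum of its input file sizes.
    """
    new_sizes, costs = [], []
    for start in range(0, len(sizes), group_size):
        group = sizes[start:start + group_size]
        if len(group) < 2:
            new_sizes.extend(group)  # a lone file has nothing to merge with
            continue
        merged = sum(group)
        new_sizes.append(merged)
        costs.append(merged)
    return new_sizes, costs


files = [1] * 200  # 200 HFiles of one size unit each (made-up numbers)
round_no = 0
while len(files) > 1:
    files, costs = compact_in_groups(files)
    round_no += 1
    # round number, files remaining, data rewritten by the largest compaction
    print(round_no, len(files), max(costs))
```

In this model the first round runs 20 compactions that each rewrite 10 units,
while each compaction in the second round rewrites 100 units, so every single
compaction gets slower even though the file count drops. Data that keeps
arriving in parallel would only add to this.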