HBase >> mail # user >> Re: How to prevent major compaction when doing bulk load provisioning?


Re: How to prevent major compaction when doing bulk load provisioning?
On Fri, Mar 22, 2013 at 12:12 AM, Nicolas Seyvet
<[EMAIL PROTECTED]> wrote:
> @J-D: Thanks, this sounds very likely.
>
> One more thing, from the logs of one slave, I can see the following:
> 2013-03-21 22:27:15,041 INFO org.apache.hadoop.hbase.regionserver.Store:
> Completed major compaction of 9 file(s) in f of
> rc_nise,$,1363860406830.5689430f7a27cc511f99dcb62001edc6. into
> 5418126f3d154ef3aca8027e04512279, size=8.3g; total size for store is 8.3g
> [...]
> 2013-03-21 23:34:31,836 INFO org.apache.hadoop.hbase.regionserver.Store:
> Completed major compaction of 5 file(s) in f of
> rc_nise,$,1363860406830.5689430f7a27cc511f99dcb62001edc6. into
> 3bdeb58c57af4ee1a92d22865e707416, size=8.3g; total size for store is 8.3g
>
> Aren't those signs that a major compaction also occurred?
> And if so, what could have triggered it?

If the compaction algorithm selects all of a store's files for compaction, it
gets upgraded into a major compaction, because rewriting every file amounts
to the same thing.
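J-D's point can be sketched in a few lines. This is a hypothetical simplification, not HBase's actual selection code: the selection step here just grabs the smallest files up to a cap, and the promotion rule is the part being illustrated — if the selection happens to cover every HFile in the store, the compaction is treated as major.

```python
def select_files(store_files, max_files=10):
    """Simplified stand-in for minor-compaction selection:
    pick up to max_files of the smallest HFiles (sizes here)."""
    return sorted(store_files)[:max_files]

def compact(store_files):
    """Run one compaction round; returns (was_major, new_store_files)."""
    selected = select_files(store_files)
    # Promotion rule: if every file in the store was selected,
    # rewriting them all is equivalent to a major compaction.
    major = len(selected) == len(store_files)
    merged = sum(selected)  # stand-in for merging the selected HFiles
    remaining = [f for f in store_files if f not in selected] + [merged]
    return major, remaining

# A store with only a few small files: selection grabs all of them,
# so the "minor" compaction is promoted to a major one.
major, files = compact([1, 2, 3])
print(major)  # True

# With 20 files, only 10 are selected, so it stays minor.
major, files = compact(list(range(1, 21)))
print(major)  # False
```

This is why disabling periodic major compactions alone does not stop the log messages above: any round in which selection happens to cover the whole store is still logged as a major compaction.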

>
> On Thu, Mar 21, 2013 at 8:06 PM, Nicolas Seyvet <[EMAIL PROTECTED]>wrote:
>
>> @Ram: You are entirely correct, I made the exact same mistake of mixing
>> up major and minor compaction.  By looking closely, what I see is that at
>> around 200 HFiles per region it starts minor compacting files in groups of
>> 10 HFiles.  The "problem" is that this minor compacting never stops, even
>> when there are about 20 HFiles left.  It just keeps going, taking
>> more and more time (I guess because the files to compact are getting
>> bigger).
>>
>> Of course in parallel we keep on adding more and more data.
>>
>> @J-D: "It seems to me that it would be better if you were able to do a
>> single load for all your files."  Yes, I agree, but that is not what we
>> are testing; our use case is to load 1-minute batch files.
>>
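For readers landing on this thread: the usual knobs for the question in the subject line live in hbase-site.xml. The property names below are from HBase's configuration; the values are illustrative, not something the posters set. Note that, per J-D's explanation above, this only disables time-based major compactions — minor compactions that happen to select every store file are still promoted to major.

```xml
<!-- hbase-site.xml: illustrative values, not from the original thread -->
<configuration>
  <!-- 0 disables periodic (time-based) major compactions; compactions
       that select every file in a store are still promoted to major. -->
  <property>
    <name>hbase.hregion.majorcompaction</name>
    <value>0</value>
  </property>
  <!-- Minimum number of StoreFiles before a minor compaction is considered. -->
  <property>
    <name>hbase.hstore.compactionThreshold</name>
    <value>10</value>
  </property>
</configuration>
```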