Re: Major Compaction Concerns
>1. my CF were already working with BF, they used ROWCOL (I didn't pay
>attention to that at the time I wrote my answers)
>2. I see from the logs that the BF is already 100% - is it bad? should I
>add more memory for BF?

Since Bloom filters are a probabilistic optimization, it's hard to
analyze your efficiency directly.  Mostly, we rely on theory and a little
bit of experimentation.  Basically, you want your key queries to have a
high miss rate on HFiles.  A miss doesn't mean that the key doesn't exist
in the Store.  It just means that you're not constantly writing to that
key, so it doesn't exist in all N StoreFiles.  Optimally, you want 1 of the
blooms to hit (key exists in file) and N-1 to miss.  Metrics that you can
look at (I'm not sure which versions introduced them):

keymaybeinbloomcnt : number of bloom hits
keynotinbloomcnt : number of bloom misses.
staticbloomsizekb : size that bloom data takes up in memory (HFileV1)

Note that per-CF metrics were added in 0.94, so you can watch bloom
efficiency at a finer granularity.
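
As a rough illustration of how the two counters above could be read, here is a minimal Python sketch.  It assumes you have already pulled keymaybeinbloomcnt and keynotinbloomcnt from your metrics as plain cumulative integers; the function names are hypothetical and are not part of HBase.  It compares the observed miss rate with the "1 hit, N-1 misses" ideal described above.

```python
def bloom_miss_rate(key_maybe_in_bloom_cnt, key_not_in_bloom_cnt):
    """Fraction of bloom checks that were misses, or None if nothing was checked."""
    total = key_maybe_in_bloom_cnt + key_not_in_bloom_cnt
    if total == 0:
        return None
    return key_not_in_bloom_cnt / total


def ideal_miss_rate(num_store_files):
    """Miss rate when exactly 1 of N StoreFiles holds the key and N-1 do not."""
    if num_store_files < 1:
        raise ValueError("need at least one StoreFile")
    return (num_store_files - 1) / num_store_files


# Made-up counter values, for illustration only.
observed = bloom_miss_rate(key_maybe_in_bloom_cnt=100, key_not_in_bloom_cnt=300)
target = ideal_miss_rate(num_store_files=4)
print("observed miss rate:", observed, "ideal for 4 files:", target)
```

An observed miss rate well below the ideal would suggest that the same keys live in many StoreFiles, which is the constant-rewrite case where blooms help least.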

>3. HLog compression (HBASE-4608) is not scheduled yet, is it by intention?

There's limited bandwidth and this is an open source project, so... :)

>4. Compaction.ratio is only for 0.92.x releases, so i cannot use it yet.

"hbase.hstore.compaction.ratio" is in 0.90.

>6. I have also noticed that in a workload of pure insert (no read, empty
>regions, new keys) the store files on the RS can reach more than 4500
>files; nevertheless, with an update/read scenario the store files were not
>passing 1500 files per region (the throttling of the flush was active, and
>not in insert). Is there an explanation for that?

That depends on the size of your major compacted data.  Updates to
existing keys get deduped during compaction, which lowers your compaction
volume.
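
To make the dedupe point concrete, here is a toy Python sketch.  It is not HBase's compaction code: it assumes only the newest version of each key is kept and ignores deletes, TTLs and multiple versions.  The function name and the sample data are made up for illustration.

```python
def major_compact(store_files):
    """Merge store files, keeping only the newest (timestamp, value) per key."""
    latest = {}
    for store_file in store_files:
        for key, ts, value in store_file:
            if key not in latest or ts > latest[key][0]:
                latest[key] = (ts, value)
    return sorted((key, ts, value) for key, (ts, value) in latest.items())


# Pure insert workload: every flush carries brand new keys.
insert_only = [
    [("row1", 1, "a"), ("row2", 1, "b")],
    [("row3", 2, "c"), ("row4", 2, "d")],
]

# Update workload: the second flush rewrites the same keys.
update_heavy = [
    [("row1", 1, "a"), ("row2", 1, "b")],
    [("row1", 2, "c"), ("row2", 2, "d")],
]

print(len(major_compact(insert_only)))   # all 4 cells survive
print(len(major_compact(update_heavy)))  # only 2 cells survive
```

With new keys nothing collapses, so the compacted data keeps growing with every flush; with updates the older versions drop out and the compacted output stays small.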