HBase, mail # dev - Overhead of Bloomfilters


Re: Overhead of Bloomfilters
Nicolas Spiegelberg 2011-01-25, 16:11
A great article for Bloom Filter rules of thumb:

http://corte.si/posts/code/bloom-filter-rules-of-thumb/

Note that only rules #1 and #2 apply to our use case. Rule #3, while true, is less of a worry because we use combinatorial generation for hashes, so the number of 'expensive' hash calculations is 2, no matter how many hash functions need to be generated. Note that this change drastically (400%+) sped up BloomFilter.add().
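
To illustrate what I mean by combinatorial generation, here is a rough sketch. It is not the actual HBase code. The class name DoubleHashBloom and the two base hashes (Arrays.hashCode and a 32-bit FNV-1a) are placeholders I picked for the example. The idea is to compute two base hashes once per key and derive the i-th bit position as h1 + i * h2 (mod the number of bits), so adding more hash functions costs only integer arithmetic.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.BitSet;

// Hypothetical sketch of combinatorial (double) hashing for a Bloom filter.
// Not the HBase implementation; names and base hashes are assumptions.
class DoubleHashBloom {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;

    DoubleHashBloom(int numBits, int numHashes) {
        this.bits = new BitSet(numBits);
        this.numBits = numBits;
        this.numHashes = numHashes;
    }

    // First "expensive" hash (placeholder choice).
    private static int hash1(byte[] key) {
        return Arrays.hashCode(key);
    }

    // Second "expensive" hash: 32-bit FNV-1a (placeholder choice).
    private static int hash2(byte[] key) {
        int h = 0x811c9dc5;
        for (byte b : key) {
            h ^= (b & 0xff);
            h *= 16777619;
        }
        return h;
    }

    // i-th derived bit position: (h1 + i * h2) mod numBits, kept non-negative.
    private int position(int h1, int h2, int i) {
        return Math.floorMod(h1 + i * h2, numBits);
    }

    void add(byte[] key) {
        int h1 = hash1(key);   // computed once per key
        int h2 = hash2(key);   // computed once per key
        for (int i = 0; i < numHashes; i++) {
            bits.set(position(h1, h2, i));
        }
    }

    boolean mightContain(byte[] key) {
        int h1 = hash1(key);
        int h2 = hash2(key);
        for (int i = 0; i < numHashes; i++) {
            if (!bits.get(position(h1, h2, i))) {
                return false;  // definitely not present
            }
        }
        return true;           // present, or a false positive
    }

    public static void main(String[] args) {
        DoubleHashBloom filter = new DoubleHashBloom(1024, 7);
        byte[] key = "row-1".getBytes(StandardCharsets.UTF_8);
        filter.add(key);
        System.out.println(filter.mightContain(key)); // prints true
    }
}
```

However many hash functions the filter is configured with, add() and mightContain() each call the two base hashes exactly once per key.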

Sent from my iPhone

On Jan 25, 2011, at 6:22 AM, "Lars George" <[EMAIL PROTECTED]> wrote:

> Hi,
>
> (Probably aimed at Nicolas)
>
> Do we have a (rough) formula of overhead, i.e. the size of the
> bloomfilters for row and col granularity as for example depending on
> the KV count and average sizes (as reported by the HFile main()
> helper)?
>
> Thanks,
> Lars