HBase >> mail # user >> Scans and Bloom Filter


Re: Scans and Bloom Filter
Bryan,

Currently, ROW and ROWCOL Bloom filters are only checked for explicit,
single-row 'Get' scans.  ROWCOL Bloom filters are only checked when you are
also querying for explicit column qualifiers (vs. getting the entire row).
This is because multi-row scans and full-row scans are implicit queries:
HBase does not know in advance which keys it will encounter.  To clarify:

With a multi-row scan, the next row after 0x0001 is not necessarily 0x0002.
HBase only knows that the next row is > 0x0001.  The next row could be
0x00010 or 0x0003.  However, when you call HTable.get(row=0x0001), HBase
knows that you explicitly want that row and don't want 0x00010.
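
The point above can be illustrated with a small, self-contained sketch.
This is not HBase code: TinyBloom is a hypothetical toy filter, and the row
keys are assumed to be ASCII byte strings so that they sort
lexicographically the way the example keys do.  A Bloom filter can answer
"might this exact key be present?", but it has no way to answer "what is
the next key after this one?", which is the question a multi-row scan asks.

```python
import hashlib
from bisect import bisect_right


class TinyBloom:
    """Toy Bloom filter for illustration only (not HBase's implementation)."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, key):
        # Derive num_hashes bit positions from salted SHA-256 digests.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(bytes([i]) + key).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        # False means "definitely absent"; True means "possibly present".
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


def next_row_after(sorted_rows, current):
    """Return the first row key strictly greater than current, or None."""
    i = bisect_right(sorted_rows, current)
    return sorted_rows[i] if i < len(sorted_rows) else None


# Row keys in one hypothetical store file, kept in lexicographic byte order.
rows = sorted([b"0001", b"0003", b"00010"])

bloom = TinyBloom()
for row in rows:
    bloom.add(row)

# Get: the exact key is known up front, so the filter can be consulted.
print(bloom.might_contain(b"0001"))

# Scan: the next key is only known to be > b"0001"; it has to be found by
# seeking in the sorted data, and there is no exact key to test in the filter.
print(next_row_after(rows, b"0001"))
```

Note that the key following b"0001" in this ordering is b"00010", not
b"0002", which is why a scan cannot guess the key to test against the
filter.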

Nicolas

On 2/15/12 9:18 PM, "Bryan Beaudreault" <[EMAIL PROTECTED]> wrote:

>Hello,
>
>We are looking at Bloom Filters and wondering if they are helpful when
>doing a sequential read (multi-row scan) or only when doing a Get for a
>single row.  It logically makes sense that it would only affect (or affect
>to a greater degree) getting a single row, since it is a way of determining if
>you have to read a whole store file when fetching a key.  But, we are told
>that Scan and Get are essentially the same code on the backend, so I
>imagine both will check the Blooms if they exist.
>
>Also, would a ROWCOL bloom be more effective if you are often doing
>multi-row scans but always with specifying only a subset of columns in
>those rows?
>
>Thanks,
>
>Bryan