Bryan Beaudreault 2012-02-16, 02:18
Hello,
We are looking at Bloom Filters and wondering whether they help when doing a sequential read (multi-row scan) or only when doing a Get for a single row. It makes sense that they would only affect (or have a greater effect on) single-row reads, since a Bloom filter is a way of determining whether you have to read a given store file when fetching a key. But we are told that Scan and Get are essentially the same code on the backend, so I imagine both will check the Blooms if they exist.
Also, would a ROWCOL bloom be more effective if you often do multi-row scans but always specify only a subset of columns in those rows?
Thanks,
Bryan
-
Re: Scans and Bloom Filter
Nicolas Spiegelberg 2012-02-16, 20:52
Bryan,
Currently, ROW & ROWCOL Bloom Filters are only checked for explicit, single-row 'Get' scans. ROWCOL BFs are only checked when you're querying for explicit column qualifiers (vs getting the entire row). This is because multi-row scans & full-row scans are implicit queries. To clarify:
With a multi-row scan, the next row after 0x0001 is NOT necessarily 0x0002. HBase only knows that the next row is > 0x0001. The next row could be 0x00010 or 0x0003. However, when you call HTable.get(row=0x0001), HBase knows that you explicitly want that row and don't want 0x00010.
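To make the point above concrete, here is a minimal, self-contained sketch in Python. It is not HBase code: `BloomFilter`, `StoreFile`, `rowcol_key`, `can_skip_for_get`, and `next_row_after` are hypothetical names invented for illustration, and the row 0x00010 from the example is assumed to mean the bytes 0x00 0x01 0x00. It only shows that a Bloom filter can answer "might this exact key be here?", which fits a Get, while "what is the next key after this one?" cannot be asked of a Bloom filter and needs the sorted keys themselves.

```python
import bisect
import hashlib


class BloomFilter:
    """Tiny Bloom filter: false positives are possible, false negatives are not."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, key):
        # Derive num_hashes bit positions by salting a single hash function.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(bytes([i]) + key).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        return all(self.bits[pos] for pos in self._positions(key))


def rowcol_key(row, qualifier):
    # Hypothetical encoding: length-prefix the row so (row, qualifier) is unambiguous.
    return len(row).to_bytes(2, "big") + row + qualifier


class StoreFile:
    """Hypothetical stand-in for one store file with ROW and ROWCOL blooms."""

    def __init__(self, cells):
        cells = list(cells)  # (row, qualifier) pairs of bytes
        self.rows = sorted({row for row, _ in cells})
        self.row_bloom = BloomFilter()
        self.rowcol_bloom = BloomFilter()
        for row, qualifier in cells:
            self.row_bloom.add(row)
            self.rowcol_bloom.add(rowcol_key(row, qualifier))

    def can_skip_for_get(self, row, qualifier=None):
        # An explicit row (and optionally an explicit qualifier) gives an exact probe key.
        if qualifier is None:
            return not self.row_bloom.might_contain(row)
        return not self.rowcol_bloom.might_contain(rowcol_key(row, qualifier))

    def next_row_after(self, row):
        # A scan asks for "the next row", which is unknown in advance,
        # so there is nothing to probe the bloom with; search the sorted keys.
        i = bisect.bisect_right(self.rows, row)
        return self.rows[i] if i < len(self.rows) else None


cells = [(b"\x00\x01", b"a"), (b"\x00\x01\x00", b"a"), (b"\x00\x03", b"b")]
sf = StoreFile(cells)

print(sf.can_skip_for_get(b"\x00\x01"))        # False: a stored row is never skipped
print(sf.can_skip_for_get(b"\x00\x01", b"a"))  # False: a stored row+column is never skipped
print(sf.next_row_after(b"\x00\x01"))          # b'\x00\x01\x00', not 0x0002
```

In this sketch, a probe for a row that was never stored will usually let the file be skipped, but a Bloom filter may still report a false positive, so the file would then be read and simply return nothing.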
Nicolas
-
Re: Scans and Bloom Filter
Doug Meil 2012-02-16, 21:39
Good stuff, Nicolas. I'll add this to the book.