-
How HBase perform per-column scan? | yun peng, 2013-03-09, 19:11
Hi, All,
I want to find all existing values for a given column in an HBase table. Would that result in a full-table scan? For example, suppose a table has a very large number of rows, of which only a few (say, a single row) have a non-empty value for column C. Would HBase still use a full-table scan to find that row, or does HBase have an optimization for this kind of query? Thanks... Regards, Yun
-
Re: How HBase perform per-column scan? | Anoop John, 2013-03-10, 07:11
When you say column, do you mean a column family (CF) or a column qualifier?
If it is a column qualifier, are there other qualifiers in the same CF? -Anoop-
-
Re: How HBase perform per-column scan? | Ted Yu, 2013-03-10, 14:01
Hi, Yun:
Take a look at HBASE-5416 (Improve performance of scans with some kind of filters), which is in the 0.94.5 release. In your case, you can use a filter that specifies column C as the essential family. Here I interpret column C as a column family. Cheers
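The mechanism Ted describes can be sketched with a toy model: with on-demand loading, only the essential family (the one holding C) is read for every row, and the remaining families are read only for rows the filter accepts. The Java below is a minimal sketch under that assumption, with hypothetical class and method names; it is not HBase code and only counts cells read with and without on-demand loading. In the real client API, newer `Scan` versions expose `setLoadColumnFamiliesOnDemand(boolean)`, and, as I understand it, a `SingleColumnValueFilter` with `setFilterIfMissing(true)` treats only its own family as essential; check the API docs for the version you run.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model (hypothetical names, not HBase code) of on-demand column family
// loading: the filter looks only at the "essential" family, so the other
// families need to be read only for rows the filter accepts.
final class OnDemandScanDemo {
    static final class ToyRow {
        final int essentialCells; // cells in the essential family (column C's family)
        final int otherCells;     // cells in all other families of the row
        final boolean accepted;   // does the filter accept this row?

        ToyRow(int essentialCells, int otherCells, boolean accepted) {
            this.essentialCells = essentialCells;
            this.otherCells = otherCells;
            this.accepted = accepted;
        }
    }

    // Without on-demand loading: every family of every row is read.
    static long cellsReadEager(List<ToyRow> rows) {
        long total = 0;
        for (ToyRow r : rows) {
            total += r.essentialCells + r.otherCells;
        }
        return total;
    }

    // With on-demand loading: the essential family is read for every row,
    // the remaining families only for rows the filter accepts.
    static long cellsReadOnDemand(List<ToyRow> rows) {
        long total = 0;
        for (ToyRow r : rows) {
            total += r.essentialCells;
            if (r.accepted) {
                total += r.otherCells;
            }
        }
        return total;
    }

    // 1000 rows with 50 cells each in other families; only row 7 has a value
    // in the essential family, and only that row passes the filter.
    static List<ToyRow> sampleTable() {
        List<ToyRow> rows = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            boolean hasC = (i == 7);
            rows.add(new ToyRow(hasC ? 1 : 0, 50, hasC));
        }
        return rows;
    }

    public static void main(String[] args) {
        List<ToyRow> rows = sampleTable();
        System.out.println("eager:     " + cellsReadEager(rows));    // 50001
        System.out.println("on demand: " + cellsReadOnDemand(rows)); // 51
    }
}
```

With 1000 rows of 50 cells each and a single match, the toy reads 50001 cells eagerly versus 51 on demand; the real saving depends on how large the non-essential families are.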
-
Re: How HBase perform per-column scan? | PG, 2013-03-10, 14:14
Hi, Ted and Anoop, thanks for your notes.
I am talking about a column rather than a column family, since physically a column family should already be efficient to scan (at the storage layer, CFs are stored separately). But the columns of the same column family may be mixed together physically, which makes filtering on a column value hard... So I want to know whether HBase has any mechanism that addresses this... Regards, Yun
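Yun's point about qualifiers being mixed physically follows from how HBase orders cells within a column family's store files: by row key first, then qualifier (then timestamp). The toy Java sketch below uses hypothetical names, leaves out timestamps, and uses `String` comparison in place of HBase's byte-wise comparison. It sorts a few cells and shows that the single C cell sits between cells of other qualifiers, so without a separate index, finding it means walking the family's rows.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Toy model (hypothetical names, not HBase code) of how the cells of one
// column family are ordered in a store file: by row key, then by qualifier.
// Timestamps and types are left out; String comparison stands in for
// HBase's byte-wise comparison.
final class CellOrderDemo {
    static final class ToyCell {
        final String row;
        final String qualifier;
        final String value;

        ToyCell(String row, String qualifier, String value) {
            this.row = row;
            this.qualifier = qualifier;
            this.value = value;
        }

        @Override
        public String toString() {
            return row + "/" + qualifier + "=" + value;
        }
    }

    // Row-major order: every qualifier of r1 comes before any cell of r2.
    static final Comparator<ToyCell> ORDER =
        Comparator.comparing((ToyCell c) -> c.row).thenComparing(c -> c.qualifier);

    static List<ToyCell> sampleFamily() {
        return List.of(
            new ToyCell("r2", "a", "x"),
            new ToyCell("r1", "b", "y"),
            new ToyCell("r3", "C", "hit"),
            new ToyCell("r1", "a", "z"),
            new ToyCell("r3", "a", "w"));
    }

    static List<ToyCell> sortedFamily(List<ToyCell> cells) {
        List<ToyCell> sorted = new ArrayList<>(cells);
        sorted.sort(ORDER);
        return sorted;
    }

    // 0-based positions of the cells with the given qualifier in sorted order.
    static List<Integer> positionsOf(List<ToyCell> sorted, String qualifier) {
        List<Integer> positions = new ArrayList<>();
        for (int i = 0; i < sorted.size(); i++) {
            if (sorted.get(i).qualifier.equals(qualifier)) {
                positions.add(i);
            }
        }
        return positions;
    }

    public static void main(String[] args) {
        List<ToyCell> sorted = sortedFamily(sampleFamily());
        // Uppercase 'C' sorts before lowercase 'a', so r3/C precedes r3/a.
        System.out.println(sorted);                   // [r1/a=z, r1/b=y, r2/a=x, r3/C=hit, r3/a=w]
        System.out.println(positionsOf(sorted, "C")); // [3]
    }
}
```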
-
Re: How HBase perform per-column scan? | Ted Yu, 2013-03-10, 14:40
bq. physically column family should be able to perform efficiently (storage layer
When you scan a row, data for the different column families is brought into memory (unless you use HBASE-5416). Take a look at: https://issues.apache.org/jira/browse/HBASE-5416?focusedCommentId=13541258&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13541258 which was based on the settings described in: https://issues.apache.org/jira/browse/HBASE-5416?focusedCommentId=13541191&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13541191 This boils down to your schema design. If possible, consider extracting column C into its own column family. Cheers
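Ted's schema suggestion can be illustrated with a toy model in which each column family is its own sorted map keyed by `row/qualifier`, loosely mirroring HBase's separate store per family. The sketch below uses hypothetical names and counts every cell it walks past, whereas a real scanner can use seek hints to skip ahead within a row. It compares finding column C when C shares family `d` with ten other qualifiers versus when C has its own family `c`.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Toy model (hypothetical, not HBase code): one sorted map per column family,
// keyed by "row/qualifier". A scan restricted to one family walks only that map.
final class FamilyLayoutDemo {
    private final Map<String, TreeMap<String, String>> families = new HashMap<>();
    private long cellsExamined = 0;

    void put(String row, String family, String qualifier, String value) {
        families.computeIfAbsent(family, f -> new TreeMap<>())
                .put(row + "/" + qualifier, value);
    }

    // Scans one family and returns the rows containing `qualifier`,
    // counting every cell walked past along the way.
    List<String> rowsWithColumn(String family, String qualifier) {
        List<String> rows = new ArrayList<>();
        for (String key : families.getOrDefault(family, new TreeMap<>()).keySet()) {
            cellsExamined++;
            int slash = key.indexOf('/');
            if (key.substring(slash + 1).equals(qualifier)) {
                rows.add(key.substring(0, slash));
            }
        }
        return rows;
    }

    long cellsExamined() {
        return cellsExamined;
    }

    // `rows` rows, each with `otherColumns` qualifiers in family "d"; only
    // row0007 also has column C, which is stored in family `familyForC`.
    static FamilyLayoutDemo build(int rows, int otherColumns, String familyForC) {
        FamilyLayoutDemo table = new FamilyLayoutDemo();
        for (int i = 0; i < rows; i++) {
            String row = String.format(Locale.ROOT, "row%04d", i);
            for (int q = 0; q < otherColumns; q++) {
                table.put(row, "d", "q" + q, "v");
            }
            if (i == 7) {
                table.put(row, familyForC, "C", "hit");
            }
        }
        return table;
    }

    public static void main(String[] args) {
        FamilyLayoutDemo shared = build(1000, 10, "d");   // C mixed into family "d"
        System.out.println(shared.rowsWithColumn("d", "C") + " after "
            + shared.cellsExamined() + " cells");         // [row0007] after 10001 cells
        FamilyLayoutDemo separate = build(1000, 10, "c"); // C in its own family "c"
        System.out.println(separate.rowsWithColumn("c", "C") + " after "
            + separate.cellsExamined() + " cells");       // [row0007] after 1 cells
    }
}
```

In both layouts the lookup is still a full scan of one family; moving C into its own family only makes that family small.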
-
Re: How HBase perform per-column scan? | Anoop John, 2013-03-10, 15:53
As said above, you will need a full table scan on that CF.
As Ted said, consider revisiting your schema design. -Anoop-
-
RE: How HBase perform per-column scan? | Liu, Raymond, 2013-03-11, 02:13
Just curious: wouldn't a ROWCOL bloom filter work for this case?
Best Regards, Raymond Liu
-
RE: How HBase perform per-column scan? | Anoop Sam John, 2013-03-11, 04:49
A ROWCOL bloom filter tells you whether, for a given row (rowkey), a given column (qualifier) is present in an HFile. But the user doesn't know the rowkeys; he wants all the rows that have column 'x'.
-Anoop-
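Anoop's distinction can be seen in a toy ROWCOL-style Bloom filter: the key it tests is the (row, qualifier) pair, so every probe needs a concrete row key, and finding all rows with column C still takes one probe per row. The Java below is a generic Bloom filter built on `java.util.BitSet` with double hashing; it is a sketch with hypothetical names, not HBase's implementation.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;
import java.util.Locale;
import java.util.zip.CRC32;

// Toy sketch (hypothetical names, not HBase's implementation) of a ROWCOL-style
// Bloom filter: the element it stores and tests is a (row, qualifier) pair.
final class BloomScanDemo {
    static final class RowColBloom {
        private final BitSet bits;
        private final int numBits;
        private final int numHashes;

        RowColBloom(int numBits, int numHashes) {
            this.bits = new BitSet(numBits);
            this.numBits = numBits;
            this.numHashes = numHashes;
        }

        // Double hashing: the i-th bit index is (h1 + i * h2) mod numBits.
        private int[] positions(String row, String qualifier) {
            byte[] key = (row + "\u0000" + qualifier).getBytes(StandardCharsets.UTF_8);
            int h1 = Arrays.hashCode(key);
            CRC32 crc = new CRC32();
            crc.update(key);
            int h2 = ((int) crc.getValue()) | 1; // odd, so never a zero step
            int[] result = new int[numHashes];
            for (int i = 0; i < numHashes; i++) {
                result[i] = Math.floorMod(h1 + i * h2, numBits);
            }
            return result;
        }

        void add(String row, String qualifier) {
            for (int p : positions(row, qualifier)) {
                bits.set(p);
            }
        }

        // false: (row, qualifier) is definitely absent; true: it may be present.
        boolean mightContain(String row, String qualifier) {
            for (int p : positions(row, qualifier)) {
                if (!bits.get(p)) {
                    return false;
                }
            }
            return true;
        }
    }

    // The filter only answers questions about a known row, so finding every
    // row with `qualifier` takes one probe per row key, and the row keys must
    // come from elsewhere (here, a list of all rows in the toy file).
    static List<String> rowsThatMayHave(RowColBloom bloom, List<String> rowKeys, String qualifier) {
        List<String> candidates = new ArrayList<>();
        for (String row : rowKeys) {
            if (bloom.mightContain(row, qualifier)) {
                candidates.add(row); // only here would a scanner need to seek
            }
        }
        return candidates;
    }

    static List<String> sampleRowKeys() {
        List<String> rowKeys = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            rowKeys.add(String.format(Locale.ROOT, "row%04d", i));
        }
        return rowKeys;
    }

    // Every row has qualifier "q0"; only row0007 also has "C".
    static RowColBloom sampleBloom(List<String> rowKeys) {
        RowColBloom bloom = new RowColBloom(1 << 20, 5);
        for (String row : rowKeys) {
            bloom.add(row, "q0");
            if (row.equals("row0007")) {
                bloom.add(row, "C");
            }
        }
        return bloom;
    }

    public static void main(String[] args) {
        List<String> rowKeys = sampleRowKeys();
        RowColBloom bloom = sampleBloom(rowKeys);
        List<String> candidates = rowsThatMayHave(bloom, rowKeys, "C");
        // Always contains row0007 (no false negatives); false positives are possible.
        System.out.println(rowKeys.size() + " probes, candidates: " + candidates);
    }
}
```

Raymond's follow-up asks whether the scanner does this check itself for each row it visits. In the toy, that corresponds to the loop in `rowsThatMayHave`: a "definitely absent" answer lets it skip a seek for that row, but the loop still runs once per row, so the cost stays proportional to the number of rows scanned.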
-
RE: How HBase perform per-column scan? | Liu, Raymond, 2013-03-11, 05:12
Hmm, I don't mean querying the bloom filter directly. I mean that the StoreFileScanner will query the ROWCOL bloom filter to decide whether it needs a seek or not. And I guess this would be done for every row, without the need to specify row keys?