HBase, mail # user - How HBase perform per-column scan?

RE: How HBase perform per-column scan?
Anoop Sam John 2013-03-11, 04:49
A ROWCOL bloom filter says whether a given column (qualifier) is present for a given row (rowkey) in an HFile. But in this case the user doesn't know the rowkeys. He wants all the rows with column 'x'.

-Anoop-
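
To make the point above concrete, here is a minimal sketch of a Bloom filter keyed on (row key, qualifier). It is a toy model in Python, not HBase's implementation: the class name ToyRowColBloom, the bit-array size, and the SHA-256-based hashing are all assumptions made for illustration.

```python
import hashlib


class ToyRowColBloom:
    """Toy Bloom filter keyed on (row key, qualifier); not HBase's code."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = [False] * num_bits

    def _positions(self, row, qualifier):
        # The hashed key is row + qualifier, so BOTH are needed to probe.
        key = row + b"\x00" + qualifier
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(bytes([seed]) + key).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, row, qualifier):
        for pos in self._positions(row, qualifier):
            self.bits[pos] = True

    def might_contain(self, row, qualifier):
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos] for pos in self._positions(row, qualifier))


bloom = ToyRowColBloom()
bloom.add(b"row-0042", b"x")

# Point lookup: the caller supplies the row key, so the filter can answer.
print(bloom.might_contain(b"row-0042", b"x"))  # True (no false negatives)
```

Note that there is no way to ask this structure "which rows have qualifier x?". Without the row key, the only option is to probe every candidate row key, which is no cheaper than scanning.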

________________________________________
From: Liu, Raymond [[EMAIL PROTECTED]]
Sent: Monday, March 11, 2013 7:43 AM
To: [EMAIL PROTECTED]
Subject: RE: How HBase perform per-column scan?

Just curious, wouldn't a ROWCOL bloom filter work for this case?

Best Regards,
Raymond Liu

>
> As said above, you will need a full table scan on that CF.
> As Ted said, consider having a look at your schema design.
>
> -Anoop-
>
>
> On Sun, Mar 10, 2013 at 8:10 PM, Ted Yu <[EMAIL PROTECTED]> wrote:
>
> > bq. physically column family should be able to perform efficiently
> > (storage layer
> >
> > When you scan a row, data for different column families would be
> > brought into memory (if you don't utilize HBASE-5416) Take a look at:
> >
> >
> > https://issues.apache.org/jira/browse/HBASE-5416?focusedCommentId=13541258&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13541258
> >
> > which was based on the settings described in:
> >
> >
> >
> > https://issues.apache.org/jira/browse/HBASE-5416?focusedCommentId=13541191&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13541191
> >
> > This boils down to your schema design. If possible, consider
> > extracting column C into its own column family.
> >
> > Cheers
> >
> > On Sun, Mar 10, 2013 at 7:14 AM, PG <[EMAIL PROTECTED]> wrote:
> >
> > > Hi, Ted and Anoop, thanks for your notes.
> > > I am talking about a column rather than a column family, since
> > > physically column family should be able to perform efficiently
> > > (storage layer, CF's are stored separately). But columns of the same
> > > column family may be mixed physically, and that makes filtering on a
> > > column value hard... So I want to know if there is any mechanism in
> > > HBase that addresses this...
> > > Regards,
> > > Yun
> > >
> > > On Mar 10, 2013, at 10:01 AM, Ted Yu <[EMAIL PROTECTED]> wrote:
> > >
> > > > Hi, Yun:
> > > > Take a look at HBASE-5416 (Improve performance of scans with some
> > > > kind of filters) which is in 0.94.5 release.
> > > >
> > > > In your case, you can use a filter which specifies column C as the
> > > > essential family.
> > > > Here I interpret column C as column family.
> > > >
> > > > Cheers
> > > >
> > > > On Sat, Mar 9, 2013 at 11:11 AM, yun peng <[EMAIL PROTECTED]> wrote:
> > > >
> > > >> Hi, All,
> > > > >> I want to find all existing values for a given column in HBase.
> > > > >> Would that result in a full-table scan? For example, given a
> > > > >> column C, the table has a very large number of rows, of which only
> > > > >> a few (say only 1 row) have non-empty values for column C. Would
> > > > >> HBase still use a full table scan to find this row? Or does HBase
> > > > >> have any optimization for this kind of query?
> > > >> Thanks...
> > > >> Regards
> > > >> Yun
> > > >>
> > >
> >
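
The schema suggestion in this thread (moving column C into its own column family) can be illustrated with a toy model. This is only a sketch in Python, not HBase code: each column family is modelled as a separate sorted list of (row, qualifier, value) cells, and the function name scan_for_qualifier, the row-key format, and the cell counts are hypothetical. Real HBase scans work on HFile blocks and may behave differently in detail.

```python
def scan_for_qualifier(store, qualifier):
    """Return (matching cells, number of cells examined) for one store."""
    examined = 0
    matches = []
    for row, qual, value in store:
        examined += 1
        if qual == qualifier:
            matches.append((row, value))
    return matches, examined


# Layout A: sparse column C shares a family with a dense column "a".
mixed_cf = [(f"row-{i:04d}", "a", "v") for i in range(1000)]
mixed_cf.append(("row-0500", "C", "rare"))
mixed_cf.sort()  # cells of one family are stored together, ordered by row

# Layout B: column C lives alone in its own family (separate store).
c_cf = [("row-0500", "C", "rare")]

matches_a, examined_a = scan_for_qualifier(mixed_cf, "C")
matches_b, examined_b = scan_for_qualifier(c_cf, "C")

print(matches_a == matches_b)  # same answer either way
print(examined_a, examined_b)  # cells touched in each layout
```

In this model both layouts return the same row, but layout A has to walk every cell of the shared family, while layout B only touches the cells that actually belong to column C.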