HBase >> mail # user >> How HBase perform per-column scan?


yun peng 2013-03-09, 19:11
Ted Yu 2013-03-10, 14:01
PG 2013-03-10, 14:14
Ted Yu 2013-03-10, 14:40
Anoop John 2013-03-10, 15:53
Liu, Raymond 2013-03-11, 02:13
Anoop Sam John 2013-03-11, 04:49
RE: How HBase perform per-column scan?
Hmm, I don't mean querying the bloom filter directly. I mean the StoreFileScanner will query the ROWCOL bloom filter to see whether it needs a seek or not. And I guess this would be performed on every row, without the need to specify row keys?
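
To make the point under discussion concrete, here is a minimal sketch of a bloom filter keyed on (row, qualifier) pairs. It is a toy stand-in written for illustration, not HBase code: `ToyBloomFilter`, `rowcol_key`, `rows_with_column`, the key separator, and the row names are all hypothetical. It only shows that a ROWCOL-style probe needs a row key in hand, so finding every row that has column C still means walking candidate rows.

```python
import hashlib


class ToyBloomFilter:
    """Tiny Bloom filter; an illustrative stand-in, not HBase's implementation."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = [False] * num_bits

    def _positions(self, key):
        # Derive num_hashes bit positions from salted SHA-256 digests of the key.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = True

    def might_contain(self, key):
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos] for pos in self._positions(key))


def rowcol_key(row, qualifier):
    # Assumption: a ROWCOL-style key combines row key and qualifier.
    # The NUL separator is purely illustrative.
    return f"{row}\x00{qualifier}"


def rows_with_column(bloom, candidate_rows, qualifier):
    # Without knowing the row keys up front, every candidate row must be probed.
    return [r for r in candidate_rows if bloom.might_contain(rowcol_key(r, qualifier))]


bloom = ToyBloomFilter()
bloom.add(rowcol_key("row-000001", "A"))
bloom.add(rowcol_key("row-000002", "B"))
bloom.add(rowcol_key("row-000042", "C"))  # the one row that has column C

# A probe is only possible with a full (row, qualifier) pair.
print(bloom.might_contain(rowcol_key("row-000042", "C")))  # True: no false negatives

# "All rows with column C" still requires enumerating the rows.
candidates = [f"row-{i:06d}" for i in range(100)]
possible = rows_with_column(bloom, candidates, "C")
print("row-000042" in possible)
```

In this sketch the filter can rule rows out one at a time, but it cannot produce the list of rows that contain a column; that list still comes from iterating over the rows.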
> A ROWCOL bloom says whether, for a given row (rowkey), a given column
> (qualifier) is present in an HFile or not. But here the user doesn't know
> the rowkeys. He wants all the rows with column 'x'.
>
> -Anoop-
>
> ________________________________________
> From: Liu, Raymond [[EMAIL PROTECTED]]
> Sent: Monday, March 11, 2013 7:43 AM
> To: [EMAIL PROTECTED]
> Subject: RE: How HBase perform per-column scan?
>
> Just curious, won't a ROWCOL bloom filter work for this case?
>
> Best Regards,
> Raymond Liu
>
> >
> > As said above, you will need a full table scan on that CF.
> > As Ted said, consider having a look at your schema design.
> >
> > -Anoop-
> >
> >
> > On Sun, Mar 10, 2013 at 8:10 PM, Ted Yu <[EMAIL PROTECTED]> wrote:
> >
> > > bq. physically column family should be able to perform efficiently
> > > (storage layer
> > >
> > > When you scan a row, data for different column families would be
> > > brought into memory (if you don't utilize HBASE-5416). Take a look at:
> > >
> > > https://issues.apache.org/jira/browse/HBASE-5416?focusedCommentId=13541258&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13541258
> > >
> > > which was based on the settings described in:
> > >
> > > https://issues.apache.org/jira/browse/HBASE-5416?focusedCommentId=13541191&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13541191
> > >
> > > This boils down to your schema design. If possible, consider
> > > extracting column C into its own column family.
> > >
> > > Cheers
> > >
> > > On Sun, Mar 10, 2013 at 7:14 AM, PG <[EMAIL PROTECTED]> wrote:
> > >
> > > > Hi, Ted and Anoop, thanks for your notes.
> > > > I am talking about a column rather than a column family, since
> > > > physically a column family should perform efficiently (at the
> > > > storage layer, CFs are stored separately). But columns of the
> > > > same column family may be mixed physically, and that makes
> > > > filtering on a column value hard... So I want to know if there is
> > > > any mechanism in HBase that works on this...
> > > > Regards,
> > > > Yun
> > > >
> > > > On Mar 10, 2013, at 10:01 AM, Ted Yu <[EMAIL PROTECTED]> wrote:
> > > >
> > > > > Hi, Yun:
> > > > > Take a look at HBASE-5416 (Improve performance of scans with
> > > > > some kind of filters), which is in the 0.94.5 release.
> > > > >
> > > > > In your case, you can use a filter which specifies column C as
> > > > > the essential family.
> > > > > Here I interpret column C as column family.
> > > > >
> > > > > Cheers
> > > > >
> > > > > On Sat, Mar 9, 2013 at 11:11 AM, yun peng <[EMAIL PROTECTED]> wrote:
> > > > >
> > > > >> Hi, All,
> > > > >> I want to find all existing values for a given column in an
> > > > >> HBase table. Would that result in a full-table scan? For
> > > > >> example, given a column C, the table has a very large number of
> > > > >> rows, of which only a few (say only 1 row) have non-empty values
> > > > >> for column C. Would HBase still use a full table scan to find
> > > > >> this row? Or does HBase have any optimization for this kind of
> > > > >> query?
> > > > >> Thanks...
> > > > >> Regards
> > > > >> Yun
> > > > >>
> > > >
> > >
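
The schema suggestion made in this thread (moving column C into its own column family) can be illustrated with a toy cost model. This is a sketch under strong assumptions, not HBase's actual read path: each store is modeled as a flat list of cells walked linearly, whereas real HBase reads blocks and may seek within them. `scan_for_qualifier`, `NUM_ROWS`, `SPARSE_ROW`, and the store layouts are all hypothetical names and data invented for the example.

```python
def scan_for_qualifier(store, qualifier):
    """Naive linear walk over one store's cells.

    Returns (matching row keys, number of cells touched).
    """
    matches, touched = [], 0
    for row, qual, _value in store:
        touched += 1
        if qual == qualifier:
            matches.append(row)
    return matches, touched


NUM_ROWS = 10_000
SPARSE_ROW = "row-004242"  # hypothetical: the only row with a value in column C

# Layout 1: A, B and C share one family, so their cells sit in the same store.
shared_store = []
for i in range(NUM_ROWS):
    row = f"row-{i:06d}"
    shared_store.append((row, "A", "a"))
    shared_store.append((row, "B", "b"))
    if row == SPARSE_ROW:
        shared_store.append((row, "C", "c"))

# Layout 2: C lives in its own family, hence its own (tiny) store.
c_only_store = [(SPARSE_ROW, "C", "c")]

shared_matches, shared_touched = scan_for_qualifier(shared_store, "C")
own_matches, own_touched = scan_for_qualifier(c_only_store, "C")

print(shared_matches == own_matches, shared_touched, own_touched)  # True 20001 1
```

Both layouts return the same row, but in this model the shared family forces a pass over every cell of A and B as well, while the dedicated family holds only the cells of C.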
Anoop John 2013-03-10, 07:11