HBase >> mail # user >> Scan addFamily vs FamilyFilter(EQUAL, ...)


RE: Scan addFamily vs FamilyFilter(EQUAL, ...)
Just to add to this:
the Javadoc for FamilyFilter says explicitly:

* If an already known column family is looked for, use {@link
org.apache.hadoop.hbase.client.Get#addFamily(byte[])}
* directly rather than a filter.

So addFamily should be the better choice when the column family is known in advance.

Regards
Ram
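
The difference described in the quoted message below (scanners opened only for the requested Stores, versus every Store being read and then filtered) can be sketched with a small toy model. This is plain Java and not HBase code: the class FamilyScanSketch, its Outcome holder and the sample table are all hypothetical names made up for illustration, and the read counter only stands in for HFile reads.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model only: hypothetical classes, not part of HBase.
class FamilyScanSketch {

    // What a scan returned and how many cells it had to read to get there.
    static final class Outcome {
        final List<String> cells = new ArrayList<>();
        int cellsRead = 0;
    }

    // One simulated "store" per column family.
    static Map<String, List<String>> sampleTable() {
        Map<String, List<String>> table = new LinkedHashMap<>();
        table.put("cf1", Arrays.asList("a", "b"));
        table.put("cf2", Arrays.asList("c", "d", "e"));
        table.put("cf3", Arrays.asList("f"));
        return table;
    }

    // Models Scan#addFamily: only the requested store is opened and read.
    static Outcome scanWithAddFamily(Map<String, List<String>> table, String family) {
        Outcome out = new Outcome();
        List<String> store = table.getOrDefault(family, Collections.<String>emptyList());
        for (String cell : store) {
            out.cellsRead++;
            out.cells.add(cell);
        }
        return out;
    }

    // Models FamilyFilter(EQUAL, family): every store is read,
    // and cells from non-matching families are dropped afterwards.
    static Outcome scanWithFamilyFilter(Map<String, List<String>> table, String family) {
        Outcome out = new Outcome();
        for (Map.Entry<String, List<String>> store : table.entrySet()) {
            for (String cell : store.getValue()) {
                out.cellsRead++;
                if (store.getKey().equals(family)) {
                    out.cells.add(cell);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Outcome viaFamily = scanWithAddFamily(sampleTable(), "cf1");
        Outcome viaFilter = scanWithFamilyFilter(sampleTable(), "cf1");
        System.out.println("addFamily:    cells=" + viaFamily.cells + ", read=" + viaFamily.cellsRead);
        System.out.println("FamilyFilter: cells=" + viaFilter.cells + ", read=" + viaFilter.cellsRead);
    }
}
```

Both calls return the same cells, but in this model the filter variant reads all six cells while the addFamily variant reads only the two in cf1. How large the gap is on a real cluster depends on the table layout and is not something this sketch can show.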

> -----Original Message-----
> From: Anoop Sam John [mailto:[EMAIL PROTECTED]]
> Sent: Thursday, May 31, 2012 11:49 AM
> To: [EMAIL PROTECTED]
> Subject: RE: Scan addFamily vs FamilyFilter(EQUAL, ...)
>
> Hi,
> As I understand the Scan code, if you want to scan only some of the
> CFs (not all of them), you should use Scan#addFamily.
> FamilyFilter gives the same result, but there is a difference in
> performance.
> When you specify the CFs in the Scan, scanners are created only for
> the Stores of those CFs. No scanners are created for the other CFs, so
> those Stores are not scanned (their HFile data is not fetched).
> When you instead use FamilyFilter without restricting the scan to
> specific columns or families (using Scan#addFamily), all the Stores are
> scanned and their data is fetched from the HFiles. Only the KVs that
> match your FamilyFilter are then included in the Result; the others
> are discarded. So I expect a performance difference.
> Please correct me if I am wrong.
>
> @Stack
> > One thing I ran into when using the Scan.addFamily / Scan.addColumn is
> > that those two methods overwrite each other.
> The Scan#addColumn Javadoc states this overwrite behaviour explicitly,
> so it seems to be intentional, correct?
>
>
> -Anoop-
> ________________________________________
> From: [EMAIL PROTECTED] [[EMAIL PROTECTED]] on behalf of Stack
> [[EMAIL PROTECTED]]
> Sent: Wednesday, May 30, 2012 11:13 PM
> To: [EMAIL PROTECTED]
> Subject: Re: Scan addFamily vs FamilyFilter(EQUAL, ...)
>
> On Wed, May 30, 2012 at 9:59 AM, Kevin <[EMAIL PROTECTED]> wrote:
> > I am curious and trying to learn which method is best when wanting to
> > limit a scan to a particular column or column family. The Scan class
> > carries a Filter instance and a TreeMap of the family map and I am
> > unsure how they get carried through to the server-side functionality.
> > In terms of performance is there any difference between doing
> > Scan.addFamily(x) and
> > Scan.setFilter(new FamilyFilter(CompareFilter.CompareOp.EQUAL, x))?
> >
>
> There is probably no noticeable difference in performance, but
> Scan#addFamily is the more natural way of expressing column family
> scoping.
> St.Ack
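
On the addFamily / addColumn overwrite point raised in the thread: a small toy model of a family map may make the behaviour easier to see. It is plain Java, not the HBase Scan class. FamilyMapSketch is a hypothetical name, it uses String keys where HBase uses byte[], and it assumes the semantics the Javadoc is said to describe (a null entry means "the whole family", and each method replaces what the other set for that family).

```java
import java.util.NavigableSet;
import java.util.TreeMap;
import java.util.TreeSet;

// Toy model only: a hypothetical stand-in for the family map a Scan carries.
class FamilyMapSketch {

    // family -> selected qualifiers; a null value means "the whole family".
    final TreeMap<String, NavigableSet<String>> familyMap = new TreeMap<>();

    // Select the whole family, discarding any qualifiers chosen earlier.
    FamilyMapSketch addFamily(String family) {
        familyMap.remove(family);
        familyMap.put(family, null);
        return this;
    }

    // Select one qualifier; a previous whole-family selection is narrowed
    // to just the qualifiers added from this point on.
    FamilyMapSketch addColumn(String family, String qualifier) {
        NavigableSet<String> qualifiers = familyMap.get(family);
        if (qualifiers == null) {
            qualifiers = new TreeSet<>();
        }
        qualifiers.add(qualifier);
        familyMap.put(family, qualifiers);
        return this;
    }

    public static void main(String[] args) {
        // addFamily first, then addColumn: only q1 remains selected.
        FamilyMapSketch narrowed = new FamilyMapSketch().addFamily("cf").addColumn("cf", "q1");
        System.out.println(narrowed.familyMap); // {cf=[q1]}

        // addColumn first, then addFamily: the whole family is selected again.
        FamilyMapSketch widened = new FamilyMapSketch().addColumn("cf", "q1").addFamily("cf");
        System.out.println(widened.familyMap); // {cf=null}
    }
}
```

In this model the order of the calls decides the outcome for a given family, which is why mixing the two methods on the same family can be surprising.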