-
Scan addFamily vs FamilyFilter(EQUAL, ...)
Kevin 2012-05-30, 16:59
Hi,
I am curious and trying to learn which method is best for limiting a scan to a particular column or column family. The Scan class carries a Filter instance and a TreeMap holding the family map, and I am unsure how they get carried through to the server-side functionality. In terms of performance, is there any difference between doing Scan.addFamily(x) and Scan.setFilter(new FamilyFilter(CompareFilter.CompareOp.EQUAL, x))?
Thanks.
-
Re: Scan addFamily vs FamilyFilter(EQUAL, ...)
Stack 2012-05-30, 17:43
On Wed, May 30, 2012 at 9:59 AM, Kevin <[EMAIL PROTECTED]> wrote:
> I am curious and trying to learn which method is best when wanting to limit
> a scan to a particular column or column family. The Scan class carries a
> Filter instance and a TreeMap of the family map and I am unsure how they
> get carried through to the server-side functionality. In terms of
> performance is there any difference between doing Scan.addFamily(x) and
> Scan.setFilter(new FamilyFilter(CompareFilter.CompareOp.EQUAL, x)?
There is probably no noticeable difference in performance, but Scan#addFamily is the more natural way of expressing column family scoping.
St.Ack
-
RE: Scan addFamily vs FamilyFilter(EQUAL, ...)
Buttler, David 2012-05-31, 00:38
One thing I ran into when using Scan.addFamily / Scan.addColumn is that those two methods overwrite each other. So, if you call Scan.addFamily("a"), where the family contains qualifiers x, y, and z, and then call Scan.addColumn("a","x"), you will not get the columns y and z back. Conversely, if you call Scan.addColumn("a","x") and then Scan.addFamily("a"), you will get the columns x, y, and z back.
Dave
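The overwrite behavior described above can be illustrated with a small stand-alone model. This is only a sketch, not HBase code: FamilyMapModel and its selects method are hypothetical names, and the model assumes (as the original question suggests) that the scan keeps a TreeMap from family to a set of qualifiers, with a null set standing for "the whole family" and an empty map standing for "everything".

```java
import java.util.NavigableSet;
import java.util.TreeMap;
import java.util.TreeSet;

/** Toy stand-in for the family map a Scan carries; hypothetical, not HBase code. */
final class FamilyMapModel {
    // family -> selected qualifiers; a null value is assumed to mean "the whole family"
    private final TreeMap<String, NavigableSet<String>> familyMap = new TreeMap<>();

    /** Select the whole family, discarding any qualifiers chosen earlier. */
    FamilyMapModel addFamily(String family) {
        familyMap.put(family, null);
        return this;
    }

    /** Select one qualifier; an earlier whole-family selection is replaced. */
    FamilyMapModel addColumn(String family, String qualifier) {
        NavigableSet<String> qualifiers = familyMap.get(family);
        if (qualifiers == null) {
            qualifiers = new TreeSet<>();
            familyMap.put(family, qualifiers);
        }
        qualifiers.add(qualifier);
        return this;
    }

    /** True if the given column would be returned under this selection. */
    boolean selects(String family, String qualifier) {
        if (familyMap.isEmpty()) {
            return true; // assumption: nothing specified means everything is returned
        }
        if (!familyMap.containsKey(family)) {
            return false;
        }
        NavigableSet<String> qualifiers = familyMap.get(family);
        return qualifiers == null || qualifiers.contains(qualifier);
    }

    public static void main(String[] args) {
        FamilyMapModel familyThenColumn = new FamilyMapModel().addFamily("a").addColumn("a", "x");
        System.out.println(familyThenColumn.selects("a", "y")); // false: only x remains selected

        FamilyMapModel columnThenFamily = new FamilyMapModel().addColumn("a", "x").addFamily("a");
        System.out.println(columnThenFamily.selects("a", "y")); // true: the whole family is selected
    }
}
```

In this model the last call for a given family wins, which matches the two orderings described in the message.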
-
Re: Scan addFamily vs FamilyFilter(EQUAL, ...)
Stack 2012-05-31, 04:29
On Wed, May 30, 2012 at 5:38 PM, Buttler, David <[EMAIL PROTECTED]> wrote:
> One thing I ran into when using the Scan.addFamily / Scan.addColumn is that those two methods overwrite each other. So, if you do Scan.addFamily("a"), and the family contains qualifiers x, y, and z; and then do Scan.addColumn("a","x"), you will not get the columns y and z back. Similarly, if you do a Scan.addColumn("a","x"), and then a Scan.addFamily("a"), you will get the columns x, y, and z back.
That seems a bit silly. File a bug, David? We should fix that.
St.Ack
-
RE: Scan addFamily vs FamilyFilter(EQUAL, ...)
Anoop Sam John 2012-05-31, 06:18
Hi,
As I understand the Scan code, if you want to scan some CFs (not all), you should go with Scan#addFamily. FamilyFilter gives the same result, but there is a difference in performance.
When you specify the CFs in the Scan, scanners are created only for those Stores. For the other CFs there won't be any scanners, so those stores are not scanned (their HFile data is not fetched). If instead you use FamilyFilter and do not specify any families (using Scan#addFamily), all the stores get scanned and data is fetched from their HFiles. Only the KVs that match your FamilyFilter are then included in the Result; the others are simply dropped. So I think there will be a performance difference. Please correct me if I am wrong.
@Stack
> One thing I ran into when using the Scan.addFamily / Scan.addColumn is that those two methods overwrite each other.
The Scan#addColumn javadoc clearly describes this overwriting, so it seems to be intentional, correct?
-Anoop-
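The performance argument in this message (family scoping opens only the needed stores, while a filter alone reads every store and discards afterwards) can be sketched with a toy model. This is an illustration of the reasoning only, under the assumption that the description above is accurate; StorePruningModel and its methods are hypothetical names, not HBase classes, and each "store" is just a list of qualifiers.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Toy model of the two read paths; hypothetical names, not HBase classes. */
final class StorePruningModel {

    /** Family scoping: only the stores of the requested families are opened and read. */
    static int cellsReadWithFamilyScope(Map<String, List<String>> stores, Set<String> families) {
        int cellsRead = 0;
        for (Map.Entry<String, List<String>> store : stores.entrySet()) {
            if (families.contains(store.getKey())) {
                cellsRead += store.getValue().size();
            }
        }
        return cellsRead;
    }

    /**
     * Filter only: every store is read, then cells of other families are dropped.
     * Returns {cellsRead, cellsReturned}.
     */
    static int[] cellsReadWithFilterOnly(Map<String, List<String>> stores, String wantedFamily) {
        int cellsRead = 0;
        int cellsReturned = 0;
        for (Map.Entry<String, List<String>> store : stores.entrySet()) {
            for (String qualifier : store.getValue()) {
                cellsRead++; // the cell is fetched before the filter sees it
                if (store.getKey().equals(wantedFamily)) {
                    cellsReturned++;
                }
            }
        }
        return new int[] {cellsRead, cellsReturned};
    }

    public static void main(String[] args) {
        Map<String, List<String>> stores = new TreeMap<>();
        stores.put("a", Arrays.asList("x", "y", "z"));
        stores.put("b", Arrays.asList("p", "q", "r", "s"));

        System.out.println(cellsReadWithFamilyScope(stores, Collections.singleton("a"))); // 3

        int[] filtered = cellsReadWithFilterOnly(stores, "a");
        System.out.println(filtered[0] + " read, " + filtered[1] + " returned"); // 7 read, 3 returned
    }
}
```

Both paths return the same three cells for family "a"; the difference in the model is the amount of data touched to get there, which is the cost the message is pointing at.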
-
Re: Scan addFamily vs FamilyFilter(EQUAL, ...)
Stack 2012-05-31, 06:53
On Wed, May 30, 2012 at 11:18 PM, Anoop Sam John <[EMAIL PROTECTED]> wrote:
> @Stack
>> One thing I ran into when using the Scan.addFamily / Scan.addColumn is that those two methods overwrite each other.
> In the Scan#addColumn javadoc it is clearly telling about this overwrites... So this seems intentionally done correct?
Ok. Then operating as advertised. Seems like a simple issue to trip on though.
Thanks Anoop,
St.Ack
-
RE: Scan addFamily vs FamilyFilter(EQUAL, ...)
Ramkrishna.S.Vasudevan 2012-05-31, 11:38
Just to add on: the FamilyFilter javadoc clearly says
    If an already known column family is looked for, use {@link org.apache.hadoop.hbase.client.Get#addFamily(byte[])} directly rather than a filter.
So addFamily should be better.
Regards Ram