-
Scan.addFamily reduces results
Peter Wolf 2012-03-15, 16:08
Hi all,
I am doing a scan on a table with multiple families. My code looks like this...
    Scan scan = new Scan(calculateStartRowKey(a), calculateEndRowKey(b));

    scan.setCaching(10000);
    Filter filter = new SingleColumnValueFilter(xFamily, xColumn,
        CompareFilter.CompareOp.EQUAL, Bytes.toBytes(x));
    scan.setFilter(filter);
    scan
        .addFamily(xFamily)
        .addFamily(yFamily)
        .addFamily(zFamily);

    ResultScanner scanner = hTable.getScanner(scan);

    Iterator<Result> it = scanner.iterator();
    int resultCount = 0;
    while (it.hasNext()) {
        Result result = it.next();
        resultCount++;
    }
However, I am getting a different number of results depending on which families are added. For example, these give different result counts:

    scan
        //.addFamily(xFamily)
        .addFamily(yFamily)
        .addFamily(zFamily);

and

    scan
        .addFamily(xFamily)
        .addFamily(yFamily)
        .addFamily(zFamily);

There is no error message, and I don't see anything in the Scan documentation. Does anyone know what is going on?

Thanks
Peter
-
Re: Scan.addFamily reduces results
Daniel Gómez Ferro 2012-03-15, 17:58
On Mar 15, 2012, at 17:08, Peter Wolf wrote:

> Filter filter = new SingleColumnValueFilter(xFamily, xColumn,
>     CompareFilter.CompareOp.EQUAL, Bytes.toBytes(x));

From the SingleColumnValueFilter documentation ( http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/SingleColumnValueFilter.html ):

    When using this filter on a Scan with specified inputs, the column to be tested should also be added as input (otherwise the filter will regard the column as missing).

I understand that you have to add xFamily, so your example with the commented-out addFamily(xFamily) would be wrong.
-
Re: Scan.addFamily reduces results
Doug Meil 2012-03-15, 16:39
re: "However, I am getting different number of results, depending on which families are added"

Yes.

I'd suggest you read this in the RefGuide:

http://hbase.apache.org/book.html#datamodel
-
Re: Scan.addFamily reduces results
Peter Wolf 2012-03-15, 16:52
Thanks Doug,

I had read that, and I just read it again. But I am missing something...

Why does adding a family reduce the number of results? Is there an implied filter of some form? Does addFamily add some constraint on which rows are returned?

Note that all my rows *ought* to have values in all the families.

Thanks
Peter
-
Re: Scan.addFamily reduces results
lars hofhansl 2012-03-15, 17:04
Hi Peter,

for HBase you have to keep in mind that it is a sparse columnar (or KeyValue) store:

    (rowkey, columnfamily, column, TS) -> value

A scan only returns those KeyValues that match the scan. So when you set families on your scan, you'll only get those rows for which the scan found any columns.

Makes sense?

-- Lars
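The KeyValue view described above can be illustrated with a small in-memory model. This is only a sketch in plain Java, not HBase code: the class `SparseStoreSketch`, its `Cell` type, and the sample rows and families are all hypothetical names invented for the illustration. It assumes nothing beyond what the message states, namely that data is stored per cell and that a row appears in a scan only when at least one of its cells matches.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Toy in-memory model of a sparse KeyValue store. This is NOT HBase code.
final class SparseStoreSketch {

    // One stored cell: (rowkey, columnfamily, column, TS) -> value.
    static final class Cell {
        final String row, family, column, value;
        final long ts;

        Cell(String row, String family, String column, long ts, String value) {
            this.row = row;
            this.family = family;
            this.column = column;
            this.ts = ts;
            this.value = value;
        }
    }

    private final List<Cell> cells = new ArrayList<>();

    void put(String row, String family, String column, long ts, String value) {
        cells.add(new Cell(row, family, column, ts, value));
    }

    // Returns only the cells whose family was requested.
    List<Cell> scan(String... families) {
        Set<String> wanted = new HashSet<>(Arrays.asList(families));
        List<Cell> matched = new ArrayList<>();
        for (Cell c : cells) {
            if (wanted.contains(c.family)) {
                matched.add(c);
            }
        }
        return matched;
    }

    // A "row" exists in the result only if at least one of its cells matched.
    static Set<String> rowsOf(List<Cell> matched) {
        Set<String> rows = new TreeSet<>();
        for (Cell c : matched) {
            rows.add(c.row);
        }
        return rows;
    }

    // Hypothetical sample data: row2 stores nothing in family "x".
    static SparseStoreSketch example() {
        SparseStoreSketch store = new SparseStoreSketch();
        store.put("row1", "x", "q", 1L, "v1");
        store.put("row1", "y", "q", 1L, "v2");
        store.put("row2", "y", "q", 1L, "v3");
        return store;
    }

    public static void main(String[] args) {
        SparseStoreSketch store = example();
        System.out.println("families {x}:    " + rowsOf(store.scan("x")));
        System.out.println("families {x, y}: " + rowsOf(store.scan("x", "y")));
    }
}
```

In this model there is no row object to return when none of a row's cells fall in the requested families, which is why restricting the families can only drop rows that have no data there.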
-
Re: Scan.addFamily reduces results
Haijia Zhou 2012-03-15, 17:12
I have the same confusion. Say I added three column families A, B and C to the scan. If a row has data for column families B and C but no data for A, then it won't be returned by the next() method?

What if the requirement is to get row data regardless of whether there's data for a specific column family or not?
-
Re: Scan.addFamily reduces results
lars hofhansl 2012-03-15, 17:17
Hi Haijia,

In that case HBase will still return the data for columns in families B and C. But if you only added family A, then HBase would only return "rows" for which family A has any columns.

-- Lars
-
Re: Scan.addFamily reduces results
Peter Wolf 2012-03-15, 17:48
Hi Lars, still confused...

My table *should* have values for families A, B and C. Let's say I have a bug, and some rows only have values for B and C. Let's also say there are 1000 rows with A,B,C and 500 rows with only B and C.

If I add families A, B and C and scan with no filter, will I get 1500, 1000 or 500 results?

Many thanks
P
-
Re: Scan.addFamily reduces results
Himanshu Vashishtha 2012-03-15, 18:42
"Let's also say there are 1000 rows with A,B,C and 500 rows with only B and C. If I add families A, B and C and scan with no filter will I get 1500, 1000 or 500 results?"

In this case, you will get 1500 rows. If you add only A, you will get 1000 rows.

It's not the case that adding families A, B and C gives you _only_ those rows that have _all_ three families; rather, it gives you all rows that contain _any_ of these families.

Hope this helps.

Experts are welcome to chime in if I am missing something :)

Thanks,
Himanshu
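Taking the "any of these families" rule literally, the counts for the scenario in the question can be worked out with a toy model. The sketch below is plain Java with no HBase dependency; `FamilyCountSketch`, `countAny` and `countAll` are hypothetical names, and each row is reduced to the set of families in which it has data.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model: a row is represented only by the families in which it has data.
final class FamilyCountSketch {

    // The scenario from the question: 1000 rows with A, B, C
    // and 500 rows with only B and C.
    static List<Set<String>> buildRows() {
        List<Set<String>> rows = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            rows.add(new HashSet<>(Arrays.asList("A", "B", "C")));
        }
        for (int i = 0; i < 500; i++) {
            rows.add(new HashSet<>(Arrays.asList("B", "C")));
        }
        return rows;
    }

    // Rows that have data in ANY of the requested families.
    static int countAny(List<Set<String>> rows, String... families) {
        int n = 0;
        for (Set<String> row : rows) {
            for (String f : families) {
                if (row.contains(f)) {
                    n++;
                    break;
                }
            }
        }
        return n;
    }

    // For contrast: rows that have data in ALL of the requested families.
    static int countAll(List<Set<String>> rows, String... families) {
        int n = 0;
        for (Set<String> row : rows) {
            if (row.containsAll(Arrays.asList(families))) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        List<Set<String>> rows = buildRows();
        System.out.println("any of A,B,C: " + countAny(rows, "A", "B", "C"));
        System.out.println("only A:       " + countAny(rows, "A"));
        System.out.println("all of A,B,C: " + countAll(rows, "A", "B", "C"));
    }
}
```

Under the "any" rule the 500 rows without A are still counted whenever B or C is requested; they only disappear when A is the sole family on the scan. The "all" variant is shown purely for contrast with the reading the original question was worried about.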
-
Re: Scan.addFamily reduces results
Peter Wolf 2012-03-15, 19:05
Huh! That's what I was afraid you'd say. I'm still confused :-(

If "it will give all rows that contain _any_ of these families", then why does adding a family give me *fewer* rows?

Leaving my row start/stop and filtering code constant, and just un-commenting an addFamily() dramatically reduces the number of results returned from a scan.

P
-
Re: Scan.addFamily reduces results
Daniel Gómez Ferro 2012-03-15, 19:25
As I told you in the other message, if you don't addColumn() the column you are filtering on, by default it will return any row that doesn't contain the said column:

http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/SingleColumnValueFilter.html#setFilterIfMissing(boolean)

So when you uncomment the addColumn(), the filter kicks in and actually filters values. When the addColumn() is commented out, all rows are returned.
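The interaction described above (a value filter that treats an unselected column as missing, and lets such rows through by default) can be sketched as a toy model. The code below is plain Java and does not call HBase; `MissingColumnFilterSketch`, the `"family:qualifier"` string keys, and the sample qualifiers `account` and `data` are assumptions made for the illustration. The `filterIfMissing` flag mirrors the idea behind the `setFilterIfMissing(boolean)` method linked in the message, with the default of letting rows through when the tested column is absent.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model of a single-column value filter. This is NOT HBase code.
final class MissingColumnFilterSketch {

    // Cells the scan actually reads: only those in the selected families.
    // Keys have the form "family:qualifier".
    static Map<String, String> visibleCells(Map<String, String> row, Set<String> families) {
        Map<String, String> visible = new HashMap<>();
        for (Map.Entry<String, String> e : row.entrySet()) {
            String family = e.getKey().substring(0, e.getKey().indexOf(':'));
            if (families.contains(family)) {
                visible.put(e.getKey(), e.getValue());
            }
        }
        return visible;
    }

    // Keeps the row if the tested column equals the expected value.
    // If the column is not visible, the row passes unless filterIfMissing is set.
    static boolean keep(Map<String, String> visible, String column,
                        String expected, boolean filterIfMissing) {
        String actual = visible.get(column);
        if (actual == null) {
            return !filterIfMissing;
        }
        return actual.equals(expected);
    }

    static int count(List<Map<String, String>> rows, Set<String> families,
                     String column, String expected, boolean filterIfMissing) {
        int n = 0;
        for (Map<String, String> row : rows) {
            Map<String, String> visible = visibleCells(row, families);
            if (!visible.isEmpty() && keep(visible, column, expected, filterIfMissing)) {
                n++;
            }
        }
        return n;
    }

    // Hypothetical sample row with one cell in family x and one in family y.
    static Map<String, String> row(String account) {
        Map<String, String> r = new HashMap<>();
        r.put("x:account", account);
        r.put("y:data", "payload");
        return r;
    }

    static List<Map<String, String>> exampleRows() {
        return Arrays.asList(row("42"), row("7"), row("42"));
    }

    public static void main(String[] args) {
        List<Map<String, String>> rows = exampleRows();
        Set<String> xy = new HashSet<>(Arrays.asList("x", "y"));
        Set<String> yOnly = new HashSet<>(Arrays.asList("y"));
        System.out.println("x and y selected: " + count(rows, xy, "x:account", "42", false));
        System.out.println("only y selected:  " + count(rows, yOnly, "x:account", "42", false));
    }
}
```

With three sample rows of which two match, selecting both families lets the filter see the column and drop the non-matching row, while selecting only `y` hides the column from the filter and every row passes. That is the sense in which adding the filtered family can reduce the result count in this model.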
-
Re: Scan.addFamily reduces results -- Ah Ha!
Peter Wolf 2012-03-15, 19:41
No, no... I found it, and it is not the filter.

Here are my Maven dependencies:

    <dependency>
        <groupId>org.apache.hbase</groupId>
        <artifactId>hbase</artifactId>
        <version>0.90.4-cdh3u3</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-core</artifactId>
        <version>0.20.2-cdh3u3</version>
    </dependency>
    <dependency>
        <groupId>org.apache.zookeeper</groupId>
        <artifactId>zookeeper</artifactId>
        <version>3.3.4-cdh3u3</version>
    </dependency>

Here is my actual code (nothing too secret):

    Scan scan = new Scan(calculateStartRowKey(nativeLanguage, accountID),
        calculateEndRowKey(nativeLanguage, accountID));

    scan
        .addFamily(metaFamily)
        .addFamily(request1Family)
        .addFamily(request2Family)
        .addFamily(serverFamily)
        .addFamily(queueFamily)
        .addFamily(scoreFamily)
        .addFamily(audioFamily)
        .addFamily(wordsFamily)
        ;

    Filter filter = new SingleColumnValueFilter(request1Family, request1AccountID,
        CompareFilter.CompareOp.EQUAL, Bytes.toBytes(accountID));
    scan.setFilter(filter);

    scanSessions(processor, scan);

    ResultScanner scanner = null;
    try {
        scanner = hTable.getScanner(scan);
        try {
            Iterator<Result> it = scanner.iterator();
            while (it.hasNext()) {
                try {
                    Result result = it.next();
                    process(result);
                } catch (Throwable e) {
                    System.out.println("WARNING: " + e);
                }
            }
        } finally {
            scanner.close();
        }
    } catch (IOException e) {
        new RuntimeException(e);
    }

Notice the two lines marked by -->

If I uncomment the .addFamily(researchFamily) I get 4258 results; if I comment it out I get 24258. The difference is exactly 20000.

Notice the setCaching(10000). If I change it to 1000, I get 24258 results.

There seems to be a connection between the caching value and the number of results returned. Furthermore, adding more data to the results by adding families reduces the number of results. Note that the data in my researchFamily is quite large.

Did I find a bug?

Peter
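The arithmetic behind the observed connection can be checked with a few lines of plain Java. This is only a sketch on the reported figures: it assumes rows reach the client in consecutive batches of at most the caching size, and `CachingBatchSketch` and `batchSizes` are hypothetical names. It says nothing about why any batch would go missing.

```java
import java.util.ArrayList;
import java.util.List;

// Arithmetic on the reported counts only. This is NOT HBase code.
final class CachingBatchSketch {

    // Splits `total` rows into consecutive batches of at most `caching` rows.
    static List<Integer> batchSizes(int total, int caching) {
        List<Integer> sizes = new ArrayList<>();
        for (int remaining = total; remaining > 0; remaining -= caching) {
            sizes.add(Math.min(remaining, caching));
        }
        return sizes;
    }

    public static void main(String[] args) {
        // Figures from the message: 24258 rows in total, caching of 10000.
        System.out.println(batchSizes(24258, 10000));
    }
}
```

With a caching value of 10000, 24258 rows split into two full batches and a final partial batch of 4258, so the reduced count equals the size of that last batch and the shortfall equals exactly two full batches. That is consistent with the suspicion that whole batches are involved, but the numbers alone do not identify the cause.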