HBase >> mail # user >> Scan.addFamiliy reduces results

Re: Scan.addFamiliy reduces results
I have the same confusion. Say I add three column families A, B and C
to the scan. If a row has data for column families B and C but no data
for A, will it not be returned by the next() method?
What if the requirement is to get row data regardless of whether there's
data for a specific column family or not?
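
To make the scenario concrete, here is how I read the sparse model Lars
describes below, as a minimal plain-Java sketch. It does not use the HBase
client at all; SparseScanSketch, put and scan are made-up names, and
timestamps are left out. If this reading is right, a row with data in B and C
but none in A is still returned by a scan over A, B and C (just without any A
cells), and only a row with no cells in any requested family is left out.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical model, NOT the HBase API: row key -> family -> column -> value.
class SparseScanSketch {
    static final Map<String, Map<String, Map<String, String>>> table = new TreeMap<>();

    static void put(String row, String family, String column, String value) {
        table.computeIfAbsent(row, r -> new TreeMap<>())
             .computeIfAbsent(family, f -> new TreeMap<>())
             .put(column, value);
    }

    // Keeps, per row, only the cells in the requested families.
    // A row with no cells in any requested family is omitted entirely.
    static Map<String, Map<String, Map<String, String>>> scan(Set<String> families) {
        Map<String, Map<String, Map<String, String>>> out = new TreeMap<>();
        for (Map.Entry<String, Map<String, Map<String, String>>> row : table.entrySet()) {
            Map<String, Map<String, String>> cells = new TreeMap<>(row.getValue());
            cells.keySet().retainAll(families);
            if (!cells.isEmpty()) {
                out.put(row.getKey(), cells);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        put("row1", "A", "q", "1");
        put("row1", "B", "q", "2");
        put("row1", "C", "q", "3");
        put("row2", "B", "q", "4");   // row2 has no data in A
        put("row2", "C", "q", "5");
        put("row3", "D", "q", "6");   // row3 only has data in an unrequested family

        Set<String> abc = new HashSet<>(Arrays.asList("A", "B", "C"));
        System.out.println(scan(abc).keySet());                                   // [row1, row2]
        System.out.println(scan(new HashSet<>(Arrays.asList("A"))).keySet());     // [row1]
    }
}
```

So in this model row2 comes back from the three-family scan, and it only
disappears when A is the sole family requested.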

On Thu, Mar 15, 2012 at 1:04 PM, lars hofhansl <[EMAIL PROTECTED]> wrote:

> Hi Peter,
> for HBase you have to keep in mind that it is a sparse columnar (or KeyValue)
> store: (rowkey, columnfamily, column, TS) -> value
>
> A scan only returns those KeyValues that match the scan. So when you set
> families on your scan you'll only get those rows for which the scan found
> at least one column in those families.
>
> Makes sense?
>
> -- Lars
>
>
>
> ________________________________
>  From: Peter Wolf <[EMAIL PROTECTED]>
> To: [EMAIL PROTECTED]
> Sent: Thursday, March 15, 2012 9:52 AM
> Subject: Re: Scan.addFamiliy reduces results
>
> Thanks Doug,
>
> I had read that, and I just read it again.  But I am missing something...
>
> Why does adding a family reduce the number of results?  Is there an
> implied filter of some form?  Does addFamily add some constraint on
> which rows are returned?
>
> Note that all my rows *ought* to have values in all the families.
>
> Thanks
> Peter
>
> On 3/15/12 12:39 PM, Doug Meil wrote:
> > re:  "However, I am getting different number of results, depending on
> > which families are added"
> >
> > Yes.
> >
> > I'd suggest you read this in the RefGuide.
> >
> > http://hbase.apache.org/book.html#datamodel
> >
> >
> >
> >
> >
> > On 3/15/12 12:08 PM, "Peter Wolf"<[EMAIL PROTECTED]>  wrote:
> >
> >> Hi all,
> >>
> >> I am doing a scan on a table with multiple families.  My code looks like
> >> this...
> >>
> >>          Scan scan = new Scan(calculateStartRowKey(a),
> >>                  calculateEndRowKey(b));
> >>
> >>          scan.setCaching(10000);
> >>          Filter filter = new SingleColumnValueFilter(xFamily, xColumn,
> >>                  CompareFilter.CompareOp.EQUAL, Bytes.toBytes(x));
> >>          scan.setFilter(filter);
> >>          scan
> >>                  .addFamily(xFamily)
> >>                  .addFamily(yFamily)
> >>                  .addFamily(zFamily);
> >>
> >>          ResultScanner scanner = hTable.getScanner(scan);
> >>
> >>          Iterator<Result>  it = scanner.iterator();
> >>          int resultCount = 0;
> >>          while (it.hasNext()) {
> >>                Result result = it.next();
> >>
> >>                resultCount++;
> >>          }
> >>
> >> However, I am getting different number of results, depending on which
> >> families are added.  For example these give different result counts
> >>
> >>          scan
> >>                  //.addFamily(xFamily)
> >>                  .addFamily(yFamily)
> >>                  .addFamily(zFamily);
> >> and
> >>          scan
> >>                  .addFamily(xFamily)
> >>                  .addFamily(yFamily)
> >>                  .addFamily(zFamily);
> >>
> >>
> >> There is no error message, and I don't see anything in the Scan
> >> documentation.  Does anyone know what is going on?
> >>
> >> Thanks
> >> Peter
> >>
> >>
> >>
> >
>
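
One possible explanation for the differing counts in Peter's code quoted
above, offered as an assumption rather than a diagnosis: as far as I know,
SingleColumnValueFilter lets a row through when the tested column is missing,
unless setFilterIfMissing(true) has been called, and a column whose family was
not added to the scan looks missing to the filter. The plain-Java sketch below
models only that rule. It does not use the HBase client, and
MissingColumnFilterSketch, countMatches and row are made-up names.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical model, NOT the HBase API: each row maps "family:column" -> value.
class MissingColumnFilterSketch {

    // Counts rows that pass an "equals wanted" test on filterCell, where the
    // filter can only see cells whose family was selected on the scan.
    static int countMatches(Map<String, Map<String, String>> rows, Set<String> families,
                            String filterCell, String wanted, boolean filterIfMissing) {
        int count = 0;
        for (Map<String, String> cells : rows.values()) {
            boolean anySelected = false;
            String seen = null;
            for (Map.Entry<String, String> cell : cells.entrySet()) {
                String family = cell.getKey().split(":")[0];
                if (!families.contains(family)) {
                    continue; // family not on the scan, so the filter never sees this cell
                }
                anySelected = true;
                if (cell.getKey().equals(filterCell)) {
                    seen = cell.getValue();
                }
            }
            if (!anySelected) {
                continue; // no cells in the selected families: the row is not returned
            }
            // Missing column: pass unless filterIfMissing; present column: compare values.
            boolean pass = (seen == null) ? !filterIfMissing : seen.equals(wanted);
            if (pass) {
                count++;
            }
        }
        return count;
    }

    // Builds one row from alternating "family:column", value arguments.
    static Map<String, String> row(String... kv) {
        Map<String, String> cells = new LinkedHashMap<>();
        for (int i = 0; i < kv.length; i += 2) {
            cells.put(kv[i], kv[i + 1]);
        }
        return cells;
    }

    public static void main(String[] args) {
        Map<String, Map<String, String>> rows = new LinkedHashMap<>();
        rows.put("r1", row("x:c", "1", "y:c", "a"));
        rows.put("r2", row("x:c", "2", "y:c", "b"));
        rows.put("r3", row("x:c", "1", "z:c", "c"));

        Set<String> xyz = new HashSet<>(Arrays.asList("x", "y", "z"));
        Set<String> yz = new HashSet<>(Arrays.asList("y", "z"));

        System.out.println(countMatches(rows, xyz, "x:c", "1", false)); // 2
        System.out.println(countMatches(rows, yz, "x:c", "1", false));  // 3
        System.out.println(countMatches(rows, yz, "x:c", "1", true));   // 0
    }
}
```

In this model, leaving out the x family makes the filter pass every row, which
would give a larger count than the scan that includes x. Whether that is what
happened in Peter's table would need checking against the real data.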