Re: Scan.addFamiliy reduces results
I have the same confusion. Say I add three column families A, B and C
to the scan. If a row has data for column families B and C but no data
for A, won't it be returned by the next() method?
What if the requirement is to get row data regardless of whether there's
data for a specific column family or not?

On Thu, Mar 15, 2012 at 1:04 PM, lars hofhansl <[EMAIL PROTECTED]> wrote:

> Hi Peter,
> for HBase you have to keep in mind that it is a sparse columnar (or KeyValue)
> store: (rowkey, columnfamily, column, TS) -> value
>
> A scan only returns those KeyValues that match the scan. So when you set
> families on your scan you'll only get those rows for which the scan found
> any columns.
>
> Makes sense?
>
> -- Lars
>
>
>
> ________________________________
>  From: Peter Wolf <[EMAIL PROTECTED]>
> To: [EMAIL PROTECTED]
> Sent: Thursday, March 15, 2012 9:52 AM
> Subject: Re: Scan.addFamiliy reduces results
>
> Thanks Doug,
>
> I had read that, and I just read it again.  But I am missing something...
>
> Why does adding a family reduce the number of results?  Is there an
> implied filter of some form?  Does addFamily add some constraint on
> which rows are returned?
>
> Note that all my rows *ought* to have values in all the families.
>
> Thanks
> Peter
>
> On 3/15/12 12:39 PM, Doug Meil wrote:
> > re:  "However, I am getting a different number of results, depending on
> > which families are added"
> >
> > Yes.
> >
> > I'd suggest you read this in the RefGuide.
> >
> > http://hbase.apache.org/book.html#datamodel
> >
> >
> >
> >
> >
> > On 3/15/12 12:08 PM, "Peter Wolf"<[EMAIL PROTECTED]>  wrote:
> >
> >> Hi all,
> >>
> >> I am doing a scan on a table with multiple families.  My code looks like
> >> this...
> >>
> >>          Scan scan = new Scan(calculateStartRowKey(a),
> >> calculateEndRowKey(b));
> >>
> >>          scan.setCaching(10000);
> >>          Filter filter = new SingleColumnValueFilter(xFamily, xColumn,
> >> CompareFilter.CompareOp.EQUAL, Bytes.toBytes(x));
> >>          scan.setFilter(filter);
> >>          scan
> >>                  .addFamily(xFamily)
> >>                  .addFamily(yFamily)
> >>                  .addFamily(zFamily);
> >>
> >>          ResultScanner scanner = hTable.getScanner(scan);
> >>
> >>          Iterator<Result>  it = scanner.iterator();
> >>          int resultCount = 0;
> >>          while (it.hasNext()) {
> >>                Result result = it.next();
> >>
> >>                resultCount++;
> >>          }
> >>
> >> However, I am getting a different number of results, depending on which
> >> families are added.  For example, these give different result counts:
> >>
> >>          scan
> >>                  //.addFamily(xFamily)
> >>                  .addFamily(yFamily)
> >>                  .addFamily(zFamily);
> >> and
> >>          scan
> >>                  .addFamily(xFamily)
> >>                  .addFamily(yFamily)
> >>                  .addFamily(zFamily);
> >>
> >>
> >> There is no error message, and I don't see anything in the Scan
> >> documentation.  Does anyone know what is going on?
> >>
> >> Thanks
> >> Peter
> >>
> >>
> >>
> >
>
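
To make Lars's point about the sparse model concrete, here is a minimal sketch using a toy in-memory table. It is not the HBase client API: the class `SparseScanSketch`, the helpers `sampleTable`, `put` and `scan`, and the sample rows are all hypothetical and exist only for illustration. It assumes, as described above, that a scan restricted to some families returns a row only if at least one cell was found in those families.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Toy in-memory model of a sparse store. This is NOT the HBase client API;
// every name here is hypothetical and used only to illustrate the idea.
class SparseScanSketch {

    // row key -> (column family -> (column qualifier -> value))
    static Map<String, Map<String, Map<String, String>>> sampleTable() {
        Map<String, Map<String, Map<String, String>>> table = new TreeMap<>();
        put(table, "row1", "A", "q", "1"); // row1 has cells in A, B and C
        put(table, "row1", "B", "q", "2");
        put(table, "row1", "C", "q", "3");
        put(table, "row2", "B", "q", "4"); // row2 has nothing in A
        put(table, "row2", "C", "q", "5");
        put(table, "row3", "A", "q", "6"); // row3 has cells only in A
        return table;
    }

    // Stores one cell; absent families are simply never created (sparse).
    static void put(Map<String, Map<String, Map<String, String>>> table,
                    String row, String family, String qualifier, String value) {
        table.computeIfAbsent(row, r -> new TreeMap<>())
             .computeIfAbsent(family, f -> new TreeMap<>())
             .put(qualifier, value);
    }

    // Returns the keys of rows that have at least one cell in ANY of the
    // requested families. Rows with no matching cell are not returned.
    static Set<String> scan(Map<String, Map<String, Map<String, String>>> table,
                            String... families) {
        Set<String> wanted = new LinkedHashSet<>(Arrays.asList(families));
        Set<String> rows = new LinkedHashSet<>();
        for (Map.Entry<String, Map<String, Map<String, String>>> e : table.entrySet()) {
            for (String family : e.getValue().keySet()) {
                if (wanted.contains(family)) {
                    rows.add(e.getKey());
                    break;
                }
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        Map<String, Map<String, Map<String, String>>> table = sampleTable();
        System.out.println(scan(table, "A", "B", "C")); // [row1, row2, row3]
        System.out.println(scan(table, "B", "C"));      // [row1, row2]
        System.out.println(scan(table, "A"));           // [row1, row3]
    }
}
```

In this model, a row with data in B and C but none in A still comes back from a scan over A, B and C, because the scan found cells for it. A row drops out only when none of the requested families holds a cell for it, which is one way the result count can change as families are added or removed.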