HBase >> mail # user >> Find rows which do not have any of the given columns


Re: Find rows which do not have any of the given columns
It seems setting the time range is the problem. I was doing:

  scan.setTimeRange(Long.valueOf(args[4]), Long.valueOf(args[5]));

I was working on the assumption that filter logic runs before scan logic; in
other words, a KV dropped by the filter will not make it to the scan. In the
case of the time range this might not be true.
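
To make the suspicion concrete, here is a minimal sketch in plain Java. It is
a toy model, not HBase code: the Cell class and the methods
timeRangeThenSkip and skipThenTimeRange are hypothetical names made up for
illustration. It only shows that the two evaluation orders can return
different rows; which order HBase actually uses would need to be confirmed
against the HBase source.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model only: every name here is hypothetical, none of it is HBase API.
public class TimeRangeOrderDemo {

    // A cell reduced to the three fields that matter for this question.
    static final class Cell {
        final String row;
        final String qualifier;
        final long timestamp;

        Cell(String row, String qualifier, long timestamp) {
            this.row = row;
            this.qualifier = qualifier;
            this.timestamp = timestamp;
        }
    }

    // Half-open interval [min, max), as documented for Scan.setTimeRange.
    static boolean inRange(Cell c, long min, long max) {
        return c.timestamp >= min && c.timestamp < max;
    }

    // Groups cells by row key, keeping first-seen row order.
    static Map<String, List<Cell>> byRow(List<Cell> cells) {
        Map<String, List<Cell>> rows = new LinkedHashMap<>();
        for (Cell c : cells) {
            rows.computeIfAbsent(c.row, k -> new ArrayList<>()).add(c);
        }
        return rows;
    }

    // Order A: cells outside the time range are discarded first, so the
    // skip-style check never sees them.
    static List<String> timeRangeThenSkip(List<Cell> cells, Set<String> unwanted,
                                          long min, long max) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, List<Cell>> e : byRow(cells).entrySet()) {
            boolean anyVisible = false;
            boolean hit = false;
            for (Cell c : e.getValue()) {
                if (!inRange(c, min, max)) {
                    continue; // dropped before the filter runs
                }
                anyVisible = true;
                if (unwanted.contains(c.qualifier)) {
                    hit = true;
                }
            }
            if (anyVisible && !hit) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    // Order B: the skip-style check sees every cell of the row; the time
    // range only decides whether the row has anything to return.
    static List<String> skipThenTimeRange(List<Cell> cells, Set<String> unwanted,
                                          long min, long max) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, List<Cell>> e : byRow(cells).entrySet()) {
            boolean anyVisible = false;
            boolean hit = false;
            for (Cell c : e.getValue()) {
                if (unwanted.contains(c.qualifier)) {
                    hit = true;
                }
                if (inRange(c, min, max)) {
                    anyVisible = true;
                }
            }
            if (anyVisible && !hit) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Cell> cells = List.of(
            new Cell("r1", "a", 50L),    // unwanted qualifier, outside [100, 200)
            new Cell("r1", "x", 150L),
            new Cell("r2", "x", 150L));
        Set<String> unwanted = Set.of("a", "b");
        System.out.println(timeRangeThenSkip(cells, unwanted, 100L, 200L)); // [r1, r2]
        System.out.println(skipThenTimeRange(cells, unwanted, 100L, 200L)); // [r2]
    }
}
```

Under order A, row r1 comes back even though it has a cell for qualifier "a",
because that cell's timestamp is outside the range. That would match what I
saw in production.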

-Shrijeet
On Mon, Aug 6, 2012 at 9:25 AM, jmozah <[EMAIL PROTECTED]> wrote:

> Hmmm.. Missed it. Otherwise I don't spot anything wrong in this.
> Are you sure about the column names?
>
> ./zahoor
>
>
> On 06-Aug-2012, at 9:34 PM, Shrijeet Paliwal <[EMAIL PROTECTED]>
> wrote:
>
> > I am using FilterList. Could you elaborate?
> >
> > On Mon, Aug 6, 2012 at 8:48 AM, jmozah <[EMAIL PROTECTED]> wrote:
> >
> >>
> >>
> >> Use FilterList instead of List of Filters.
> >>
> >> ./Zahoor
> >>
> >> On 06-Aug-2012, at 12:12 PM, Shrijeet Paliwal <[EMAIL PROTECTED]>
> >> wrote:
> >>
> >>> Hi All,
> >>>
> >>> I am writing a job which finds rows that do not have a cell corresponding
> >>> to any of the columns in the given set of columns.
> >>> This is how I have configured my scan (a combination of QualifierFilters
> >>> and SkipFilter):
> >>>
> >>>   // columns is a CSV string containing column names
> >>>   columnsSet = Splitter.on(',').split(columns);
> >>>   List<Filter> qualifierFilters = new ArrayList<Filter>();
> >>>   for (String qual : columnsSet) {
> >>>     qualifierFilters.add(new QualifierFilter(CompareOp.NOT_EQUAL,
> >>>         new BinaryComparator(Bytes.toBytes(qual))));
> >>>   }
> >>>   Filter skipFilter = new SkipFilter(
> >>>       new FilterList(Operator.MUST_PASS_ALL, qualifierFilters));
> >>>   Scan scan = new Scan();
> >>>   scan.addFamily(Bytes.toBytes(family));
> >>>   scan.setCacheBlocks(false);
> >>>   scan.setCaching(1000);
> >>>   scan.setFilter(skipFilter);
> >>>   scan.setTimeRange(Long.valueOf(args[4]), Long.valueOf(args[5]));
> >>>
> >>> In my test table the scan worked as expected. But in the production run, I
> >>> got rows which had cells containing one of the given qualifiers (not
> >>> expected). Can someone help me spot the mistake?
> >>>
> >>> -Shrijeet
> >>
> >>
>
>