HBase >> mail # dev >> [potential bug]Find rows which do not have any of the given columns


[potential bug]Find rows which do not have any of the given columns
- user
+dev

Hi Devs,

Please follow the discussion below for full context. tl;dr: "Did a scan with
a time range and filters; the scan output was incorrect. Repeated the scan
with the filter only; the scan output was correct."

HBase version : 0.90.3
Hadoop : CDH3u0
Issues:
A scan that is set with both a time range and a filter can behave in
an unintuitive way. I am calling it unintuitive instead of wrong, since I do
not know if this is a known limitation of scan. Picture a filter setup like
mine: "filter out rows which have cells for certain columns". This
filter is set on a scan which has a time range constraint as well. AFAIK
we skip HFiles based on their metadata when dealing with time ranges. Say a
region has two HFiles, and one of them has cells for the unwanted columns but
the other does not. We may then get an incorrect result depending on how the
time range is set (if the time range scan optimizer skips the HFile containing
the unwanted cells).
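
To make the scenario concrete, here is a minimal plain-Java sketch of it. This is not HBase code: `Cell`, `StoreFile`, `sampleFiles` and `rowsWithoutColumns` are hypothetical names made up for illustration, and the model only assumes that a file whose timestamp metadata falls outside the half-open range [minStamp, maxStamp) is skipped before the filter sees any of its cells.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class TimeRangeSkipDemo {

    /** One cell: row key, column qualifier, timestamp (hypothetical model type). */
    static final class Cell {
        final String row;
        final String qualifier;
        final long ts;

        Cell(String row, String qualifier, long ts) {
            this.row = row;
            this.qualifier = qualifier;
            this.ts = ts;
        }
    }

    /** A store file plus the min/max timestamps a real file would keep as metadata. */
    static final class StoreFile {
        final List<Cell> cells;
        final long minTs;
        final long maxTs;

        StoreFile(Cell... cells) {
            this.cells = Arrays.asList(cells);
            this.minTs = this.cells.stream().mapToLong(c -> c.ts).min().orElse(Long.MAX_VALUE);
            this.maxTs = this.cells.stream().mapToLong(c -> c.ts).max().orElse(Long.MIN_VALUE);
        }

        /** True if the file may hold a cell inside [minStamp, maxStamp). */
        boolean overlaps(long minStamp, long maxStamp) {
            return maxTs >= minStamp && minTs < maxStamp;
        }
    }

    /** Returns rows for which no visible cell belongs to an unwanted column. */
    static Set<String> rowsWithoutColumns(List<StoreFile> files, Set<String> unwanted,
                                          long minStamp, long maxStamp) {
        Set<String> seen = new TreeSet<>();
        Set<String> rejected = new TreeSet<>();
        for (StoreFile file : files) {
            if (!file.overlaps(minStamp, maxStamp)) {
                continue; // whole file skipped: the filter never sees its cells
            }
            for (Cell cell : file.cells) {
                if (cell.ts < minStamp || cell.ts >= maxStamp) {
                    continue; // cell outside the time range is dropped before filtering
                }
                seen.add(cell.row);
                if (unwanted.contains(cell.qualifier)) {
                    rejected.add(cell.row); // row-level skip, as a SkipFilter would do
                }
            }
        }
        seen.removeAll(rejected);
        return seen;
    }

    /** Two files: the older one holds the unwanted cell for row r1. */
    static List<StoreFile> sampleFiles() {
        StoreFile older = new StoreFile(new Cell("r1", "bad", 100L));
        StoreFile newer = new StoreFile(new Cell("r1", "ok", 500L), new Cell("r2", "ok", 500L));
        return Arrays.asList(older, newer);
    }

    public static void main(String[] args) {
        Set<String> unwanted = new TreeSet<>(Arrays.asList("bad"));
        // Filter only (unbounded time range): r1 is rejected.
        System.out.println(rowsWithoutColumns(sampleFiles(), unwanted, 0L, Long.MAX_VALUE)); // [r2]
        // Filter plus time range [400, 600): the older file is skipped and r1 slips through.
        System.out.println(rowsWithoutColumns(sampleFiles(), unwanted, 400L, 600L)); // [r1, r2]
    }
}
```

In this model the per-cell timestamp check alone would give the same answer, so the file skip is just one way the unwanted cell can be hidden from the filter. Whether 0.90.3 behaves exactly like this is the question I am raising.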

Does this sound like a valid issue? I can also see this happening with more
than one kind of SkipFilter setup.

-Shrijeet
On Mon, Aug 6, 2012 at 11:38 AM, Shrijeet Paliwal
<[EMAIL PROTECTED]> wrote:

> It seems setting the time range is the problem. I was doing:
>
>   scan.setTimeRange(Long.valueOf(args[4]), Long.valueOf(args[5]));
>
> I was working on the assumption that filter logic runs before scan logic; in
> other words, a KV dropped by a filter will not make it to the scan. In the
> case of a time range this might not be true.
>
> -Shrijeet
>
>
> On Mon, Aug 6, 2012 at 9:25 AM, jmozah <[EMAIL PROTECTED]> wrote:
>
>> Hmmm.. Missed it. Otherwise I don't spot anything wrong in this.
>> Are you sure about the column names?
>>
>> ./zahoor
>>
>>
>> On 06-Aug-2012, at 9:34 PM, Shrijeet Paliwal <[EMAIL PROTECTED]>
>> wrote:
>>
>> > I am using FilterList. Could you elaborate?
>> >
>> > On Mon, Aug 6, 2012 at 8:48 AM, jmozah <[EMAIL PROTECTED]> wrote:
>> >
>> >>
>> >>
>> >> Use FilterList instead of List of Filters.
>> >>
>> >> ./Zahoor
>> >>
>> >> On 06-Aug-2012, at 12:12 PM, Shrijeet Paliwal <[EMAIL PROTECTED]>
>> >> wrote:
>> >>
>> >>> Hi All,
>> >>>
>> >>> I am writing a job which finds rows that do not have a cell
>> >>> corresponding to any of the columns in the given set of columns.
>> >>> This is how I have configured my scan (a combination of
>> >>> QualifierFilters and SkipFilter)
>> >>>
>> >>>   // columns is a CSV string containing column names
>> >>>   Iterable<String> columnsSet = Splitter.on(',').split(columns);
>> >>>   List<Filter> qualifierFilters = new ArrayList<Filter>();
>> >>>   for (String qual : columnsSet) {
>> >>>     qualifierFilters.add(new QualifierFilter(CompareOp.NOT_EQUAL,
>> >>>         new BinaryComparator(Bytes.toBytes(qual))));
>> >>>   }
>> >>>   Filter skipFilter = new SkipFilter(
>> >>>       new FilterList(Operator.MUST_PASS_ALL, qualifierFilters));
>> >>>   Scan scan = new Scan();
>> >>>   scan.addFamily(Bytes.toBytes(family));
>> >>>   scan.setCacheBlocks(false);
>> >>>   scan.setCaching(1000);
>> >>>   scan.setFilter(skipFilter);
>> >>>   scan.setTimeRange(Long.valueOf(args[4]), Long.valueOf(args[5]));
>> >>>
>> >>> In my test table the scan worked as expected. But in the production
>> >>> run, I got rows which had cells containing one of the given qualifiers
>> >>> (not expected). Can someone help me spot the mistake?
>> >>>
>> >>> -Shrijeet
>> >>
>> >>
>>
>>
>
+
J Mohamed Zahoor 2012-08-07, 09:57
+
Shrijeet Paliwal 2012-08-07, 16:17