-
Find rows which do not have any of the given columns
Shrijeet Paliwal 2012-08-06, 06:42
Hi All,
I am writing a job that finds rows which do not have a cell for any of the columns in a given set of columns. This is how I have configured my scan (a combination of QualifierFilters and a SkipFilter):
columnsSet = Splitter.on(',').split(columns); // columns is a csv containing column names
List<Filter> qualifierFilters = new ArrayList<Filter>();
for (String qual : columnsSet) {
    qualifierFilters.add(new QualifierFilter(CompareOp.NOT_EQUAL,
        new BinaryComparator(Bytes.toBytes(qual))));
}
Filter skipFilter = new SkipFilter(new FilterList(Operator.MUST_PASS_ALL, qualifierFilters));
Scan scan = new Scan();
scan.addFamily(Bytes.toBytes(family));
scan.setCacheBlocks(false);
scan.setCaching(1000);
scan.setFilter(skipFilter);
scan.setTimeRange(Long.valueOf(args[4]), Long.valueOf(args[5]));
In my test table the scan worked as expected. But in the production run, I got rows which had cells containing one of the given qualifiers (not expected). Can someone help me spot the mistake?
-Shrijeet
-
Re: Find rows which do not have any of the given columns
jmozah 2012-08-06, 15:48
Use FilterList instead of List of Filters.
./Zahoor
-
Re: Find rows which do not have any of the given columns
Shrijeet Paliwal 2012-08-06, 16:04
I am using FilterList. Could you elaborate?
-
Re: Find rows which do not have any of the given columns
jmozah 2012-08-06, 16:25
Hmm, I missed that. Otherwise I don't spot anything wrong in this. Are you sure about the column names?
./zahoor
-
Re: Find rows which do not have any of the given columns
Shrijeet Paliwal 2012-08-06, 18:38
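It seems setting the time range is the problem. I was doing scan.setTimeRange(Long.valueOf(args[4]), Long.valueOf(args[5])); and I was working on the assumption that filter logic runs before scan logic, in other words that a KV dropped by the filter will not make it to the scan. In the case of the time range this might not be true: if the time range is applied first, the filter never sees cells outside the range, so a row that has one of the given qualifiers only at an out-of-range timestamp is not skipped.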
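To make the suspected ordering concrete, here is a toy model in plain Java. It is not HBase code and uses no HBase classes: Cell, applyTimeRange, passesSkipFilter and rowReturned are hypothetical names made up for this sketch. It assumes that cells outside the time range are discarded before the filter is consulted, and that the range is min-inclusive, max-exclusive.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy model only: plain Java, no HBase classes. All names are hypothetical. */
public class TimeRangeSkipModel {

    /** A cell reduced to the two fields that matter for this question. */
    static final class Cell {
        final String qualifier;
        final long timestamp;

        Cell(String qualifier, long timestamp) {
            this.qualifier = qualifier;
            this.timestamp = timestamp;
        }
    }

    /** Assumed step 1: cells outside [minTs, maxTs) are dropped before any filter runs. */
    static List<Cell> applyTimeRange(List<Cell> row, long minTs, long maxTs) {
        List<Cell> visible = new ArrayList<Cell>();
        for (Cell c : row) {
            if (c.timestamp >= minTs && c.timestamp < maxTs) {
                visible.add(c);
            }
        }
        return visible;
    }

    /**
     * Step 2: models a SkipFilter wrapping MUST_PASS_ALL NOT_EQUAL qualifier
     * filters. A single visible cell with an excluded qualifier rejects the row.
     */
    static boolean passesSkipFilter(List<Cell> visibleCells, Set<String> excluded) {
        for (Cell c : visibleCells) {
            if (excluded.contains(c.qualifier)) {
                return false;
            }
        }
        return true;
    }

    /** A row is returned if it has visible cells and none of them is excluded. */
    static boolean rowReturned(List<Cell> row, Set<String> excluded, long minTs, long maxTs) {
        List<Cell> visible = applyTimeRange(row, minTs, maxTs);
        return !visible.isEmpty() && passesSkipFilter(visible, excluded);
    }

    public static void main(String[] args) {
        Set<String> excluded = new HashSet<String>(Arrays.asList("a", "b"));
        // The row has excluded qualifier "a", but only at an old timestamp.
        List<Cell> row = Arrays.asList(new Cell("a", 50L), new Cell("x", 150L));

        // Narrow range hides "a" from the filter, so the row comes back.
        System.out.println(rowReturned(row, excluded, 100L, 200L)); // true
        // Wide range lets the filter see "a", so the row is skipped.
        System.out.println(rowReturned(row, excluded, 0L, 200L));   // false
    }
}
```

Under that assumption the scan answers "rows with no given column inside the time range", not "rows with no given column at all", which would match what I saw in production.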
-Shrijeet