Accumulo >> mail # user >> filter on value ranges


Re: filter on value ranges
If I may make one more argument for using a separate table as a secondary index instead of a filter: keep in mind that with a filter, every query has to scan the entire table, since the rows containing the values you're after may appear anywhere in it. In other words, all your queries will take a long time.

For most users of Accumulo, time is more precious than storage space (which keeps getting cheaper and more plentiful, unlike time), so creating a secondary index is the path usually chosen over full table scans.
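
To make the trade-off concrete, here is a minimal sketch in plain Java. It does not use the Accumulo API: two in-memory TreeMaps stand in for the main table and a hypothetical index table, and the names SalaryIndexSketch, salaryIndex, and pad are made up for illustration. It assumes salaries are non-negative and fit in ten digits, so that zero-padded strings sort in numeric order. In Accumulo the same idea would be a second table whose row IDs start with the padded salary, scanned with a Range.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class SalaryIndexSketch {
    // Stand-in for the main table: rowID -> (column qualifier -> value), sorted by rowID.
    static final TreeMap<String, Map<String, String>> employees = new TreeMap<>();
    // Stand-in for a hypothetical index table: padded salary + '\0' + rowID -> rowID.
    static final TreeMap<String, String> salaryIndex = new TreeMap<>();

    // Zero-pad so that lexicographic order matches numeric order (non-negative values only).
    static String pad(long salary) {
        return String.format("%010d", salary);
    }

    // Write the employee row and its index entry together.
    static void put(String rowId, String name, long salary) {
        Map<String, String> props = new TreeMap<>();
        props.put("name", name);
        props.put("salary", Long.toString(salary));
        employees.put(rowId, props);
        salaryIndex.put(pad(salary) + "\0" + rowId, rowId);
    }

    // Filter approach: every row is visited, whether or not it matches.
    static List<String> fullScan(long lo, long hi) {
        List<String> out = new ArrayList<>();
        for (Map<String, String> props : employees.values()) {
            long s = Long.parseLong(props.get("salary"));
            if (s >= lo && s <= hi) {
                out.add(props.get("name") + ":" + s);
            }
        }
        return out;
    }

    // Index approach: only index entries with salary in [lo, hi] are visited,
    // followed by one lookup in the main table per match.
    static List<String> indexLookup(long lo, long hi) {
        List<String> out = new ArrayList<>();
        String start = pad(lo);           // sorts before every key with salary lo
        String end = pad(hi) + "\uffff";  // sorts after every key with salary hi
        for (String rowId : salaryIndex.subMap(start, true, end, false).values()) {
            Map<String, String> props = employees.get(rowId);
            out.add(props.get("name") + ":" + props.get("salary"));
        }
        return out;
    }

    public static void main(String[] args) {
        put("abc", "john", 10000);
        put("def", "alice", 20000);
        System.out.println(fullScan(15000, 25000));
        System.out.println(indexLookup(15000, 25000));
    }
}
```

Both methods return the same employees; the difference is that fullScan touches every row, while indexLookup touches only the matching index entries plus one main-table lookup each. The cost is the extra index entry written on every put, which is the storage-for-time trade described above.
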
On Mar 9, 2012, at 2:48 PM, Keith Turner wrote:

> The WholeRowIterator can filter rows: just extend it and override
> the filter function.
>
> Also new in 1.4 is org.apache.accumulo.core.iterators.user.RowFilter.
> It provides similar functionality, but does not require reading the
> entire row into memory.
>
> Keith
>
> On Fri, Mar 9, 2012 at 1:11 PM, Kini, Ameet M. <[EMAIL PROTECTED]> wrote:
>>
>>
>>
>>
>> Thanks for the comments.
>>
>>
>>
>> I’m ok with rolling my own iterator/filter but not sure how to go about
>> doing it (see next para), so it’d be great to get pointers on it.  I’d
>> prefer keeping the schema to how it is today where each employee is
>> represented by a row in the table with a properties cf containing name and
>> salary cq. Here’s how it looks today
>>
>> rowID  colfam      colqual  value
>> abc    properties  name     john
>> abc    properties  salary   10000
>> def    properties  name     alice
>> def    properties  salary   20000
>>
>>
>> Part of my confusion lies in not knowing how to implement this range filter
>> class, because my query needs to get both the name as well as salary based
>> on a particular salary. What I would like to do is something like a Filter
>> equivalent to WholeRowIterator, say WholeRowFilter whose accept(Key k, Value
>> v) was provided the entire row in the Value argument along with appropriate
>> encodeRow/decodeRow as in WholeRowIterator. If the accept method returns
>> true, the whole row is returned to the client. Then I could extend this
>> class by writing a MyRangeFilter which would look inside the row and make
>> row level accept/reject decisions based on values of particular cq.
>>
>>
>>
>> Maybe this WholeRowFilter is already there in some form?
>>
>>
>>
>> -Ameet Kini
>>
>>
>>
>> From: Aaron Cordova [mailto:[EMAIL PROTECTED]]
>> Sent: Friday, March 09, 2012 9:20 AM
>> To: [EMAIL PROTECTED]
>> Subject: Re: filter on value ranges
>>
>>
>>
>> To answer your question, I would not use built-in iterators for this.
>>
>>
>>
>> But if you were determined, you could use what is known as 'document
>> sharding' as opposed to 'term sharding' and use an intersecting iterator.
>>
>>
>>
>> Instructions on how to do this should be added to the manual ...
>>
>>
>>
>>
>>
>> On Mar 9, 2012, at 9:07 AM, Kini, Ameet M. wrote:
>>
>>
>>
>>
>>
>> In 1.4, is there a way to use built-in iterators to run the following query:
>>
>>   “get the name and salary of all employees where the salary is between X
>> and Y”
>>
>>
>>
>> Assuming a straightforward schema where name and salary are both cq.
>>
>>
>>
>> I’d like both the cq restriction and the range predicate applied on the
>> tservers.
>>
>>
>>
>> I see that Scanner.setColumnQualifierRegex would take care of the cq
>> restriction. But I don’t know of a built-in iterator for the range predicate,
>> and I don’t know how to compose those two iterators.
>>
>>
>>
>> Thanks,
>>
>> -Ameet Kini
>>
>>
>>
>>