-
Filtering rows by presence of keys
John Armstrong 2012-05-22, 16:02
Hi, everybody.
I'm looking around to see if this sort of functionality already exists. I've got a table holding objects that looks like
<UUID> <Type>:<Field> <Value>
I'd like to mark objects as "active" or "inactive" by adding keys like
<UUID> ACTIVE:---- ----
<UUID> INACTIVE:---- ----
and then set up an iterator to return a row's <Type>:<Field> entries if and only if that row contains an ACTIVE column family.
I thought that at the meeting a couple weeks ago a pattern was described to return joins using an IntersectingIterator: set up one iterator to return the field value entries, another one to return the ACTIVE entries, and then return only the ones I want. But looking at IntersectingIterator itself, this doesn't match up with my mental picture.
So is there a known pattern matching this sort of thing? Any suggestions on crafting one?
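To pin down the semantics being asked for, here is a minimal plain-Java sketch. It uses no Accumulo classes, and the `ActiveRowsSpec` class and its `Entry` record are hypothetical. It models the table as sorted (row, family, qualifier, value) entries and keeps every entry of a row that also has an ACTIVE entry. It makes two full passes, so it only defines the expected result; it is not how a server-side iterator would do it.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical in-memory model of the table (not Accumulo API).
class ActiveRowsSpec {
    record Entry(String row, String family, String qualifier, String value) {}

    // Keeps every entry whose row also contains an entry in the ACTIVE column family.
    static List<Entry> activeRowsOnly(List<Entry> entries) {
        // First pass: collect the rows that carry an ACTIVE marker.
        Set<String> activeRows = entries.stream()
                .filter(e -> "ACTIVE".equals(e.family()))
                .map(Entry::row)
                .collect(Collectors.toSet());
        // Second pass: keep all entries from those rows, in their original order.
        return entries.stream()
                .filter(e -> activeRows.contains(e.row()))
                .collect(Collectors.toList());
    }
}
```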
-
Re: Filtering rows by presence of keys
Keith Turner 2012-05-22, 16:10
Take a look at the WholeRowIterator and RowFilter in org.apache.accumulo.core.iterators.user
Keith
-
Re: Filtering rows by presence of keys
John Armstrong 2012-05-22, 16:14
On 05/22/2012 12:10 PM, Keith Turner wrote:
> Take a look at the WholeRowIterator and RowFilter in
> org.apache.accumulo.core.iterators.user
Thanks, this looks promising. Under 1.3.4, I suppose I'd subclass WholeRowIterator to implement the filter function?
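A WholeRowIterator-style filter buffers each row before deciding on it. The plain-Java sketch below models that pattern with hypothetical names (`BufferedRowFilter`, `filterRows`, `hasActive`); it is not the Accumulo class. It shows why the whole row has to fit in memory: nothing is emitted until the next row key, or the end of input, arrives.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical model of the "buffer the row, then filter it" pattern (not Accumulo API).
class BufferedRowFilter {
    record Entry(String row, String family, String qualifier, String value) {}

    static List<Entry> filterRows(Iterator<Entry> sorted, Predicate<List<Entry>> keepRow) {
        List<Entry> out = new ArrayList<>();
        List<Entry> buffer = new ArrayList<>(); // holds the entire current row in memory
        while (sorted.hasNext()) {
            Entry e = sorted.next();
            // A new row key means the buffered row is complete: decide on it, then reset.
            if (!buffer.isEmpty() && !buffer.get(0).row().equals(e.row())) {
                if (keepRow.test(buffer)) {
                    out.addAll(buffer);
                }
                buffer = new ArrayList<>();
            }
            buffer.add(e);
        }
        // Flush the final row.
        if (!buffer.isEmpty() && keepRow.test(buffer)) {
            out.addAll(buffer);
        }
        return out;
    }

    // Example predicate: keep a row only if one of its entries is in the ACTIVE family.
    static boolean hasActive(List<Entry> row) {
        return row.stream().anyMatch(e -> "ACTIVE".equals(e.family()));
    }
}
```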
-
RE: Filtering rows by presence of keys
Bob.Thorman@... 2012-05-22, 16:46
IntersectingIterator is designed to reduce a dataset to a common column qualifier for a collection of column families. So I presume your mental picture (like mine was for a long time) is inverted relative to the logic of that iterator. You might try another type, like RowFilter.
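The point about IntersectingIterator can be made concrete with a small model of what it computes, assuming the document-partitioned layout it is built for (row = partition, column family = term, column qualifier = document id). The names below are hypothetical and no Accumulo classes are used. The result is the set of column qualifiers shared by all requested families within a partition, not a set of whole rows, which is why it does not map onto an ACTIVE/INACTIVE row check.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical model of an intersection over a document-partitioned index (not Accumulo API).
class IntersectionModel {
    record Entry(String partition, String term, String docId) {}

    // For each partition, the doc ids that appear under every requested term.
    static Map<String, SortedSet<String>> intersect(List<Entry> index, Set<String> terms) {
        // partition -> term -> doc ids indexed under that term
        Map<String, Map<String, Set<String>>> byPartition = new TreeMap<>();
        for (Entry e : index) {
            if (!terms.contains(e.term())) continue;
            byPartition.computeIfAbsent(e.partition(), p -> new HashMap<>())
                       .computeIfAbsent(e.term(), t -> new HashSet<>())
                       .add(e.docId());
        }
        Map<String, SortedSet<String>> result = new TreeMap<>();
        for (var p : byPartition.entrySet()) {
            Map<String, Set<String>> docsByTerm = p.getValue();
            if (docsByTerm.size() < terms.size()) continue; // some term is absent in this partition
            SortedSet<String> common = null;
            for (Set<String> docs : docsByTerm.values()) {
                if (common == null) common = new TreeSet<>(docs);
                else common.retainAll(docs);
            }
            if (!common.isEmpty()) result.put(p.getKey(), common);
        }
        return result;
    }
}
```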
-
Re: Filtering rows by presence of keys
John Armstrong 2012-05-22, 16:55
On 05/22/2012 12:46 PM, [EMAIL PROTECTED] wrote:
> IntersectingIterator is designed to reduce a dataset to a common column
> qualifier for a collection of column families. So I presume you mental
> picture (like mine was for a long time) inverted to the logic of that
> iterator. You might try another type...like RowFilter.
Adding a filter to the WholeRowIterator has been suggested, and I'm trying that. I'm also pushing for an upgrade from 1.3.4 to 1.4.x, but that may be harder going.
-
RE: Filtering rows by presence of keys
Bob.Thorman@... 2012-05-22, 17:02
I just finished an upgrade from Cloudbase 1.3.4 to Accumulo 1.4.0. You need to run the upgrade scripts from Accumulo 1.3.5 first, then the same scripts from Accumulo 1.4.0. They work. The API has changed a bit, and the map/reduce configuration is what I'm still working on. Hope it goes well for you...
-
Re: Filtering rows by presence of keys
Adam Fuchs 2012-05-25, 13:34
One of the differences you'll see between WholeRowIterator and RowFilter is that WholeRowIterator buffers an entire row in memory while RowFilter does not. Each includes a boolean method that you would override in a subclass -- acceptRow(...) in RowFilter or filter(...) in WholeRowIterator. In this case, I think the acceptRow(...) method would be easier for you to implement, it might be more efficient, and you wouldn't have to worry about buffering too much in memory. Here's how I would write it:
    import java.io.IOException;
    import java.util.Collections;
    import org.apache.accumulo.core.data.ArrayByteSequence;
    import org.apache.accumulo.core.data.ByteSequence;
    import org.apache.accumulo.core.data.Key;
    import org.apache.accumulo.core.data.Range;
    import org.apache.accumulo.core.data.Value;
    import org.apache.accumulo.core.iterators.SortedKeyValueIterator;
    import org.apache.accumulo.core.iterators.user.RowFilter;

    public class AwesomeIterator extends RowFilter {
      ...
      public boolean acceptRow(SortedKeyValueIterator<Key,Value> rowIterator) throws IOException {
        // the seek will get "clipped" to the row in question, so we can use an infinite
        // range and look for anything in the "ACTIVE" column family
        rowIterator.seek(new Range(), Collections.singleton((ByteSequence) new ArrayByteSequence("ACTIVE")), true);
        return rowIterator.hasTop();
      }
    }

Cheers,
Adam
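The acceptRow(...) approach decides on a row by seeking to the ACTIVE family inside it instead of reading the row into memory. The plain-Java model below mirrors that access pattern under stated assumptions: `SeekingRowFilter` and its `Key` record are hypothetical, and a `TreeMap` stands in for the sorted table. `acceptRow` does a single `ceilingKey` lookup at (row, "ACTIVE", "") and checks that the hit is still in the same row; accepted rows are then copied out through a `subMap` view of just that row. It illustrates the idea only and is not Accumulo's RowFilter implementation.

```java
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical model of a seek-based row filter (not Accumulo API).
class SeekingRowFilter {
    // Sorted key: row, then column family, then column qualifier.
    record Key(String row, String family, String qualifier) implements Comparable<Key> {
        @Override
        public int compareTo(Key o) {
            int c = row.compareTo(o.row);
            if (c != 0) return c;
            c = family.compareTo(o.family);
            return c != 0 ? c : qualifier.compareTo(o.qualifier);
        }
    }

    // Mirrors acceptRow(...): one lookup positioned at (row, "ACTIVE", ""), bounded to this row.
    static boolean acceptRow(NavigableMap<Key, String> table, String row) {
        Key first = table.ceilingKey(new Key(row, "ACTIVE", ""));
        return first != null && first.row().equals(row) && first.family().equals("ACTIVE");
    }

    // Walks the table one row at a time and keeps only the rows acceptRow approves.
    static NavigableMap<Key, String> scan(NavigableMap<Key, String> table) {
        NavigableMap<Key, String> kept = new TreeMap<>();
        Key cursor = table.isEmpty() ? null : table.firstKey();
        while (cursor != null) {
            String row = cursor.row();
            // row + "\0" is the smallest string greater than row, so this key precedes every later row.
            Key nextRowStart = new Key(row + "\0", "", "");
            if (acceptRow(table, row)) {
                // subMap is a view of just this row; the decision above did not buffer anything.
                kept.putAll(table.subMap(new Key(row, "", ""), true, nextRowStart, false));
            }
            cursor = table.ceilingKey(nextRowStart);
        }
        return kept;
    }
}
```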