David Medinets 2012-12-28, 14:01
The AccumuloInputFormat can use any iterator, custom or packaged with
Accumulo, as long as it's on the TabletServer's classpath.
I'm a little confused about what you actually want as input to your
MapReduce job. Do you want all keys where the CQ starts with XXX? Or, do
you want the entire "record" (123_123_1234_000 RECID=13) when such a
record exists that contains some value for the domain "XXX"?
As an aside, both cases would be rather inefficient as diagrammed, because
you have to scan the entire table and filter records in the Mapper instead
of letting the TabletServer filter results for you. If the former case
is what you want, you could use the RegexFilter to prune results
server-side. If the latter is the case, you most likely have to write
your own iterator to get the desired functionality (or permute your key
structure so that it better falls into some built-in access paths such
Perhaps you could also build an index table that inverts row+colfam and
colqual if this is a common access pattern for you.
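To make the index-table idea concrete, here is a minimal sketch in plain Java with no Accumulo dependency. It assumes one possible layout in which the original colqual leads the index key, followed by the original row and colfam. The class name, the NUL separator, and the sorted set standing in for a table are all illustrative assumptions, not anything from your schema or the Accumulo API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

public class InvertedIndexSketch {
    // Hypothetical separator between key parts; not part of the original schema.
    public static final char SEP = '\u0000';

    /** Builds an index key: original colqual first, then original row and colfam. */
    public static String indexKey(String row, String cf, String cq) {
        return cq + SEP + row + SEP + cf;
    }

    /** Returns every index key that starts with the given prefix. */
    public static List<String> prefixScan(TreeSet<String> index, String prefix) {
        List<String> hits = new ArrayList<>();
        // The set is sorted, so start at the prefix and stop at the first miss.
        for (String key : index.tailSet(prefix, true)) {
            if (!key.startsWith(prefix)) {
                break;
            }
            hits.add(key);
        }
        return hits;
    }

    public static void main(String[] args) {
        // A sorted set stands in for the sorted index table.
        TreeSet<String> index = new TreeSet<>();
        index.add(indexKey("123_123_1234_000", "RECID=13", "XXX=BEEF"));
        index.add(indexKey("123_123_1234_000", "RECID=13", "YYY=BAR"));
        index.add(indexKey("999_123_1999_000", "RECID=51", "XXX=HAM"));
        index.add(indexKey("999_123_1999_000", "RECID=51", "FOO=BAR"));

        for (String key : prefixScan(index, "XXX=")) {
            System.out.println(key.replace(SEP, ' '));
        }
    }
}
```

Because the field name leads the key, all XXX entries sort next to each other, so a lookup becomes a short range scan rather than a full-table scan.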
Also, be aware that if you have many columns in a row, the
WholeRowIterator has the potential to exceed the TabletServer's heap,
because it buffers all of the columns for that row together in memory.
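Coming back to the first case (only the keys whose CQ starts with XXX), here is a rough sketch of the kind of pattern you might hand to the RegexFilter for the column qualifier. It is plain Java that runs the check client-side, purely to show which qualifiers from your example would survive. It assumes the filter requires the whole qualifier to match the pattern; the class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class CqRegexSketch {
    // Candidate pattern: the qualifier must begin with the literal field name "XXX=".
    public static final Pattern XXX_FIELD = Pattern.compile("XXX=.*");

    /** Keeps only the qualifiers that match the pattern in full. */
    public static List<String> keep(List<String> qualifiers) {
        List<String> kept = new ArrayList<>();
        for (String cq : qualifiers) {
            if (XXX_FIELD.matcher(cq).matches()) {
                kept.add(cq);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> cqs = Arrays.asList("XXX=BEEF", "YYY=BAR", "XXX=HAM", "FOO=BAR");
        System.out.println(keep(cqs)); // prints [XXX=BEEF, XXX=HAM]
    }
}
```

Applied server-side, the same pattern would mean only the XXX entries ever reach your Mapper.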
On 12/28/12 9:01 AM, David Medinets wrote:
> I have a schema that looks something like:
> ROW CF CQ
> 123_123_1234_000 RECID=13 XXX=BEEF
> 123_123_1234_000 RECID=13 YYY=BAR
> 999_123_1999_000 RECID=51 XXX=HAM
> 999_123_1999_000 RECID=51 FOO=BAR
> My goal is to find the domain values for the XXX 'field'. My
> map-reduce job succeeds at doing this using the standard iterators.
> I'm wondering if using the WholeRowIterator might be a better
> approach. Or perhaps there is another way (beyond a custom iterator)?