David Medinets 2012-12-28, 14:01
Re: Can WholeRowIterator be used with AccumuloInputFormat?
Josh Elser 2012-12-28, 15:00
The AccumuloInputFormat can use any iterator, custom or packaged with
Accumulo, as long as it's on the TabletServer's classpath.

I'm a little confused about what you actually want as input to your
MapReduce job. Do you want all keys where the CQ starts with XXX? Or do
you want the entire "record" (123_123_1234_000 RECID=13) whenever that
record contains some value for the domain "XXX"?

As an aside, both cases would be rather inefficient as diagrammed, since
you have to scan the entire table and filter records in the Mapper
instead of letting the TabletServer filter results for you.

If the former case is what you want, you could use the RegexFilter to
prune results server-side. If the latter is the case, you most likely
have to write your own iterator to get the desired functionality (or
permute your key structure so that it better falls into some built-in
access paths such as fetchColumn).

Perhaps you could also build an index table that inverts row+colfam and
colqual if this is a common access pattern for you.

Also, be aware that if you have many columns in a row, the
WholeRowIterator has the potential to exceed the TabletServer's heap, as
it aggregates all of the columns for that row together.

On 12/28/12 9:01 AM, David Medinets wrote:
> I have a schema that looks something like:
>
> ROW               CF        CQ
> 123_123_1234_000  RECID=13  XXX=BEEF
> 123_123_1234_000  RECID=13  YYY=BAR
> 999_123_1999_000  RECID=51  XXX=HAM
> 999_123_1999_000  RECID=51  FOO=BAR
>
> My goal is to find the domain values for the XXX 'field'. My
> map-reduce job succeeds at doing this using the standard iterators.
> I'm wondering if using the WholeRowIterator might be a better
> approach. Or perhaps there is another way (beyond a custom iterator)?
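To illustrate the index-table suggestion in the reply, here is a minimal sketch in plain Java that uses no Accumulo classes. A sorted in-memory set stands in for the index table, and the class name, method names, and the NUL-separated key layout are all assumptions made for illustration, not part of Accumulo's API. The idea is that once the column qualifier leads the key, every XXX value sits in one contiguous range, so a range scan replaces the full-table scan.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical stand-in for an index table; not an Accumulo class.
class IndexTableSketch {

    // Assumed key layout: original CQ first, then the original row and CF,
    // so that entries sort by field name and value.
    public static String indexKey(String row, String cf, String cq) {
        return cq + "\u0000" + row + "\u0000" + cf;
    }

    // Collects the distinct values of one field by reading only the keys
    // that start with "<field>=", which mimics a range scan on a sorted table.
    public static List<String> domainValues(SortedSet<String> index, String field) {
        String prefix = field + "=";
        List<String> values = new ArrayList<>();
        for (String key : index.subSet(prefix, prefix + "\uffff")) {
            String cq = key.substring(0, key.indexOf('\u0000'));
            String value = cq.substring(prefix.length());
            if (!values.contains(value)) {
                values.add(value);
            }
        }
        return values;
    }

    public static void main(String[] args) {
        // The four entries from the schema in the original question.
        SortedSet<String> index = new TreeSet<>();
        index.add(indexKey("123_123_1234_000", "RECID=13", "XXX=BEEF"));
        index.add(indexKey("123_123_1234_000", "RECID=13", "YYY=BAR"));
        index.add(indexKey("999_123_1999_000", "RECID=51", "XXX=HAM"));
        index.add(indexKey("999_123_1999_000", "RECID=51", "FOO=BAR"));

        System.out.println(domainValues(index, "XXX")); // prints [BEEF, HAM]
    }
}
```

In a real deployment the index entries would be written as mutations to a second table alongside the data table, and the lookup would be a scan restricted to the "XXX=" range; how the row and column family are packed into the index key is a design choice this sketch does not settle.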