Accumulo >> mail # dev >> Inconsistent Naming in IteratorSetting class.


Re: Inconsistent Naming in IteratorSetting class.
Sorry, my bad. Sent you in the wrong direction.

To answer the questions I prompted: getOptions() in the
SetIterCommand class is defined in the abstract
org.apache.accumulo.util.shell.Command. Each concrete Command class is
expected to override getOptions() to list the options it actually
accepts.

As for the ambiguity in method names: in IteratorSetting, setProperties
ends up calling addOptions (which calls addOption). I'm not sure whether
there is a historical reason for having several method names that do the
same thing. Someone else would have to confirm/deny, but I don't see any
reason to have both versions.
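
To make the call chain concrete, here is a minimal sketch of that kind of
delegation. OptionHolder is a hypothetical stand-in written for this
message, not the IteratorSetting source, and the real setProperties may do
more than delegate.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for illustration only; not the Accumulo source.
class OptionHolder {
    private final Map<String, String> options = new HashMap<>();

    // The single place where an option is actually stored.
    void addOption(String key, String value) {
        if (key == null || value == null) {
            throw new IllegalArgumentException("key and value must be non-null");
        }
        options.put(key, value);
    }

    // Bulk variant: funnels every entry through addOption.
    void addOptions(Map<String, String> more) {
        for (Map.Entry<String, String> e : more.entrySet()) {
            addOption(e.getKey(), e.getValue());
        }
    }

    // Assumption: in this sketch setProperties only delegates to addOptions.
    // The real method may do more (for example, reset existing options first).
    void setProperties(Map<String, String> properties) {
        addOptions(properties);
    }

    // Returns a copy so callers cannot modify the internal map.
    Map<String, String> getOptions() {
        return new HashMap<>(options);
    }
}
```

With this shape, all three method names end up writing to the same map,
which is why the naming looks redundant.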

Moving on, I guess I'm confused by "each kind of Iterator". Are you
referring to the SortedKeyValueIterator interface as opposed to the
(deprecated) Aggregator interface and/or Combiner class? Or are you just
referring to the "time" (minc, majc, scan) at which the class would be
instantiated and run?

In the example you're making, the Column option will be set on the table
(from your tableName variable). Then, when the AgeCombiner is
instantiated (for whatever time you configured it for: again, majc,
minc, or scan time), the options will be passed into the init method of
your AgeCombiner via the Map<String, String> argument. Take a look at
the init() method in the abstract Combiner class. You'll see it has
references to the key you used to set the "age" column to be combined.
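
As a rough illustration of that hand-off, here is a sketch of an init
method reading its configuration from the options map. SketchCombiner is
hypothetical, and the "columns" key and its comma-separated format are
assumptions made for this example ("all" is modeled on the opt.all
property shown further down); check the real Combiner source for the
actual keys and encoding.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Hypothetical class for illustration; it does not extend the real Combiner.
class SketchCombiner {
    // Assumed option keys; the real class defines its own constants.
    static final String ALL_OPTION = "all";
    static final String COLUMNS_OPTION = "columns";

    private boolean combineAll;
    private List<String> columns = new ArrayList<>();

    // Called once when the iterator is instantiated; options holds whatever
    // was configured on the table for this iterator.
    void init(Map<String, String> options) {
        columns = new ArrayList<>();
        // parseBoolean(null) is false, so a missing "all" means "not all".
        combineAll = Boolean.parseBoolean(options.get(ALL_OPTION));
        if (combineAll) {
            return;
        }
        String encoded = options.get(COLUMNS_OPTION);
        if (encoded == null || encoded.isEmpty()) {
            throw new IllegalArgumentException("must specify the " + COLUMNS_OPTION + " option");
        }
        // Assumption: columns are stored as a comma-separated list.
        columns.addAll(Arrays.asList(encoded.split(",")));
    }

    // True if this combiner should be applied to the given column family.
    boolean appliesTo(String columnFamily) {
        return combineAll || columns.contains(columnFamily);
    }
}
```

The point is only that the iterator never sees the IteratorSetting
object itself; it sees the flattened String-to-String options.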

To be super clear, from the Wikipedia example:
Set on my "wikiIndex" table:

table    | table.iterator.majc.UIDAggregator .................. | 19,org.apache.accumulo.examples.wikisearch.iterator.GlobalIndexUidCombiner
table    | table.iterator.majc.UIDAggregator.opt.all .......... | true
table    | table.iterator.minc.UIDAggregator .................. | 19,org.apache.accumulo.examples.wikisearch.iterator.GlobalIndexUidCombiner
table    | table.iterator.minc.UIDAggregator.opt.all .......... | true
table    | table.iterator.scan.UIDAggregator .................. | 19,org.apache.accumulo.examples.wikisearch.iterator.GlobalIndexUidCombiner
table    | table.iterator.scan.UIDAggregator.opt.all .......... | true

The UIDAggregator will be run at all three "times" and applied over all
columns.
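
The naming pattern in that listing can be sketched in a few lines. This
is not Accumulo code; IteratorPropertySketch and toTableProperties are
hypothetical names, and the key layout is inferred only from the six
properties shown above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration; it only mimics the key layout above.
class IteratorPropertySketch {

    // Builds one "<priority>,<class>" property per scope, plus one
    // ".opt.<key>" property per option per scope.
    static Map<String, String> toTableProperties(String name, int priority,
            String className, Map<String, String> options, String... scopes) {
        Map<String, String> props = new LinkedHashMap<>();
        for (String scope : scopes) {
            String base = "table.iterator." + scope + "." + name;
            props.put(base, priority + "," + className);
            for (Map.Entry<String, String> e : options.entrySet()) {
                props.put(base + ".opt." + e.getKey(), e.getValue());
            }
        }
        return props;
    }

    public static void main(String[] args) {
        Map<String, String> options = new LinkedHashMap<>();
        options.put("all", "true");
        Map<String, String> props = toTableProperties("UIDAggregator", 19,
                "org.apache.accumulo.examples.wikisearch.iterator.GlobalIndexUidCombiner",
                options, "majc", "minc", "scan");
        for (Map.Entry<String, String> e : props.entrySet()) {
            System.out.println(e.getKey() + " = " + e.getValue());
        }
    }
}
```

Each scope gets its own copy of the options, which is why the same
opt.all entry appears three times in the listing.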

To directly answer your final question, there is no "list" of all
possible properties for Iterators/Combiners, since the set of properties
depends entirely on which Iterator/Combiner was configured. Perhaps you
could make the documentation on Combiners (docs/combiners.html) more
explicit about the properties defined there?

Also, let me know if something wasn't clear in that explanation :D

- Josh

On 03/18/2012 08:55 PM, David Medinets wrote:
> In the getOptions (another generically-named method!) in
> SetIterCommand, I see this code:
>
>      aggTypeOpt = new Option("agg", "aggregator", false, "an aggregating type");
>      regexTypeOpt = new Option("regex", "regular-expression", false, "a regex matching type");
>      versionTypeOpt = new Option("vers", "version", false, "a versioning type");
>      reqvisTypeOpt = new Option("reqvis", "require-visibility", false, "a type that omits entries with empty visibilities");
>      ageoffTypeOpt = new Option("ageoff", "ageoff", false, "an aging off type");
>
> It is not clear to me that the command-line option names (like 'agg')
> are the same values used in the IteratorSetting class. The
> IteratorSetting seems to hold generic map (which makes sense to
> provide flexibility).
>
> Let me elaborate via code:
>
> IteratorSetting iteratorSetting = new IteratorSetting(1, AgeCombiner.class);
> iteratorSetting.setName("ageCombiner");
> Combiner.setColumns(iteratorSetting, Collections.singletonList(new IteratorSetting.Column("age")));
> connector.tableOperations().attachIterator(tableName, iteratorSetting);
>
> Leaving aside the need to call a static method to set the column list,
> how do I set the iterator type?
>
> I want to create an example for each kind of iterator - in code (i.e.,
> not through the command line).