Accumulo, mail # user - MinCombiner and MaxCombiner priority issue [SEC=UNCLASSIFIED]
Dickson, Matt MR 2013-02-11, 23:47
UNCLASSIFIED

Hi,

I'm reasonably new to using Accumulo so I apologise if some of my terminology is incorrect.

A bit of overview

We have an Accumulo table that ingests data in daily increments and ages off data in daily increments.  For each unique rowid we maintain a daily max and min value and a count, using the MinCombiner, MaxCombiner and SummingCombiner.  When a user queries the table for a rowid, scan iterators are added to calculate the min, max and count across the entire table by adding up the daily summaries of min, max and count.
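To make the intended roll-up concrete, here is a minimal plain-Java sketch of what combining the daily summaries amounts to. It does not use the Accumulo API; the `DailySummary` class and the per-day values are hypothetical and only illustrate the arithmetic (overall min is the min of daily mins, overall max is the max of daily maxes, overall count is the sum of daily counts).

```java
import java.util.Arrays;
import java.util.List;

public class DailySummaryDemo {

    // Hypothetical holder for one day's summary of a single rowid.
    static final class DailySummary {
        final long min;
        final long max;
        final long count;

        DailySummary(long min, long max, long count) {
            this.min = min;
            this.max = max;
            this.count = count;
        }

        // Merge two summaries: min of mins, max of maxes, sum of counts.
        DailySummary merge(DailySummary other) {
            return new DailySummary(
                Math.min(min, other.min),
                Math.max(max, other.max),
                count + other.count);
        }
    }

    // Fold a non-empty list of daily summaries into one overall summary.
    static DailySummary combine(List<DailySummary> days) {
        DailySummary total = days.get(0);
        for (DailySummary day : days.subList(1, days.size())) {
            total = total.merge(day);
        }
        return total;
    }

    public static void main(String[] args) {
        // Made-up daily values, for illustration only.
        List<DailySummary> days = Arrays.asList(
            new DailySummary(999, 5000, 1),
            new DailySummary(1200, 12500, 3));
        DailySummary total = combine(days);
        System.out.println("min=" + total.min + ", max=" + total.max + ", count=" + total.count);
    }
}
```

Because each daily summary is merged independently, dropping an aged-off day simply removes one element from the list without touching the others.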

The timestamp is truncated to a day's timestamp, e.g. 1111100000000 in the example below.  This approach allows us to age off a day's worth of data without having to recalculate the summary data, because it is calculated by the scan iterators.
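As a rough illustration of the truncation step, here is a minimal sketch that rounds an epoch-millisecond timestamp down to the start of its day. It assumes UTC day boundaries, which may not match the actual ingest code; the class and method names are hypothetical.

```java
import java.util.concurrent.TimeUnit;

public class DayTruncation {

    // Number of milliseconds in one day (assumes fixed-length UTC days).
    static final long MILLIS_PER_DAY = TimeUnit.DAYS.toMillis(1);

    // Round an epoch-millisecond timestamp down to midnight UTC of the same day.
    static long truncateToDay(long epochMillis) {
        return Math.floorDiv(epochMillis, MILLIS_PER_DAY) * MILLIS_PER_DAY;
    }

    public static void main(String[] args) {
        long ts = 1360626420000L; // 2013-02-11 23:47:00 UTC
        System.out.println(truncateToDay(ts));
    }
}
```

Every entry ingested on the same day then shares one truncated value, so a day's worth of data can be selected and aged off as a unit.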

The problem

The issue I have come across is that when the scan iterators are added, I get different results depending on the relative priorities of the MinCombiner and MaxCombiner.  Changing the priority of the SummingCombiner seems to have no effect. If the MinCombiner's priority is higher (smaller number) than the MaxCombiner's, the result is correct, but if I switch the priorities and give the MaxCombiner the higher priority, the result is incorrect and the MinCombiner does not appear to run.
The code looks like this:
----------------------------------------------------------------------------

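import java.util.Collections;
import java.util.Map.Entry;
import org.apache.accumulo.core.client.IteratorSetting;
import org.apache.accumulo.core.client.Scanner;
import org.apache.accumulo.core.data.Key;
import org.apache.accumulo.core.data.Range;
import org.apache.accumulo.core.data.Value;
import org.apache.accumulo.core.iterators.user.MaxCombiner;
import org.apache.accumulo.core.iterators.user.MinCombiner;
import org.apache.accumulo.core.iterators.user.SummingCombiner;
import org.apache.accumulo.core.security.Authorizations;
import org.apache.hadoop.io.Text;

// connector and tableName are defined elsewhere
Range range = new Range("harry", "harry~");

// Set up the MIN
IteratorSetting isTotalMin = new IteratorSetting(15, "Min Calc", MinCombiner.class);
MinCombiner.setColumns(isTotalMin, Collections.singletonList(new IteratorSetting.Column("min")));
MinCombiner.setEncodingType(isTotalMin, MinCombiner.Type.STRING);

// Set up the MAX
IteratorSetting isTotalMax = new IteratorSetting(16, "Max Calc", MaxCombiner.class);
MaxCombiner.setColumns(isTotalMax, Collections.singletonList(new IteratorSetting.Column("max")));
MaxCombiner.setEncodingType(isTotalMax, MaxCombiner.Type.STRING);

// Set up the COUNT
IteratorSetting isTotalCount = new IteratorSetting(17, "Count Calc", SummingCombiner.class);
SummingCombiner.setColumns(isTotalCount, Collections.singletonList(new IteratorSetting.Column("count")));
SummingCombiner.setEncodingType(isTotalCount, SummingCombiner.Type.STRING);

Scanner s = connector.createScanner(tableName, new Authorizations("L1", "L2"));
s.addScanIterator(isTotalCount);
s.addScanIterator(isTotalMin);
s.addScanIterator(isTotalMax);
s.setRange(range);
s.fetchColumnFamily(new Text("count"));
s.fetchColumnFamily(new Text("min"));
s.fetchColumnFamily(new Text("max"));
// Print row, column family, column qualifier and value for each returned entry
for (Entry<Key, Value> e : s) {
  System.out.println(e.getKey().getRow() + ", " + e.getKey().getColumnFamily() + ", " + e.getKey().getColumnQualifier() + ", VALUE: " + e.getValue());
}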

--------------------------------------------------------------

If I run the above I get:

harry, count, 1111100000000, VALUE: 4
harry, max, 1111100000000, VALUE: 12500
harry, min, 1111100000000, VALUE: 999

This is correct.

However, if I change the priority of the MaxCombiner to 14 and leave the MinCombiner at 15, I get:

harry, count, 1111100000000, VALUE: 4
harry, max, 1111100000000, VALUE: 12500

I lose the min value altogether.  I have tested altering the priority of the SummingCombiner but it doesn't seem to have any effect.

This may be due to the way I have set up the iterators, or it could be an Accumulo bug.

Keen to hear any thoughts.

Thanks in advance,
Matt
