Accumulo, mail # user - Intersecting Iterators [SEC=UNCLASSIFIED]


Intersecting Iterators [SEC=UNCLASSIFIED]
Williamson, Luke MR 1 2013-08-14, 01:58
UNCLASSIFIED

Hi,
 
I have field indexes that look something like this:
 
Row Id: <date>-<UUID>
CF: fi||<type>||<value>
CQ: <date>-<UUID>
 
For example:

20130814-550e8400-e29b-41d4-a716-446655440000 fi||verb||run 20130814-550e8400-e29b-41d4-a716-446655440000
20130814-550e8400-e29b-41d4-a716-446655440000 page||58 line||16 "the boy can run up the hill"
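
The layout above can be sketched in plain Java as simple string composition. This is only an illustration: the class and method names (`FieldIndexKeys`, `rowId`, `columnFamily`) are hypothetical, and it deliberately uses no Accumulo types. In a real ingest these strings would presumably be written through a Mutation.

```java
import java.util.UUID;

/** Hypothetical helper that composes the key parts of the field-index layout described above. */
final class FieldIndexKeys {
    private static final String SEP = "||";

    /** The row ID and the column qualifier both use the form <date>-<UUID>. */
    static String rowId(String date, UUID id) {
        return date + "-" + id;
    }

    /** Column family of a field-index entry: fi||<type>||<value>. */
    static String columnFamily(String type, String value) {
        return "fi" + SEP + type + SEP + value;
    }

    public static void main(String[] args) {
        UUID id = UUID.fromString("550e8400-e29b-41d4-a716-446655440000");
        String row = rowId("20130814", id);
        // Prints row, column family, and column qualifier in the same order as the first example line.
        System.out.println(row + " " + columnFamily("verb", "run") + " " + row);
    }
}
```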

From what I could determine from the doco and API, I am executing the following code to perform an intersecting query on two values:

Set<Range> shards = new HashSet<Range>();

Text[] terms = {new Text("fi||<type>||<value>"), new Text("fi||<type>||<value>")};

BatchScanner bs = conn.createBatchScanner(table, auths, 20);
bs.setTimeout(360, TimeUnit.SECONDS);

IteratorSetting iter = new IteratorSetting(20, "ii", IntersectingIterator.class);
IntersectingIterator.setColumnFamilies(iter, terms);
bs.addScanIterator(iter);

bs.setRanges(Collections.singleton(new Range()));

for (Entry<Key,Value> entry : bs) {
    shards.add(new Range(entry.getKey().getColumnQualifier()));
}

I then perform a second batch scan using the set of ranges returned by the above to get my actual results.
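
To make the intended result of the first step concrete, here is a small in-memory model in plain Java. It is not how IntersectingIterator works internally and it uses no Accumulo classes. It assumes the index can be pictured as a map from a term (the column family) to the set of document IDs (the column qualifiers) filed under it. The class name `IntersectionModel` and the second term `fi||noun||hill` used in the usage note are hypothetical.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeSet;

/** Hypothetical in-memory model of "documents that contain every query term". */
final class IntersectionModel {

    /** Returns the document IDs that appear under every one of the given terms. */
    static SortedSet<String> intersect(Map<String, Set<String>> index, List<String> terms) {
        SortedSet<String> result = new TreeSet<>();
        boolean first = true;
        for (String term : terms) {
            // A term with no entries contributes an empty set, which empties the result.
            Set<String> docs = index.getOrDefault(term, Collections.emptySet());
            if (first) {
                result.addAll(docs);
                first = false;
            } else {
                result.retainAll(docs);
            }
        }
        return result;
    }
}
```

Calling `intersect(index, Arrays.asList("fi||verb||run", "fi||noun||hill"))` would return only the IDs filed under both terms. In the approach described here, each returned ID then becomes one range for the second batch scan.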

My issue is that the intersecting query takes several minutes to return, if it returns at all (in some cases it times out). Is this expected? Is there some way to improve performance? Is there a better way to do this sort of query?

Any guidance would be much appreciated.

Thanks

Luke