Accumulo >> mail # user >> Iterators - updating other rows


Iterators - updating other rows
I've got two tables of dependent data, which I was hoping to update efficiently during compaction. This leads to the following requirements:
  - Changes to other rows
  - Changes in other tables

I've fought with iterators and embedded writers, but have had to fall back to MapReduce jobs to complete the update.

Is there a recommended approach to this?

A bit more detail about the algorithm.

I have two tables with different sort orders, and I use ngram row IDs to group elements and split them over multiple tablets, so:

Table1
nm: key1: 000: newValueId2
nm: key2: type: valueId1
nm: key3: type: valueId1

Table2
ab: valueId1: 001: blob
ab: valueId1:key2: nm
..
..
    
Multiple keys point to the same value in the other table, but both keys and values are liable to change. What I was trying to do was use special columns (column qualifier 000 above), which I call care-of columns, to do redirects as the data changes in real time. With iterators this would become eventually consistent and be very efficient, whereas a MapReduce approach requires multiple scans of each large table. I like the approach because the ngram row IDs split and group the data, and the two different sort orders give me different, useful query characteristics.
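
To make the care-of idea concrete, here is a minimal sketch in plain Python. It is not Accumulo code: Table1 is modelled as an in-memory dict, and the name `resolve_value_id` and the rule that a 000 column overrides the regular column are assumptions for illustration only.

```python
# Toy model of Table1: (row, column family, column qualifier) -> value.
# The entries mirror the listing above. Treating qualifier "000" as a
# care-of redirect that overrides the regular column is an assumption.
CARE_OF = "000"

table1 = {
    ("nm", "key1", CARE_OF): "newValueId2",
    ("nm", "key2", "type"): "valueId1",
    ("nm", "key3", "type"): "valueId1",
}


def resolve_value_id(table, row, key):
    """Return the value ID for key, preferring a care-of redirect if present."""
    redirect = table.get((row, key, CARE_OF))
    if redirect is not None:
        return redirect
    # No redirect: fall back to the first regular column stored for this key.
    for (r, family, qualifier), value in sorted(table.items()):
        if r == row and family == key and qualifier != CARE_OF:
            return value
    return None
```

A reader following this rule sees the redirected value for key1 immediately, while key2 and key3 keep resolving to valueId1 until something (a compaction-time iterator or a batch job) rewrites them.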

For some reason the embedded writers were blocking - I may retry with a larger cluster. I fought with it for a few days then resorted to MapReduce jobs until I get a chance to look at the Accumulo code more closely. 

Would it be easy to add a special iterator that accepts (Text, Mutation) pairs, much as the AccumuloOutputFormat does?
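
Roughly what I have in mind, sketched in plain Python rather than against the real Accumulo API: a sink that takes (table name, mutation) pairs and groups them per table so they can be written out in batches. `MutationBuffer`, `emit` and `flush` are hypothetical names, and a mutation is modelled as a simple (row, family, qualifier, value) tuple.

```python
from collections import defaultdict


class MutationBuffer:
    """Hypothetical sink for (table name, mutation) pairs.

    Only a model of the idea; it does not talk to Accumulo.
    """

    def __init__(self):
        self._pending = defaultdict(list)

    def emit(self, table, mutation):
        # Queue the mutation under its destination table.
        self._pending[table].append(mutation)

    def flush(self):
        # Hand back everything queued so far, grouped by table,
        # and start again with an empty buffer.
        batches = dict(self._pending)
        self._pending = defaultdict(list)
        return batches
```

An iterator running over Table1 could then emit a redirect for its own table and a back-pointer update for Table2 in the same pass, leaving the actual writes to whatever drains the buffer.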

Many thanks in advance

Peter.
Keith Turner 2013-07-15, 12:49
Peter Tillotson 2013-07-15, 13:31