

Re: Iterators - updating other rows
On Mon, Jul 15, 2013 at 6:38 AM, Peter Tillotson <[EMAIL PROTECTED]> wrote:

> I've got two tables of dependent data, which I was hoping to update
> efficiently during compaction. This leads to the following requirements:
>   - Changes to other rows
>   - Changes in other tables
>
> I've fought with iterators and embedding writers, but have had to fall
> back to MapReduce jobs to complete the update.
>
> Is there a recommended approach to this?
>

Writing to Accumulo from an iterator can lead to deadlock.  I can think of
at least the following two situations, but there are probably more.

Situation 1

 1. Memory is full on Tablet server 1 and writes are held.
 2. Tablet X is on Tserver 1 and is scheduled for compaction to free memory.
 3. Tablet X tries to write to Tablet server 1, but the writes block because memory is full (deadlock).
 4. No other tablet on Tserver 1 can be written to because memory is full and cannot be flushed, so the problem snowballs.
Situation 2

 1. Tserver 2 is hosting Tablets Y & Z.
 2. Tablets Y & Z have data in memory.
 3. Tserver 2 dies.
 4. Tserver 3 loads Tablet Y, recovers its data, and tries to compact.
 5. Tablet Y tries to write to Tablet Z during compaction.
 6. Tserver 4 loads Tablet Z, recovers its data, and tries to compact.
 7. Tablet Z tries to write to Tablet Y during compaction.
 8. Tablets Y & Z are not fully loaded yet, but are trying to write to each other (deadlock).
 9. Tablet servers 3 and 4 cannot load any more tablets because their load threads are both stuck, so the problem snowballs.
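
A minimal sketch of the kind of iterator that runs into this, assuming 1.5-era client APIs; the class, instance, user, and table names are placeholders, and this is the pattern being warned against, not a recommendation:

import java.io.IOException;
import java.util.Map;

import org.apache.accumulo.core.client.BatchWriter;
import org.apache.accumulo.core.client.BatchWriterConfig;
import org.apache.accumulo.core.client.Connector;
import org.apache.accumulo.core.client.ZooKeeperInstance;
import org.apache.accumulo.core.client.security.tokens.PasswordToken;
import org.apache.accumulo.core.data.Key;
import org.apache.accumulo.core.data.Mutation;
import org.apache.accumulo.core.data.Value;
import org.apache.accumulo.core.iterators.IteratorEnvironment;
import org.apache.accumulo.core.iterators.SortedKeyValueIterator;
import org.apache.accumulo.core.iterators.WrappingIterator;

// Illustrative anti-pattern: an iterator that opens a client connection back
// into the same Accumulo instance and writes as entries stream past.
public class SideEffectIterator extends WrappingIterator {

  private BatchWriter writer;

  @Override
  public void init(SortedKeyValueIterator<Key, Value> source,
      Map<String, String> options, IteratorEnvironment env) throws IOException {
    super.init(source, options, env);
    try {
      // During a compaction this write path can block on the very tablet
      // server that is running the compaction (Situation 1 above).
      Connector conn = new ZooKeeperInstance("instance", "zk1:2181")
          .getConnector("user", new PasswordToken("secret"));
      writer = conn.createBatchWriter("table2", new BatchWriterConfig());
    } catch (Exception e) {
      throw new IOException(e);
    }
  }

  @Override
  public void next() throws IOException {
    // Write a redirect for the entry we are about to step past.
    Mutation m = new Mutation(getSource().getTopKey().getRow());
    m.put("000", "care-of", "newValueId");
    try {
      writer.addMutation(m);
    } catch (Exception e) {
      throw new IOException(e);
    }
    super.next();
  }
}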

I am currently working on an implementation of Percolator [1].  It is not
something you can use now, but I am curious whether you could use Percolator
to solve your problem.  I am very interested in feedback on this project
while it's in its formative stages.  I hope to have it finished with Accumulo 1.6.0.

[1]: https://github.com/keith-turner/Accismus
> A bit more detail about the algorithm.
>
> I've two tables with different sort orders, and I use ngram row ids to
> group elements and split them over multiple tablets, so:
>
> Table1
> nm: key1: 000: newValueId2
> nm: key2: type: valueId1
> nm: key3: type: valueId1
>
> Table2
> ab: valueId1: 001: blob
> ab: valueId1:key2: nm
> ..
> ..
>
> Multiple keys point to the same value in the other table, but both keys and
> values are liable to change ... what I was trying to do was use special
> columns (column qualifier 000 above), which I call care-of columns, to do
> redirects as data changes in real time. With iterators this would become
> eventually consistent and be very efficient, but a MapReduce approach
> requires multiple scans of each large table. I like the approach because
> the ngram splits / groups data and the two different sorts each give me
> nice query characteristics.
>
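
A minimal sketch of what one of the care-of redirect writes above might look like with the plain client API, assuming the example fields map onto row / family / qualifier / value; all names are taken from the sample rows and are otherwise assumptions:

// Assumes an existing Connector "conn" to the instance holding Table1.
BatchWriter bw = conn.createBatchWriter("Table1", new BatchWriterConfig());
Mutation m = new Mutation("nm");        // ngram row id groups related keys
m.put("key1", "000", "newValueId2");    // "000" is the special care-of qualifier
bw.addMutation(m);
bw.close();
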
> For some reason the embedded writers were blocking - I may retry with a
> larger cluster. I fought with it for a few days then resorted to MapReduce
> jobs until I get a chance to look at the Accumulo code more closely.
>
> Would it be easy to add a special iterator that accepts (Text, Mutation)
> pairs, much as the AccumuloOutputFormat does?
>
> Many thanks in advance
>
> Peter.
>
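
For reference, the (Text, Mutation) contract mentioned above is the one
AccumuloOutputFormat already uses in the MapReduce fallback: the map or reduce
output key is the destination table name and the value is the Mutation. A
minimal sketch of such a reducer, with hypothetical class, field, and table
names (AccumuloOutputFormat job setup omitted):

import java.io.IOException;

import org.apache.accumulo.core.data.Mutation;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Emits (table name, Mutation) pairs; with AccumuloOutputFormat configured as
// the job's output format, each pair is applied to the named table.
public class RedirectReducer extends Reducer<Text, Text, Text, Mutation> {
  @Override
  protected void reduce(Text valueId, Iterable<Text> keys, Context context)
      throws IOException, InterruptedException {
    for (Text key : keys) {
      // Hypothetical redirect update in the second table.
      Mutation m = new Mutation(valueId);
      m.put(key.toString(), "care-of", "newValueId2");
      context.write(new Text("Table2"), m);  // output key = destination table
    }
  }
}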