Accumulo >> mail # user >> sanity checking application WALogs make sense


Sukant Hajra 2012-09-15, 05:44
William Slacum 2012-09-15, 13:46
Sukant Hajra 2012-09-15, 18:14
Re: sanity checking application WALogs make sense
On Sat, Sep 15, 2012 at 11:14 AM, Sukant Hajra <[EMAIL PROTECTED]> wrote:

> Excerpts from William Slacum's message of 2012-09-15 08:46:17 -0500:
> >
> > I'm a bit confused as to what you mean "if an iterator goes down
> > mid-processing." If it goes down at all, then whatever scope it's running
> > in- minor compaction, major compaction and scan- will most likely go down
> > as well (unless your iterator eats an exception and ignores errors). A
> > WALog shouldn't be deleted if whatever you were trying to do failed.
>
> I believe I've answered my own question after thinking about iterators more
> and looking at the code for some of the implementations.
>
> I was thinking about iterators "writing" changes to Accumulo using something
> like a BatchWriter.  Now I'm coming to the conclusion that even if that were
> possible, it is not how iterators were designed, and very likely bad for data
> integrity.  I don't feel that iterators should have any side-effects beyond
> scanning data through the source provided by the init() method.  In this way,
> I'm beginning to think about iterators more purely functionally.  Does that
> sound right?  Or have people come up with iterator implementations with more
> side-effects?
>

Your conclusion is correct.  We did not really intend for iterators to read
or write outside of a single tablet.
>
> For instance, in one of my algorithms, authors might write conflicting data
> to a row that needs to be resolved.  I feel I could install iterators at
> scan, minor compaction, and major compaction to perform this resolution
> (which happens to be a very simple idempotent operation).
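
For illustration only, here is a rough sketch of the shape such a resolution
step might take.  It uses plain java.util types, not the Accumulo iterator
API, and the names (ResolvingIteratorSketch, MaxPerRow, resolve) are
hypothetical.  It also assumes the conflict rule is "keep the largest value
per row", which is just a stand-in for whatever the real rule is.  The wrapper
only reads from the source it is given, and applying it twice gives the same
result as applying it once.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;

class ResolvingIteratorSketch {

    /**
     * Wraps a source that is sorted by row (the entry key) and emits one
     * entry per row, keeping the largest value. It reads only from the
     * source and writes nowhere, so it has no side effects.
     */
    static final class MaxPerRow implements Iterator<Map.Entry<String, Long>> {
        private final Iterator<Map.Entry<String, Long>> source;
        // First entry of the next row, already pulled from the source.
        private Map.Entry<String, Long> pending;

        MaxPerRow(Iterator<Map.Entry<String, Long>> source) {
            this.source = source;
            this.pending = source.hasNext() ? source.next() : null;
        }

        @Override
        public boolean hasNext() {
            return pending != null;
        }

        @Override
        public Map.Entry<String, Long> next() {
            if (pending == null) {
                throw new NoSuchElementException();
            }
            Map.Entry<String, Long> best = pending;
            pending = null;
            // Consume the rest of the current row, remembering the winner.
            while (source.hasNext()) {
                Map.Entry<String, Long> e = source.next();
                if (!e.getKey().equals(best.getKey())) {
                    pending = e; // start of the next row
                    break;
                }
                if (e.getValue() > best.getValue()) {
                    best = e;
                }
            }
            return best;
        }
    }

    /** Runs the wrapper over a row-sorted list and collects the output. */
    static List<Map.Entry<String, Long>> resolve(List<Map.Entry<String, Long>> sorted) {
        List<Map.Entry<String, Long>> out = new ArrayList<>();
        new MaxPerRow(sorted.iterator()).forEachRemaining(out::add);
        return out;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Long>> raw = List.of(
            Map.entry("row1", 1L),
            Map.entry("row1", 3L),
            Map.entry("row2", 2L));
        List<Map.Entry<String, Long>> once = resolve(raw);
        List<Map.Entry<String, Long>> twice = resolve(once);
        System.out.println(once);
        System.out.println(once.equals(twice)); // true: resolving twice changes nothing
    }
}
```

Idempotence is what would make it safe to install the same logic at scan,
minor compaction, and major compaction: entries that a compaction has already
resolved pass through a later scan-time resolution unchanged.
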
>
> Sorry if none of this sounds like a concrete question.  Some of what I'm
> looking for is conversation and validation in light of some limited local
> Accumulo expertise on my team.
>
> Has anyone thought about building up a small IRC community, say on #accumulo
> on Freenode?  There's a nice #hbase channel there, but at this point, I think
> I'm past the point of asking Bigtable-general questions.
>

We have recently started using #accumulo on freenode.  Feel free to join us
there!

Billie

>
> -Sukant
>