Accumulo, mail # user - sanity checking application WALogs make sense


Re: sanity checking application WALogs make sense
Billie Rinaldi 2012-09-17, 19:01
On Sat, Sep 15, 2012 at 11:14 AM, Sukant Hajra <[EMAIL PROTECTED]> wrote:

> Excerpts from William Slacum's message of 2012-09-15 08:46:17 -0500:
> >
> > I'm a bit confused as to what you mean "if an iterator goes down
> > mid-processing." If it goes down at all, then whatever scope it's
> > running in - minor compaction, major compaction and scan - will most
> > likely go down as well (unless your iterator eats an exception and
> > ignores errors). A WALog shouldn't be deleted if whatever you were
> > trying to do failed.
>
> I believe I've answered my own question after thinking about iterators
> more and looking at the code for some of the implementations.
>
> I was thinking about iterators "writing" changes to Accumulo using
> something like a BatchWriter.  Now I'm coming to the conclusion that even
> if that were possible, it is not how iterators were designed, and very
> likely bad for data integrity.  I don't feel that iterators should have
> any side-effects beyond scanning data through the source provided by the
> init() method.  In this way, I'm beginning to think about iterators more
> purely functionally.  Does that sound right?  Or have people come up with
> iterator implementations with more side-effects?
>

Your conclusion is correct; we did not really intend for iterators to read
or write outside of a single tablet.

>
> For instance, in one of my algorithms, authors might write conflicting
> data to a row that needs to be resolved.  I feel I could install iterators
> at scan, minor compaction, and major compaction to perform this resolution
> (which happens to be a very simple idempotent operation).
>
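
As a rough illustration of the resolution idea quoted above, here is a
minimal sketch in plain Java with no Accumulo dependency.  The class name
ConflictResolver, the resolve() helper, and the "greatest value wins" rule
are all hypothetical, chosen only for the example.  In Accumulo this kind
of logic would probably sit in a Combiner subclass's
reduce(Key, Iterator<Value>) method.  The sketch assumes a compaction may
see only some of the conflicting values, so the rule is picked to give the
same answer whether it runs once over everything or repeatedly over
partial results.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class ConflictResolver {

    // Hypothetical rule: the lexicographically greatest value wins.
    // Taking a maximum is idempotent, associative, and commutative, so the
    // result does not depend on how often or in what grouping it is applied.
    static String resolve(Iterator<String> values) {
        String winner = null;
        while (values.hasNext()) {
            String candidate = values.next();
            if (winner == null || candidate.compareTo(winner) > 0) {
                winner = candidate;
            }
        }
        return winner;
    }

    public static void main(String[] args) {
        List<String> written = Arrays.asList("draft-2", "draft-7", "draft-3");

        // Resolve everything in one pass, as a scan over all values would.
        String allAtOnce = resolve(written.iterator());

        // Resolve a subset first (as a compaction might), then resolve that
        // partial result together with the value that arrived later.
        String partial = resolve(written.subList(0, 2).iterator());
        String staged = resolve(Arrays.asList(partial, written.get(2)).iterator());

        // Both paths should pick the same winner.
        System.out.println(allAtOnce.equals(staged));
    }
}
```

Because resolve() only reads the values handed to it and returns a result,
it has no side effects, which fits the "purely functional" view of
iterators discussed earlier in the thread.
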
> Sorry if none of this sounds like a concrete question.  Some of what I'm
> looking for is conversation and validation in light of some limited local
> Accumulo expertise on my team.
>
> Has anyone thought about building up a small IRC community, say on
> #accumulo on Freenode?  There's a nice #hbase channel there, but at this
> point, I think I'm past the point of asking Bigtable-general questions.
>

We have recently started using #accumulo on freenode.  Feel free to join us
there!

Billie

>
> -Sukant
>