Accumulo >> mail # user >> sanity checking application WALogs make sense


Re: sanity checking application WALogs make sense
I'm a bit confused as to what you mean by "if an iterator goes down
mid-processing." If it goes down at all, then whatever scope it's running
in (minor compaction, major compaction, or scan) will most likely go down
as well (unless your iterator eats the exception and ignores errors). A
WALog shouldn't be deleted if whatever you were trying to do failed.
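
To illustrate the first point, here is a minimal sketch in plain Java. It
does not use the Accumulo API: `runScope`, `swallowing`, and `flaky` are
hypothetical names standing in for a scope (scan or compaction), an
exception-eating wrapper, and a failing iterator. It only shows that an
exception thrown by the iterator takes the scope down with it unless
something eats it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Hypothetical model, not the Accumulo API: an "iterator" is a function
// applied to each entry, and a "scope" is the loop that drives it.
class ScopeFailureSketch {

    // Runs the iterator over every entry. Any RuntimeException thrown by
    // the iterator propagates and aborts the whole scope.
    static List<String> runScope(List<String> entries, Function<String, String> iterator) {
        List<String> out = new ArrayList<>();
        for (String e : entries) {
            out.add(iterator.apply(e));
        }
        return out;
    }

    // Wraps an iterator so that it eats exceptions and passes the entry
    // through unchanged. The scope survives, but the error is hidden.
    static Function<String, String> swallowing(Function<String, String> iterator) {
        return e -> {
            try {
                return iterator.apply(e);
            } catch (RuntimeException ex) {
                return e;
            }
        };
    }

    // Example iterator: upper-cases entries but fails on "bad".
    static String flaky(String e) {
        if (e.equals("bad")) {
            throw new IllegalStateException("iterator went down on " + e);
        }
        return e.toUpperCase();
    }

    public static void main(String[] args) {
        List<String> entries = Arrays.asList("a", "bad", "c");
        try {
            runScope(entries, ScopeFailureSketch::flaky);
        } catch (IllegalStateException ex) {
            System.out.println("scope failed: " + ex.getMessage());
        }
        System.out.println(runScope(entries, swallowing(ScopeFailureSketch::flaky)));
    }
}
```

In the unwrapped case the caller sees the failure and can decide what to do
with the work it was attempting; in the wrapped case it cannot tell that
anything went wrong.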

On Sat, Sep 15, 2012 at 1:44 AM, Sukant Hajra <[EMAIL PROTECTED]> wrote:

> Hi guys,
>
> We've been slowly inching towards using iterators more effectively.  The
> typical use case of indexed docs fits one of our needs and we wrote a
> prototype for it.
>
> We've recently realized that iterators are not just read-only, and that we
> can get more data-local functionality by taking advantage of their ability
> to mutate data as well.  We've only begun to think about how this may
> assist us.  A /lot/ of our critical data accesses are slightly complex, but
> local to one row.  We have billions of entities in our system, so a simple
> bijection of entities to rows works out really well for us with respect to
> iterators.
>
> Up to this point, we've had a planned architecture that uses Kestrel for
> the WALog and a messaging system like Akka for pipelining work.  Akka would
> help us manage flowing work from the user to the log and from the log to
> orchestrations of Accumulo intra-row reads and writes.  The log just helps
> us get some faster response time without sacrificing too much reliability.
>
> Recently someone asked why we use our own WALog when Accumulo has one
> natively in HDFS.  My response has been that Accumulo's WALog works at a
> lower level of granularity, that of individual mutations.  We want reliable
> orchestrations of mutations.  Our orchestrations are idempotent, but we
> want something along the lines of at-least-once delivery for the entire
> orchestration.  If an iterator goes down mid-processing, I fear Accumulo's
> native WALog is insufficient to claim we have a reliable enough system.
>
> I could definitely go through source code to validate this opinion, but I
> thought I'd bounce this reasoning off the list first.
>
> Also, I'm sure we're not the only people using Accumulo in this way.
> Please feel free to advise us if anyone's got other ideas for an
> architecture or feels we're thinking about the problem backwards.
>
> Thanks for your input,
> Sukant
>
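
For reference, the at-least-once requirement in the quoted message can be
sketched in plain Java. This is only an illustration under stated
assumptions: an in-memory `Deque` stands in for the external log (Kestrel in
the message), a `Map` stands in for one row, and `Orchestration`, `drain`,
and `failOnce` are hypothetical names. Nothing here uses the Accumulo,
Kestrel, or Akka APIs. An entry leaves the log only after the whole
orchestration succeeds, so a failure mid-way leads to a replay, and
idempotent steps make the replay harmless.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

class AtLeastOnceSketch {

    // A multi-step unit of work against one row; assumed to be idempotent.
    interface Orchestration {
        void apply(Map<String, String> row);
    }

    // Processes the log head-first. An entry is removed only after apply()
    // returns normally; on failure it stays in the log and is retried.
    // Returns the number of delivery attempts made.
    static int drain(Deque<Orchestration> log, Map<String, String> row, int maxAttempts) {
        int attempts = 0;
        while (!log.isEmpty() && attempts < maxAttempts) {
            attempts++;
            try {
                log.peekFirst().apply(row);
                log.removeFirst(); // acknowledge only after success
            } catch (RuntimeException ex) {
                // Leave the entry in the log so it is delivered again.
            }
        }
        return attempts;
    }

    // Builds an orchestration that writes two columns and fails between
    // them on its first delivery only.
    static Orchestration failOnce() {
        boolean[] crashed = {false};
        return row -> {
            row.put("status", "done"); // idempotent put
            if (!crashed[0]) {
                crashed[0] = true;
                throw new IllegalStateException("went down mid-processing");
            }
            row.put("count", "1"); // idempotent put
        };
    }

    public static void main(String[] args) {
        Deque<Orchestration> log = new ArrayDeque<>();
        log.add(failOnce());
        Map<String, String> row = new HashMap<>();
        int attempts = drain(log, row, 5);
        System.out.println(attempts + " attempts, row = " + row
                + ", log empty = " + log.isEmpty());
    }
}
```

The first delivery leaves the row half-written; the second delivery repeats
the first put and completes the second, which is why idempotence matters for
this style of replay.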