Accumulo >> mail # dev >> multi-table isolated batch scanner


Re: multi-table isolated batch scanner
Chris,

My interest in isolation stems from wanting to amortize some computation
over a number of results. Say it takes 5 seconds to compute an intersection
of a couple of sets within the iterators, and then streaming back the
results takes a minute or so. If I have to redo the 5-second computation
many times, for example because the iterator tree gets reconstructed, then
that computation may start to dominate my query performance. Primarily,
this means I need to be able to continue a scan without having to rebuild
the iterators. Isolation in the scanner has that side effect. Proper
isolation would be a "nice-to-have", but I can deal with not having it.

Adam

On Mon, Apr 15, 2013 at 4:13 PM, Christopher <[EMAIL PROTECTED]> wrote:

> Adam-
>
> It seems like you're talking about two features at once:
> 1) Multi-table batch scanner.
> 2) Scan Isolation on batch scanners like we have on regular scanners.
> Is that correct?
>
> I can see the utility of a multi-table batch scanner, but I haven't
> seen a compelling need for implementing isolation on batch
> scanners. Do you have a use case in mind for that?
>
> Also, it seems that your use case for isolation is not so much the
> isolated reads, but the statefulness of the iterator stack on the
> server side. Is this correct? If so, I'm even more curious about your
> use case for this, since that statefulness is only guaranteed per-row.
>
>
> --
> Christopher L Tubbs II
> http://gravatar.com/ctubbsii
>
>
> On Mon, Apr 15, 2013 at 3:10 PM, Adam Fuchs <[EMAIL PROTECTED]> wrote:
> > Thanks Bill,
> >
> > I care about latency and throughput. First available result ordering is
> > fine, though.
> >
> > Does Guava just chain through a collection of iterators, completing one
> > then moving to the next?
> >
> > Adam
> >
> >
> >
> > On Mon, Apr 15, 2013 at 3:06 PM, William Slacum <[EMAIL PROTECTED]> wrote:
> >
> >> How are you expecting to get results back? Guava's Iterables could
> >> concat a bunch of Scanners together, if you didn't care about the
> >> throughput aspect of it and simply wanted results from multiple tables.
> >>
> >> On Mon, Apr 15, 2013 at 3:00 PM, Adam Fuchs <[EMAIL PROTECTED]> wrote:
> >>
> >> > Is anyone else pining for a multi-table isolated batch scanner, or is
> >> > it just me? I like the automatic parallelism and balancing of the batch
> >> > scanner, but I'm looking to maintain server-side state in my iterators
> >> > over long-running scans. I would also like to scan over multiple tables
> >> > concurrently. Has anyone tried hacking something together with a pool
> >> > of non-batch scanners?
> >> >
> >> > Adam
> >> >
> >>
>
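
The two client-side approaches raised in the thread can be contrasted in a rough sketch. The first method chains the sources one after another, which is how Guava's Iterables.concat behaves (it is written here as a plain loop to avoid the dependency). The second is one possible reading of "a pool of non-batch scanners": one worker per source, with results handed back in first-available order. Everything named here is hypothetical: MultiTableScanSketch, scanSequentially, scanConcurrently, and the DONE marker are illustration only, and each Iterable<String> stands in for a regular per-table scanner. The sketch covers only the client-side merging and says nothing about how server-side iterator state is kept.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch: each Iterable<String> is a stand-in for one per-table scanner.
class MultiTableScanSketch {

    // Marker object signalling that one source is exhausted (compared by identity).
    private static final String DONE = new String("__done__");

    // Sequential chaining: finish one source completely, then move to the next.
    static List<String> scanSequentially(List<Iterable<String>> sources) {
        List<String> out = new ArrayList<>();
        for (Iterable<String> source : sources) {
            for (String entry : source) {
                out.add(entry);
            }
        }
        return out;
    }

    // Pooled scanning: one worker per source pushes entries onto a shared queue,
    // and the caller takes whatever arrives first. Order across sources is not defined.
    static List<String> scanConcurrently(List<Iterable<String>> sources)
            throws InterruptedException {
        // Unbounded for simplicity; a bounded queue would add back-pressure.
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, sources.size()));
        try {
            for (Iterable<String> source : sources) {
                pool.execute(() -> {
                    try {
                        for (String entry : source) {
                            queue.add(entry);
                        }
                    } finally {
                        // Always signal completion so the consumer cannot hang.
                        queue.add(DONE);
                    }
                });
            }
            List<String> out = new ArrayList<>();
            int remaining = sources.size();
            while (remaining > 0) {
                String entry = queue.take();
                if (entry == DONE) {
                    remaining--;
                } else {
                    out.add(entry);
                }
            }
            return out;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        List<Iterable<String>> tables = Arrays.asList(
                Arrays.asList("t1:row1", "t1:row2"),
                Arrays.asList("t2:row1"));
        System.out.println("sequential: " + scanSequentially(tables));
        System.out.println("concurrent: " + scanConcurrently(tables));
    }
}
```

With sequential chaining, the second table is not touched until the first is drained, so latency to the first result from later tables grows with the size of earlier ones. The pooled version trades ordering for latency, which matches the "first available result ordering is fine" requirement stated in the thread.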