Accumulo, mail # dev - multi-table isolated batch scanner


Adam Fuchs 2013-04-15, 19:00
William Slacum 2013-04-15, 19:06
Adam Fuchs 2013-04-15, 19:10
Christopher 2013-04-15, 20:13
Re: multi-table isolated batch scanner
Adam Fuchs 2013-04-15, 21:06
Chris,

The desire for isolation stems from the desire to amortize some computation
over a number of results. Say it takes 5 seconds to compute an intersection
of a couple of sets within the iterators, and then streaming back the
results takes a minute or so. If I have to redo the 5-second computation
many times, for example each time the iterator tree is reconstructed, then
that computation may start to dominate my query performance. Primarily,
this means I need to be able to continue a scan without having to rebuild
the iterators. Isolation in the scanner has that side effect. Proper
isolation would be a "nice-to-have", but I can deal with not having it.

Adam
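
The amortization argument above can be made concrete with a little arithmetic. The following is a minimal sketch, not anything from Accumulo itself: the class and method names are hypothetical, the 5-second setup and 60-second streaming figures come from the message, and the rebuild counts are assumed purely for illustration.

```java
public class ScanCostSketch {
    // Total query time when the iterator tree (and its expensive setup)
    // is built `builds` times over the life of one scan.
    static double totalSeconds(double setupSeconds, double streamSeconds, int builds) {
        return builds * setupSeconds + streamSeconds;
    }

    // Fraction of the total time spent repeating the setup computation.
    static double setupFraction(double setupSeconds, double streamSeconds, int builds) {
        double setup = builds * setupSeconds;
        return setup / (setup + streamSeconds);
    }

    public static void main(String[] args) {
        double setup = 5.0;   // intersection computed inside the iterators
        double stream = 60.0; // "a minute or so" to stream results back
        // Hypothetical rebuild counts: once, a dozen times, once per second.
        for (int builds : new int[] {1, 12, 60}) {
            System.out.printf("builds=%d total=%.0fs setup share=%.0f%%%n",
                    builds,
                    totalSeconds(setup, stream, builds),
                    100 * setupFraction(setup, stream, builds));
        }
    }
}
```

With a single build the setup is a small part of the query. At 12 rebuilds the repeated setup already costs as much as streaming the results (60 seconds each), which is the point at which it starts to dominate.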

On Mon, Apr 15, 2013 at 4:13 PM, Christopher <[EMAIL PROTECTED]> wrote:

> Adam-
>
> It seems like you're talking about two features at once:
> 1) Multi-table batch scanner.
> 2) Scan Isolation on batch scanners like we have on regular scanners.
> Is that correct?
>
> I can see the utility of a multi-table batch scanner, but I haven't
> seen a compelling need for implementing isolation on batch
> scanners. Do you have a use case in mind for that?
>
> Also, it seems that your use case for isolation is not so much the
> isolated reads, but the statefulness of the iterator stack on the
> server side. Is this correct? If so, I'm even more curious about your
> use case for this, since that statefulness is only guaranteed per-row.
>
>
> --
> Christopher L Tubbs II
> http://gravatar.com/ctubbsii
>
>
> On Mon, Apr 15, 2013 at 3:10 PM, Adam Fuchs <[EMAIL PROTECTED]> wrote:
> > Thanks Bill,
> >
> > I care about latency and throughput. First available result ordering is
> > fine, though.
> >
> > Does Guava just chain through a collection of iterators, completing one
> > then moving to the next?
> >
> > Adam
> >
> >
> >
> > On Mon, Apr 15, 2013 at 3:06 PM, William Slacum <[EMAIL PROTECTED]> wrote:
> >
> >> How are you expecting to get results back? Guava's Iterables could concat
> >> a bunch of Scanners together, if you didn't care about the throughput
> >> aspect of it and simply wanted results from multiple tables.
> >>
> >> On Mon, Apr 15, 2013 at 3:00 PM, Adam Fuchs <[EMAIL PROTECTED]> wrote:
> >>
> >> > Is anyone else pining for a multi-table isolated batch scanner, or is it
> >> > just me? I like the automatic parallelism and balancing of the batch
> >> > scanner, but I'm looking to maintain server-side state in my iterators
> >> > over long-running scans. I would also like to scan over multiple tables
> >> > concurrently. Has anyone tried hacking something together with a pool of
> >> > non-batch scanners?
> >> >
> >> > Adam
> >> >
> >>
>
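
The "pool of non-batch scanners" idea raised at the start of the thread can be sketched on the client side as follows. This is a rough illustration under stated assumptions, not an Accumulo API: `PooledScanSketch` and `scanAll` are hypothetical names, and each source is modeled as a plain `Iterable`, standing in for a per-table scanner. Unlike sequential concatenation (Guava's `Iterables.concat` exhausts each input before moving to the next), every source is drained on its own thread and results are returned in first-available order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;

public class PooledScanSketch {
    // Sentinel a worker enqueues when its source is exhausted (or has failed).
    private static final Object DONE = new Object();

    // Drains every source on its own thread and collects items in arrival
    // order. Order is preserved within one source but not across sources.
    // Items must be non-null because LinkedBlockingQueue rejects nulls.
    @SuppressWarnings("unchecked")
    static <T> List<T> scanAll(List<Iterable<T>> sources) {
        BlockingQueue<Object> queue = new LinkedBlockingQueue<>();
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, sources.size()));
        for (Iterable<T> source : sources) {
            pool.execute(() -> {
                try {
                    for (T item : source) {
                        queue.add(item);
                    }
                } finally {
                    queue.add(DONE); // always signal, so the consumer never hangs
                }
            });
        }
        pool.shutdown(); // no new tasks; running workers finish normally

        List<T> results = new ArrayList<>();
        int finished = 0;
        try {
            while (finished < sources.size()) {
                Object next = queue.take();
                if (next == DONE) {
                    finished++;
                } else {
                    results.add((T) next);
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            pool.shutdownNow();
            throw new IllegalStateException("interrupted while collecting results", e);
        }
        return results;
    }

    public static void main(String[] args) {
        // Three in-memory lists stand in for scans over three tables.
        List<Iterable<String>> tables = Arrays.<Iterable<String>>asList(
                Arrays.asList("t1:a", "t1:b"),
                Arrays.asList("t2:a"),
                Arrays.asList("t3:a", "t3:b", "t3:c"));
        System.out.println(scanAll(tables));
    }
}
```

This only addresses the multi-table, first-available-ordering half of the request. Whether server-side iterator state survives across batches depends on the scanner each worker wraps, which the sketch does not model. A real version would also want a bounded queue for back-pressure and a way to surface worker failures to the caller.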
Keith Turner 2013-04-15, 21:33
Adam Fuchs 2013-04-15, 22:19
Dave Marion 2013-04-15, 22:37
Adam Fuchs 2013-04-15, 23:01
Keith Turner 2013-04-16, 13:29
Adam Fuchs 2013-04-16, 16:33
Keith Turner 2013-04-15, 20:48