HBase, mail # user - Dealing with large data sets in client


Re: Dealing with large data sets in client
Bryan Beaudreault 2012-03-28, 17:47
Thanks Stack, that's correct.  It is kind of hard to describe, though I
guess it's easiest to think of it as a 2d array where the 2nd dimension is
sorted.

I think your idea would be doable, too.  I'm going to try testing them both
and see how well they perform.  Luckily I'm not TOO concerned about
performance for these outliers, as long as having multiple big scanners
like that open at once doesn't degrade performance for other queries as
well.  I'll update with my findings in case someone else ends up with a
similar use case.

On Wed, Mar 28, 2012 at 1:10 PM, Stack <[EMAIL PROTECTED]> wrote:

> On Tue, Mar 27, 2012 at 2:36 PM, Bryan Beaudreault
> <[EMAIL PROTECTED]> wrote:
> > I imagine it isn't a great idea to create a ton of scans
> > (1 for each row), which is the only way I can think to do the above with
> > what we have.
> >
>
> You want to step through some set of rows in lock-step?  That is, get
> first N on row A, then first N on row B, etc., then when that is done,
> go back and step through next N on A, B, and so on?
>
> (Pardon me if I'm being a bit thick -- it's early here)
>
> I know of no way to do this other than as you suggest -- a scanner per
> row (not too bad given your rows are wide) or what about a scan to do
> first N, then a new scan to do next N... would that work?
>
> St.Ack
>
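
For readers following the thread, the lock-step access pattern Stack describes
(first N on row A, first N on row B, ..., then the next N on each) can be
sketched roughly as below. This is only an in-memory model and does not call
the HBase client API: the class LockStepPager, the method lockStep, and the
"row:column" output format are all made up for illustration. A per-row
iterator stands in for the "scanner per row" option, and a TreeMap stands in
for a wide row whose columns are sorted.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical in-memory model of lock-step paging over wide rows.
class LockStepPager {

    // Visits the first n columns of every row, then the next n of every row,
    // and so on, returning "row:column" strings in the order visited.
    static List<String> lockStep(Map<String, NavigableMap<String, String>> table, int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        // One cursor per row; this stands in for one open scanner per row.
        Map<String, Iterator<Map.Entry<String, String>>> cursors = new LinkedHashMap<>();
        for (Map.Entry<String, NavigableMap<String, String>> row : table.entrySet()) {
            cursors.put(row.getKey(), row.getValue().entrySet().iterator());
        }

        List<String> visited = new ArrayList<>();
        while (!cursors.isEmpty()) {
            Iterator<Map.Entry<String, Iterator<Map.Entry<String, String>>>> rows =
                    cursors.entrySet().iterator();
            while (rows.hasNext()) {
                Map.Entry<String, Iterator<Map.Entry<String, String>>> row = rows.next();
                Iterator<Map.Entry<String, String>> cursor = row.getValue();
                // Take at most n columns from this row before moving on.
                for (int i = 0; i < n && cursor.hasNext(); i++) {
                    visited.add(row.getKey() + ":" + cursor.next().getKey());
                }
                // Drop exhausted cursors, as one would close a finished scanner.
                if (!cursor.hasNext()) {
                    rows.remove();
                }
            }
        }
        return visited;
    }

    public static void main(String[] args) {
        Map<String, NavigableMap<String, String>> table = new LinkedHashMap<>();
        NavigableMap<String, String> rowA = new TreeMap<>();
        rowA.put("c1", "v");
        rowA.put("c2", "v");
        rowA.put("c3", "v");
        NavigableMap<String, String> rowB = new TreeMap<>();
        rowB.put("c1", "v");
        table.put("rowA", rowA);
        table.put("rowB", rowB);
        System.out.println(lockStep(table, 2));
    }
}
```

The cost the thread worries about shows up here as the size of the cursors
map: every row keeps a cursor open until it is exhausted. The alternative
Stack mentions (a fresh scan for each block of N) would trade those open
cursors for repeated seeks to the resume position in each row.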