Re: Where is scanner startRow used
If this is a regression (though I don't think it is), did you observe this
behaviour only in recent versions, or was it like that in all the versions
you had used earlier, which made you switch to wide schemas?

Regards
Ram
On Thu, May 16, 2013 at 2:27 AM, Varun Sharma <[EMAIL PROTECTED]> wrote:

> On Wed, May 15, 2013 at 1:20 PM, lars hofhansl <[EMAIL PROTECTED]> wrote:
>
> > Do you have some more details?
> >
> Yes,  the rows have 50 columns each when we use a wide schema.
> Unfortunately, this was a while back when we tried to go tall and found
> performance to be poor and eventually switched to wide. The reason why I
> say "unfortunately" is because I don't remember the exact performance
> numbers. Now we have a use case where we may have much wider rows (millions
> of columns) - so because of these outliers, we prefer tall. I probably
> should try reproducing the same test case again. We basically saw
> significantly more iowait and I/O with the tall schema vs. get schema as we
> upped the load.
>
>
> > Why would a scan in a tall schema be all over the place but in a wide
> > schema it is not?
> >
> It is random in both cases - the scans are as random as the gets. Probably
> a mistake in my email below.
>
> > How wide were the rows before? About 50 columns?
> >
> Yes, 50 columns or so (could be up to 100 but not much more).
>
> >
> >
> > -- Lars
> >
> >
> > ----- Original Message -----
> > From: Varun Sharma <[EMAIL PROTECTED]>
> > To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
> > Cc:
> > Sent: Wednesday, May 15, 2013 11:58 AM
> > Subject: Re: Where is scanner startRow used
> >
> > Yeah, I just checked that we were already using startRow and it's still
> > significantly poorer performance than the wide schema (close to unusable).
> >
> > We are doing scans of 50 batch size but the scans are all over the place -
> > very random because the schema is tall and not wide. I have created a JIRA
> > for the same and I will report performance numbers there. But to me, not
> > seeking to the start row within a region feels clearly suboptimal.
> >
> > Thanks
> > Varun
> >
> >
> > On Wed, May 15, 2013 at 11:48 AM, Anoop John <[EMAIL PROTECTED]>
> > wrote:
> >
> > > At client side see ScannerCallable where this is passed to
> > > ServerCallable. Based on this only, which regions should be scanned first
> > > is decided.
> > > I think some time back also the prefix filter was discussed. At that time
> > > also the conclusion was to use the start row. You can set a start row now,
> > > right? Please check the perf with this once.
> > >
> > > -Anoop-
> > >
> > >
> > > On Thu, May 16, 2013 at 12:02 AM, Varun Sharma <[EMAIL PROTECTED]>
> > > wrote:
> > >
> > > > Hi,
> > > >
> > > > Could someone please point me to where Scan.startRow is being used ?
> > > >
> > > > From what I can see in HRegion.RegionScannerImpl, it is unused. A grep
> > > > does not seem to return any valid entries. But my knowledge of this part
> > > > is limited.
> > > >
> > > > We are debugging poor performance on prefix scans in tall schemas. If this
> > > > is really an issue, I will open a JIRA...
> > > >
> > > > Varun
> > > >
> > >
> >
> >
>
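
For readers following the start-row advice in this thread: below is a minimal sketch, in plain Java with no HBase dependency, of how a prefix scan can be expressed as a [startRow, stopRow) range. The class name PrefixScanBounds, its methods, and the sample row keys are hypothetical and only for illustration. The TreeMap is a stand-in for a sorted store and is not how HRegion is implemented. With the HBase client, the two byte arrays would be the kind of values passed to Scan.setStartRow and Scan.setStopRow.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;

public class PrefixScanBounds {

    /** Unsigned lexicographic comparison of byte arrays, the order HBase uses for row keys. */
    public static int compareRows(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xff;
            int y = b[i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        return a.length - b.length;
    }

    /**
     * Returns the smallest key that sorts after every key starting with the prefix.
     * An empty array means "no upper bound" (the prefix is empty or all 0xff bytes).
     */
    public static byte[] stopRowForPrefix(byte[] prefix) {
        int end = prefix.length;
        // Trailing 0xff bytes cannot be incremented, so drop them first.
        while (end > 0 && prefix[end - 1] == (byte) 0xff) {
            end--;
        }
        if (end == 0) {
            return new byte[0];
        }
        byte[] stop = Arrays.copyOf(prefix, end);
        stop[end - 1]++;
        return stop;
    }

    /** Counts rows with the given prefix by seeking to the start row instead of filtering every row. */
    public static int countByRange(TreeMap<byte[], String> rows, byte[] prefix) {
        byte[] stop = stopRowForPrefix(prefix);
        Map<byte[], String> range = (stop.length == 0)
                ? rows.tailMap(prefix, true)
                : rows.subMap(prefix, true, stop, false);
        return range.size();
    }

    public static void main(String[] args) {
        // Hypothetical tall-schema keys of the form "<entity>|<qualifier>".
        TreeMap<byte[], String> rows = new TreeMap<>(PrefixScanBounds::compareRows);
        for (String key : new String[] {"user1|a", "user1|b", "user10|a", "user2|a"}) {
            rows.put(key.getBytes(StandardCharsets.UTF_8), "value");
        }
        byte[] prefix = "user1|".getBytes(StandardCharsets.UTF_8);
        byte[] stop = stopRowForPrefix(prefix);
        System.out.println("startRow = " + new String(prefix, StandardCharsets.UTF_8));
        System.out.println("stopRow  = " + new String(stop, StandardCharsets.UTF_8));
        System.out.println("rows in range = " + countByRange(rows, prefix));
    }
}
```

The point of the range form is that the scan can begin at the first matching key and stop at the first non-matching one, whereas a prefix filter alone leaves the start position unspecified.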