HBase, mail # dev - Where is scanner startRow used


Re: Where is scanner startRow used
Varun Sharma 2013-05-15, 20:57
On Wed, May 15, 2013 at 1:20 PM, lars hofhansl <[EMAIL PROTECTED]> wrote:

> Do you have some more details?
>
Yes, the rows have 50 columns each when we use a wide schema.
Unfortunately, this was a while back: we tried going tall, found
performance to be poor, and eventually switched to wide. I say
"unfortunately" because I don't remember the exact performance
numbers. Now we have a use case where we may have much wider rows (millions
of columns), so because of these outliers we prefer tall. I should probably
try reproducing the same test case. We basically saw
significantly more iowait and I/O with the tall schema (scans) than with the
wide schema (gets) as we upped the load.

> Why would a scan in a tall schema be all over the place but in a wide
> schema it is not?
>
It is random in both cases - the scans are as random as the gets. Probably
a mistake in my email below.

> How wide were the rows before? About 50 columns?
>
Yes, 50 columns or so (could be up to 100, but not much more).
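
For reference, the start row and stop row of a prefix scan in a tall schema can be derived from the prefix alone. The sketch below is stdlib-only Java and does not touch the HBase client; the class name, the method names, and the "user42|" key layout are made up for illustration. It assumes row keys sort in unsigned lexicographic byte order and that an empty stop row means "no upper bound", which is how HBase treats an empty end row.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper, not part of HBase: computes the [startRow, stopRow)
// range that covers every row key beginning with a given prefix.
final class PrefixScanBounds {

    /** The start row of a prefix scan is the prefix itself (defensive copy). */
    static byte[] startRowForPrefix(byte[] prefix) {
        return Arrays.copyOf(prefix, prefix.length);
    }

    /**
     * Smallest key that sorts after every key starting with the prefix.
     * Trailing 0xFF bytes cannot be incremented, so they are dropped first.
     * Returns an empty array (no upper bound) if the prefix is empty or
     * consists only of 0xFF bytes.
     */
    static byte[] stopRowForPrefix(byte[] prefix) {
        int i = prefix.length - 1;
        while (i >= 0 && prefix[i] == (byte) 0xFF) {
            i--;
        }
        if (i < 0) {
            return new byte[0];
        }
        byte[] stop = Arrays.copyOf(prefix, i + 1);
        stop[i]++; // safe: stop[i] is not 0xFF, so this cannot wrap
        return stop;
    }

    public static void main(String[] args) {
        // Assumed tall-schema key layout: "<entity>|<column-like suffix>"
        byte[] prefix = "user42|".getBytes(StandardCharsets.UTF_8);
        byte[] stop = stopRowForPrefix(prefix);
        System.out.println(new String(stop, StandardCharsets.UTF_8)); // prints user42}
    }
}
```

The two arrays would then be passed to the scan as its start and stop rows, so the scan is bounded by the prefix on both ends instead of relying on a filter alone.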

>
>
> -- Lars
>
>
> ----- Original Message -----
> From: Varun Sharma <[EMAIL PROTECTED]>
> To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
> Cc:
> Sent: Wednesday, May 15, 2013 11:58 AM
> Subject: Re: Where is scanner startRow used
>
> Yeah, I just checked that we were already using startRow, and it's still
> significantly poorer performance than the wide schema (close to unusable).
>
> We are doing scans with a batch size of 50, but the scans are all over the
> place - very random because the schema is tall and not wide. I have created
> a JIRA for this and will report performance numbers there. But to me, not
> seeking to the start row within a region feels clearly suboptimal.
>
> Thanks
> Varun
>
>
> On Wed, May 15, 2013 at 11:48 AM, Anoop John <[EMAIL PROTECTED]>
> wrote:
>
> > On the client side, see ScannerCallable, where this is passed to
> > ServerCallable. This is what decides which region is scanned first.
> > I think the prefix filter was also discussed some time back. The
> > conclusion then was also to use the start row. You can set a start row
> > now, right? Please check the performance with this once.
> >
> > -Anoop-
> >
> >
> > On Thu, May 16, 2013 at 12:02 AM, Varun Sharma <[EMAIL PROTECTED]>
> > wrote:
> >
> > > Hi,
> > >
> > > Could someone please point me to where Scan.startRow is being used?
> > >
> > > From what I can see in HRegion.RegionScannerImpl, it is unused. A grep
> > > does not seem to return any valid entries. But my knowledge of this
> > > part is limited.
> > >
> > > We are debugging poor performance on prefix scans in tall schemas. If
> > > this is really an issue, I will open a JIRA...
> > >
> > > Varun
> > >
> >
>
>
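
Anoop's point in the quoted thread, that the client uses the start row to decide which region is scanned first, can be illustrated with a small stdlib-only sketch. This is not the ScannerCallable code; the class, the method names, and the region names are hypothetical. It assumes only that each region is identified by its start key, that the first region has an empty start key, and that a row belongs to the region with the greatest start key less than or equal to that row.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Comparator;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration of picking the first region for a scan's start row.
final class RegionLookupSketch {

    // Row keys compare as unsigned bytes, lexicographically (Java 9+).
    private static final Comparator<byte[]> UNSIGNED = Arrays::compareUnsigned;

    // Region start key -> region name, ordered by start key.
    private final TreeMap<byte[], String> regionsByStartKey = new TreeMap<>(UNSIGNED);

    void addRegion(String startKey, String regionName) {
        regionsByStartKey.put(bytes(startKey), regionName);
    }

    /** Returns the region whose key range contains the given row. */
    String regionFor(String row) {
        Map.Entry<byte[], String> entry = regionsByStartKey.floorEntry(bytes(row));
        if (entry == null) {
            throw new IllegalStateException("no region covers row " + row);
        }
        return entry.getValue();
    }

    private static byte[] bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        RegionLookupSketch table = new RegionLookupSketch();
        table.addRegion("", "region-1");  // first region: empty start key
        table.addRegion("m", "region-2");
        table.addRegion("t", "region-3");
        // A scan with start row "user42|" would begin in the last region.
        System.out.println(table.regionFor("user42|")); // prints region-3
    }
}
```

This only covers choosing the region. Whether the server then seeks to the start row inside that region is the separate question raised in the thread.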