HBase >> mail # dev >> Where is scanner startRow used


Re: Where is scanner startRow used
Do you have some more details?
Why would a scan in a tall schema be all over the place but in a wide schema it is not?
How wide were the rows before? About 50 columns?
-- Lars
----- Original Message -----
From: Varun Sharma <[EMAIL PROTECTED]>
To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
Cc:
Sent: Wednesday, May 15, 2013 11:58 AM
Subject: Re: Where is scanner startRow used

Yeah, I just checked that we were already using startRow, and it's still
significantly poorer performance than the wide schema (close to unusable).

We are doing scans with a batch size of 50, but the scans are all over the
place - very random, because the schema is tall and not wide. I have created a JIRA
for the same and I will report performance numbers there. But to me, not
seeking to the start row within a region feels clearly suboptimal.

Thanks
Varun
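
To make the concern above concrete, here is a minimal sketch of the two access patterns being contrasted: testing every key from the beginning of a region versus seeking to the start row first. This is not HBase code. The class `RegionSeekSketch`, its methods, and the `user%03d|item%02d` key layout are all hypothetical, and a `TreeMap` merely stands in for the sorted rows of one region.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical stand-in for a single region: row keys held in sorted order.
// This is NOT HBase code; it only contrasts the two access patterns.
final class RegionSeekSketch {

    // Matching row keys plus how many keys had to be examined to find them.
    static final class Result {
        final List<String> rows = new ArrayList<>();
        int examined = 0;
    }

    // Filter-only: start at the first row of the region and test every key.
    static Result filterFromRegionStart(NavigableMap<String, String> region,
                                        String prefix, int limit) {
        Result r = new Result();
        for (String key : region.keySet()) {
            r.examined++;
            if (key.startsWith(prefix)) {
                r.rows.add(key);
                if (r.rows.size() == limit) break;
            } else if (key.compareTo(prefix) > 0) {
                break; // sorted order: no later key can match
            }
        }
        return r;
    }

    // Seek first: jump to the first key >= prefix, then read forward.
    static Result seekToStartRow(NavigableMap<String, String> region,
                                 String prefix, int limit) {
        Result r = new Result();
        for (String key : region.tailMap(prefix, true).keySet()) {
            r.examined++;
            if (!key.startsWith(prefix)) break; // left the prefix range
            r.rows.add(key);
            if (r.rows.size() == limit) break;
        }
        return r;
    }

    // Builds a small tall-schema region: one row per (user, item) pair.
    static NavigableMap<String, String> sampleRegion() {
        NavigableMap<String, String> region = new TreeMap<>();
        for (int user = 0; user < 100; user++) {
            for (int item = 0; item < 10; item++) {
                String key = String.format(Locale.ROOT, "user%03d|item%02d", user, item);
                region.put(key, "v");
            }
        }
        return region;
    }
}
```

In this toy region of 1,000 rows, a prefix scan for `user042|` returns the same 10 rows either way, but the filter-only walk examines 431 keys while the seek examines 11. Whether a real region scanner behaves like the first or the second method is exactly the question raised in this thread; the sketch does not answer it.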
On Wed, May 15, 2013 at 11:48 AM, Anoop John <[EMAIL PROTECTED]> wrote:

> On the client side, see ScannerCallable, where this is passed to
> ServerCallable. The choice of which regions to scan first is based on
> this.
> I think the prefix filter was also discussed some time back. At that time,
> too, the conclusion was to use the start row. You can set a start row now,
> right? Please check the perf with this once.
>
> -Anoop-
>
>
> On Thu, May 16, 2013 at 12:02 AM, Varun Sharma <[EMAIL PROTECTED]>
> wrote:
>
> > Hi,
> >
> > Could someone please point me to where Scan.startRow is being used?
> >
> > From what I can see in HRegion.RegionScannerImpl, it is unused. A grep
> > does not seem to return any valid entries. But my knowledge of this part
> > is limited.
> >
> > We are debugging poor performance on prefix scans in tall schemas. If
> > this is really an issue, I will open a JIRA...
> >
> > Varun
> >
>
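
Anoop's suggestion in the quoted reply, to use a start row rather than rely only on a prefix filter, amounts to turning the prefix into a bounded key range. A minimal sketch of that conversion follows, assuming row keys compare as unsigned bytes in lexicographic order and that an empty stop row means "no upper bound". The class and method names are hypothetical, and the HBase client classes are deliberately not used here; the resulting pair would be what a caller passes as the scan's start and stop rows.

```java
import java.util.Arrays;

// Hypothetical helper, not part of HBase: derives an exclusive stop row
// for a prefix scan whose start row is the prefix itself.
final class PrefixRangeSketch {

    // Returns the smallest key that sorts after every key beginning with
    // `prefix`, or an empty array when no such key exists (prefix is empty
    // or consists only of 0xFF bytes), meaning "scan to the end".
    static byte[] stopRowForPrefix(byte[] prefix) {
        int end = prefix.length;
        // Trailing 0xFF bytes cannot be incremented, so drop them.
        while (end > 0 && prefix[end - 1] == (byte) 0xFF) {
            end--;
        }
        if (end == 0) {
            return new byte[0];
        }
        byte[] stop = Arrays.copyOf(prefix, end);
        stop[end - 1]++; // bump the last remaining byte by one
        return stop;
    }
}
```

For example, the ASCII prefix `user42|` would give the stop row `user42}`, so the scan covers the half-open range from `user42|` up to but not including `user42}`.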