HBase user mailing list: No of rows


Thread:
  Mohit Anchlia 2012-09-12, 22:50
  Doug Meil 2012-09-12, 22:59
  Mohit Anchlia 2012-09-12, 23:29
  lars hofhansl 2012-09-12, 23:48
  Mohit Anchlia 2012-09-12, 23:51

Re: No of rows
If we set caching to N, the region server will attempt to scan N rows before next() returns.
So if you typically bail out of a scan early at the client, the server will scan on average N/2 rows too many, which you have to trade off against the number of RPC requests you would incur without caching.
Good numbers for caching are typically between 10 and 100 (depending on the number of columns per row).
-- Lars
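
A minimal sketch of this against the 0.90-era client API used in this thread (the table name, start key, and the caching value of 20 are illustrative assumptions, not from the thread):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.util.Bytes;

    public class CachingScanSketch {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            HTable table = new HTable(conf, "mytable");       // hypothetical table name
            Scan scan = new Scan(Bytes.toBytes("startKey"));  // hypothetical start key
            // Default caching is 1: every ClientScanner.next() is one RPC for one row.
            // Caching of N ships up to N rows back per RPC instead.
            scan.setCaching(20);  // inside the 10-100 range suggested above
            ResultScanner scanner = table.getScanner(scan);
            try {
                for (Result r : scanner) {
                    // process the row; if you break out of this loop early, the
                    // server has pre-scanned on average caching/2 unused rows
                }
            } finally {
                scanner.close();  // releases the scanner lease on the region server
                table.close();
            }
        }
    }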

________________________________
 From: Mohit Anchlia <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]; lars hofhansl <[EMAIL PROTECTED]>
Sent: Wednesday, September 12, 2012 4:51 PM
Subject: Re: No of rows
 
On Wed, Sep 12, 2012 at 4:48 PM, lars hofhansl <[EMAIL PROTECTED]> wrote:

> No. By default each call to ClientScanner.next(...) incurs an RPC call to
> the HBase server, which is why it is important to enable scanner caching
> (as opposed to batching) if you expect to scan many rows.
> By default scanner caching is set to 1.
>
Thanks! If caching is set > 1, is there a way to limit the number of rows
that are fetched from the server?

>
>
> ________________________________
>  From: Mohit Anchlia <[EMAIL PROTECTED]>
> To: [EMAIL PROTECTED]
> Sent: Wednesday, September 12, 2012 4:29 PM
> Subject: Re: No of rows
>
> But when the ResultScanner executes, wouldn't it already query the servers for
> all the rows matching the start key? I am trying to avoid reading all the
> blocks from the file system that match the keys.
>
> On Wed, Sep 12, 2012 at 3:59 PM, Doug Meil <[EMAIL PROTECTED]
> >wrote:
>
> >
> > Hi there,
> >
> > If you're talking about stopping a scan after X rows (as opposed to the
> > batching), then just break out of the ResultScanner loop after X rows.
> >
> > http://hbase.apache.org/book.html#data_model_operations
> >
> > You can either add a ColumnFamily to a scan, or add specific attributes
> > (i.e., "cf:column") to it.
> >
> >
> >
> >
> > On 9/12/12 6:50 PM, "Mohit Anchlia" <[EMAIL PROTECTED]> wrote:
> >
> > >I am using client 0.90.5 jar
> > >
> > >Is there a way to limit how many rows can be fetched in one scan call?
> > >
> > >Similarly, is there something for columns?
> >
> >
> >
>
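
To make Doug's suggestion concrete, a sketch of the client-side early-out, assuming the same open HTable as in the first sketch; the family "cf", qualifier "col", and the 100-row cap are illustrative assumptions:

    Scan scan = new Scan(Bytes.toBytes("startKey"));
    scan.addFamily(Bytes.toBytes("cf"));                          // all of family "cf", or:
    // scan.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("col")); // only "cf:col"
    scan.setCaching(20);

    ResultScanner scanner = table.getScanner(scan);
    try {
        int rows = 0;
        for (Result r : scanner) {
            // ... process the row ...
            if (++rows >= 100) {
                break;  // stopping after X rows happens purely client-side
            }
        }
    } finally {
        scanner.close();
    }

Breaking out early only stops the client; pair it with a modest caching value so the region server does not pre-scan far past the cap, per Lars's N/2 trade-off above.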