
HBase >> mail # user >> Pagination with HBase - getting previous page of data


Re: Pagination with HBase - getting previous page of data
> Let's say for a scan setCaching is 10 and the scan is done across two
> regions. 9 results (satisfying the filter) are in Region1 and 10 results
> (satisfying the filter) are in Region2. Will this scan then return 19
> (9+10) results?

@Anil,
No, it will return only 10 results, not 19. When filling the cache, the
client takes into account the number of results already fetched from the
previous region. A filter is different: it has no client-side logic and is
executed entirely on the server side. This is the way it is designed.
Personally, I would prefer to do the pagination in the application alone,
using a plain scan with caching (to avoid too many RPCs) plus
application-level logic.

-Anoop-
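A minimal sketch of the application-level pagination suggested above, including the "previous page" case from the subject line. It is a toy model, not HBase client code: the table is a sorted list of row keys, and the hypothetical `scan()` helper stands in for a plain scan whose start row is inclusive and which the application stops reading after `limit` rows. The hypothetical `Pager` class remembers the first row key of every page it has shown, so going back is just a re-scan from an older key; reading one extra row tells it where the next page starts.

```python
from bisect import bisect_left
from typing import List, Optional


def scan(table: List[str], start_row: str, limit: int) -> List[str]:
    """Toy stand-in for a plain scan: `table` holds sorted row keys,
    `start_row` is inclusive, and the caller stops after `limit` rows."""
    i = bisect_left(table, start_row)
    return table[i:i + limit]


class Pager:
    """Hypothetical app-level pager: remembers the first row key of every
    page shown so far, so "previous page" is a re-scan from an older key."""

    def __init__(self, table: List[str], page_size: int) -> None:
        self.table = table
        self.page_size = page_size
        self.page_starts: List[str] = [""]  # "" sorts before every key
        self.next_start: Optional[str] = None

    def _load(self) -> List[str]:
        # Read one extra row: if it exists, it is the first key of the next page.
        rows = scan(self.table, self.page_starts[-1], self.page_size + 1)
        self.next_start = rows[self.page_size] if len(rows) > self.page_size else None
        return rows[: self.page_size]

    def first_page(self) -> List[str]:
        self.page_starts = [""]
        return self._load()

    def next_page(self) -> List[str]:
        if self.next_start is None:  # already on the last page
            return []
        self.page_starts.append(self.next_start)
        return self._load()

    def previous_page(self) -> List[str]:
        if len(self.page_starts) == 1:  # already on the first page
            return []
        self.page_starts.pop()
        return self._load()
```

In a real client, each `_load()` would roughly correspond to one scan started at the page's first key with caching set to about `page_size + 1`, closed once that many rows have been read; that mapping is an assumption, not something stated in this thread.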

On Sat, Feb 2, 2013 at 1:32 PM, anil gupta <[EMAIL PROTECTED]> wrote:

> Hi Anoop,
>
> Please find my reply inline.
>
> Thanks,
> Anil
>
> On Wed, Jan 30, 2013 at 3:31 AM, Anoop Sam John <[EMAIL PROTECTED]> wrote:
>
> > @Anil
> >
> > > I could not understand why it goes to multiple regionservers in
> > > parallel. Why can't it guarantee results <= page size (my guess: due to
> > > multiple RS scans)? If you have used it, maybe you can explain the
> > > behaviour?
> >
> > A scan from the client side never goes to multiple RSs in parallel. A scan
> > from the HTable API is sequential, one region after the other. For every
> > region it opens a scanner in the RS and makes next() calls. The filter is
> > instantiated on the server side, at the per-region level ...
> >
> > Say you need 100 rows in the page, you create a Scan on the client side
> > with the filter, and there are 2 regions. First the scanner is opened for
> > region1 and the scan happens there; the filter ensures that at most 100
> > rows are retrieved from that region. But when the scan crosses the region
> > boundary and the client automatically opens a scanner for region2, the
> > filter is passed there too with a max of 100 rows, so up to 100 rows can
> > come from there as well. So overall, on the client side we cannot
> > guarantee that the scan will return only 100 rows from the table as a
> > whole.
> >
>
> I agree with other people on this email chain that the 2nd region should
> only return (100 - number of rows returned by Region1), if possible.
>
> When the scan crosses the region boundary and the client automatically
> opens a scanner for Region2, why doesn't the scanner in Region2 know that
> some of the rows were already fetched from Region1? Do you mean that by
> default, for a scan spanning multiple regions, every region has its own
> count of the number of rows it is going to return? I.e., let's say for a
> scan setCaching is 10 and the scan is done across two regions. 9 results
> (satisfying the filter) are in Region1 and 10 results (satisfying the
> filter) are in Region2. Will this scan then return 19 (9+10) results?
>
> >
> > I hope I am making it clear. I have not used PageFilter at all; I am just
> > explaining based on my knowledge of the scan flow and general filter
> > usage.
> >
> > "This is because the filter is applied separately on different region
> > servers. It does however optimize the scan of individual HRegions by
> > making sure that the page size is never exceeded locally."
> >
> > I guess it should say: "This is because the filter is applied
> > separately on different regions".
> >
> > -Anoop-
> >
> > ________________________________________
> > From: anil gupta [[EMAIL PROTECTED]]
> > Sent: Wednesday, January 30, 2013 1:33 PM
> > To: [EMAIL PROTECTED]
> > Subject: Re: Pagination with HBase - getting previous page of data
> >
> > Hi Mohammad,
> >
> > You are most welcome to join the discussion. I have never used PageFilter,
> > so I don't really have concrete input.
> > I had a look at
> > http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/PageFilter.html
> > I could not understand why it goes to multiple regionservers in
> > parallel. Why can't it guarantee results <= page size (my guess: due to
> > multiple RS scans)? If you have used it, maybe you can explain the
> > behaviour?
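To make the difference discussed in this thread concrete, here is a toy model (not HBase code) of the two behaviours Anoop describes: a page-size filter whose counter is created fresh in each region, and the client-side cache fill, which asks the next region only for the rows still missing. Regions are plain lists of matching rows visited one after another, and both function names are hypothetical.

```python
from typing import List


def per_region_page_filter(regions: List[List[str]], page_size: int) -> List[str]:
    """Toy model of a page-size filter instantiated separately per region:
    each region's counter starts at zero, so every region may contribute
    up to page_size rows and the scan as a whole can exceed page_size."""
    results: List[str] = []
    for region_rows in regions:  # regions are scanned sequentially
        count = 0                # fresh filter state for this region
        for row in region_rows:
            if count >= page_size:
                break
            results.append(row)
            count += 1
    return results


def first_client_batch(regions: List[List[str]], caching: int) -> List[str]:
    """Toy model of one client-side cache fill: when a region runs out,
    the client asks the next region only for the rows still missing."""
    batch: List[str] = []
    remaining = caching
    for region_rows in regions:
        if remaining == 0:
            break
        taken = region_rows[:remaining]
        batch.extend(taken)
        remaining -= len(taken)
    return batch
```

With Anil's numbers (9 matching rows in Region1, 10 in Region2), `first_client_batch(regions, 10)` yields 10 rows, while the per-region filter with a page size of 10 lets all 19 through. Trimming on the client, e.g. `per_region_page_filter(regions, 10)[:10]`, is the kind of app-level cap Anoop recommends.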