HBase user mailing list
Re: Pagination with HBase - getting previous page of data
On Sun, Feb 3, 2013 at 8:07 AM, Anoop John <[EMAIL PROTECTED]> wrote:

> > Let's say for a scan setCaching is 10 and the scan is done across two
> > regions. 9 results (satisfying the filter) are in Region1 and 10 results
> > (satisfying the filter) are in Region2. Will this scan then return 19
> > (9+10) results?
>
> @Anil.
> No, it will return only 10 results, not 19. The client here takes into
> account the number of results already fetched from the previous region.
> A filter is different, though: it has no client-side logic and is executed
> entirely on the server side. This is the way it is designed. Personally, I
> would prefer to do the pagination in the app alone, using a plain scan with
> caching (to avoid too many RPCs) plus app-level logic.
>
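To make the region-by-region behaviour concrete, here is a minimal Python sketch of a simplified model, not the HBase client itself. Assumptions: each "region" is a sorted list of row keys that already match the filter; the hypothetical `server_scan` applies a page-size limit with a fresh counter per region (mirroring a filter that is instantiated once per region on the server side); and the hypothetical `client_batches` fills each caching batch by counting down across regions, as Anoop describes. Keep in mind that setCaching controls how many rows are fetched per RPC, not the total: in this model the 9+10 case gives a first batch of 10 rows, but the whole scan still delivers all 19.

```python
from itertools import chain, islice
from typing import Iterator, List, Optional

# Simplified model, NOT the HBase client API: a "region" is just a sorted list
# of row keys that already satisfy the scan's filter condition.
Region = List[str]


def server_scan(region: Region, page_size: Optional[int]) -> List[str]:
    """Rows a single region returns. With a page-size limit, the counter
    starts at zero for this region only (one filter instance per region)."""
    if page_size is None:
        return list(region)
    rows_accepted = 0  # per-region state; nothing carried over from earlier regions
    out: List[str] = []
    for row in region:
        if rows_accepted >= page_size:
            break
        out.append(row)
        rows_accepted += 1
    return out


def client_batches(regions: List[Region], caching: int,
                   page_size: Optional[int] = None) -> Iterator[List[str]]:
    """Yield the batches the client fetches, one region after the other.
    Each batch wants `caching` rows; when a region runs out, the client opens
    the next region and asks only for the remaining count (the countdown)."""
    per_region = [server_scan(r, page_size) for r in regions]
    idx, pos = 0, 0
    while idx < len(per_region):
        batch: List[str] = []
        countdown = caching
        while countdown > 0 and idx < len(per_region):
            rows = per_region[idx][pos:pos + countdown]
            batch.extend(rows)
            countdown -= len(rows)
            pos += len(rows)
            if pos >= len(per_region[idx]):  # region exhausted: move to the next
                idx, pos = idx + 1, 0
        if batch:
            yield batch


if __name__ == "__main__":
    # Caching example from the thread: 9 matching rows in Region1, 10 in Region2.
    r1 = [f"r1-{i:03d}" for i in range(9)]
    r2 = [f"r2-{i:03d}" for i in range(10)]
    print([len(b) for b in client_batches([r1, r2], caching=10)])  # [10, 9]
    print(sum(len(b) for b in client_batches([r1, r2], caching=10)))  # 19

    # Page-size example: 150 matching rows in each of two regions, page size 100.
    big = [[f"{p}{i:03d}" for i in range(150)] for p in "ab"]
    print(sum(len(b) for b in client_batches(big, caching=1000, page_size=100)))  # 200
    # The server-side limit is per region, so the client must stop after one page.
    all_rows = chain.from_iterable(client_batches(big, caching=1000, page_size=100))
    print(len(list(islice(all_rows, 100))))  # 100
```

Under these assumptions, a limit enforced only on the server side is a per-region limit, which is why the client still has to cut the result off at the page size itself (the `islice` call).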
@Anoop: Nice, that's why I also try to stick to simple Scans and keep the
pagination logic in the application. :)
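A minimal sketch of app-level pagination with plain scans, assuming the app remembers the start row of each page it has shown: "previous page" then becomes a re-scan from a remembered start row, and fetching one extra row reveals where the next page starts. The table is modeled as a sorted Python list, and `scan_from` stands in for a scan with a start row (for example `Scan.setStartRow` in the 0.94-era client) that the client stops after `limit` rows; `scan_from`, `Paginator` and its methods are hypothetical names, not HBase API.

```python
from bisect import bisect_left
from typing import List, Optional


def scan_from(table: List[str], start_row: str, limit: int) -> List[str]:
    """Stand-in for a plain scan: start at start_row (inclusive) in a sorted
    table and let the client stop after `limit` rows."""
    i = bisect_left(table, start_row)
    return table[i:i + limit]


class Paginator:
    """Hypothetical app-level paginator. It remembers the start row of every
    page shown so far, so "previous page" is a re-scan from an earlier key."""

    def __init__(self, table: List[str], page_size: int) -> None:
        self.table = table
        self.page_size = page_size
        self.start_rows: List[str] = [""]  # "" sorts before any row key
        self.next_start: Optional[str] = None

    def _load(self) -> List[str]:
        # Ask for one extra row: if present, it is the next page's start row.
        rows = scan_from(self.table, self.start_rows[-1], self.page_size + 1)
        self.next_start = rows[self.page_size] if len(rows) > self.page_size else None
        return rows[:self.page_size]

    def first_page(self) -> List[str]:
        self.start_rows = [""]
        return self._load()

    def next_page(self) -> List[str]:
        if self.next_start is None:
            raise IndexError("already on the last page")
        self.start_rows.append(self.next_start)
        return self._load()

    def previous_page(self) -> List[str]:
        if len(self.start_rows) == 1:
            raise IndexError("already on the first page")
        self.start_rows.pop()
        return self._load()


if __name__ == "__main__":
    table = [f"row{i:02d}" for i in range(7)]
    p = Paginator(table, page_size=3)
    print(p.first_page())     # ['row00', 'row01', 'row02']
    print(p.next_page())      # ['row03', 'row04', 'row05']
    print(p.next_page())      # ['row06']
    print(p.previous_page())  # ['row03', 'row04', 'row05']
```

Only start rows are stored, so the state grows with the number of pages visited rather than with the page size; the trade-off is one extra scan each time the user goes back a page.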

>
> -Anoop-
>
> On Sat, Feb 2, 2013 at 1:32 PM, anil gupta <[EMAIL PROTECTED]> wrote:
>
> > Hi Anoop,
> >
> > Please find my reply inline.
> >
> > Thanks,
> > Anil
> >
> > On Wed, Jan 30, 2013 at 3:31 AM, Anoop Sam John <[EMAIL PROTECTED]>
> > wrote:
> >
> > > @Anil
> > >
> > > > I could not understand why it goes to multiple region servers in
> > > > parallel. Why can it not guarantee results <= page size (my guess: due
> > > > to multiple RS scans)? If you have used it, then maybe you can explain
> > > > the behaviour?
> > >
> > > A scan from the client side never goes to multiple RSs in parallel. A
> > > scan from the HTable API is sequential, one region after the other. For
> > > every region it opens a scanner in the RS and makes next() calls. The
> > > filter is instantiated on the server side, once per region ...
> > >
> > > Suppose you need 100 rows in the page, you create a Scan on the client
> > > side with the filter, and there are 2 regions. First the scanner is
> > > opened for region1 and the scan runs there. The filter ensures that at
> > > most 100 rows are retrieved from that region. But when the scan crosses
> > > the region boundary and the client automatically opens a scanner for
> > > region2, the filter is passed there too with a max of 100 rows, so up to
> > > 100 more rows can come from that region. So overall, on the client side
> > > we cannot guarantee that the scan will fetch only 100 rows from the
> > > table as a whole.
> > >
> >
> > I agree with the other people on this email chain that the 2nd region
> > should only return (100 - no. of rows returned by Region1), if possible.
> >
> > When the scan crosses the region boundary and the client automatically
> > opens a scanner for region2, why doesn't the scanner in Region2 know that
> > some of the rows were already fetched from region1? Do you mean that by
> > default, for a scan spanning multiple regions, every region has its own
> > count of the number of rows it is going to return? I.e., let's say for a
> > scan setCaching is 10 and the scan is done across two regions. 9 results
> > (satisfying the filter) are in Region1 and 10 results (satisfying the
> > filter) are in Region2. Will this scan then return 19 (9+10) results?
> >
> > >
> > > I think I am making it clear. I have not used PageFilter at all; I am
> > > just explaining based on my knowledge of the scan flow and general
> > > filter usage.
> > >
> > > "This is because the filter is applied separately on different region
> > > servers. It does however optimize the scan of individual HRegions by
> > > making sure that the page size is never exceeded locally."
> > >
> > > I guess it should instead say "This is because the filter is applied
> > > separately on different regions".
> > >
> > > -Anoop-
> > >
> > > ________________________________________
> > > From: anil gupta [[EMAIL PROTECTED]]
> > > Sent: Wednesday, January 30, 2013 1:33 PM
> > > To: [EMAIL PROTECTED]
> > > Subject: Re: Pagination with HBase - getting previous page of data
> > >
> > > Hi Mohammad,
> > >
> > > You are most welcome to join the discussion. I have never used
> > > PageFilter, so I don't really have concrete input.

Thanks & Regards,
Anil Gupta