HBase >> mail # user >> Pagination with HBase - getting previous page of data


Re: Pagination with HBase - getting previous page of data
Here are my thoughts on this matter:

1. If you define setCaching(numOfRows) on the scan object, you can
check before each call whether you have passed your page limit, and
thus avoid the situation in which you retrieve pageSize results from
each region.
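Point 1 can be simulated without a cluster. This is a plain-Java sketch (not the HBase API; the row data is made up) of fetching in batches of the caching size while stopping once the page limit is reached:

```java
import java.util.ArrayList;
import java.util.List;

public class PageLimitScan {
    // Pull rows in batches of `caching` (as scan.setCaching(n) would make
    // each next() RPC do), but check the page limit before and during each
    // batch so we never collect more than `pageSize` rows overall.
    static List<Integer> scanPage(List<Integer> rows, int caching, int pageSize) {
        List<Integer> page = new ArrayList<>();
        int i = 0;
        while (i < rows.size() && page.size() < pageSize) {
            // One simulated "RPC": fetch up to `caching` rows.
            int end = Math.min(i + caching, rows.size());
            for (int j = i; j < end && page.size() < pageSize; j++) {
                page.add(rows.get(j));
            }
            i = end;
        }
        return page;
    }

    public static void main(String[] args) {
        List<Integer> rows = new ArrayList<>();
        for (int k = 0; k < 500; k++) rows.add(k);
        // 500 rows available, caching=50, pageSize=100: the client stops
        // after two batches instead of draining the whole scan.
        System.out.println(scanPage(rows, 50, 100).size()); // prints 100
    }
}
```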

2. I think it's OK for the UI to present a certain point in time in the
database and offer paging on that. You can achieve that by taking the
current timestamp (System.currentTimeMillis()) and forcing the returned
results to be no later than that time by using
scan.setTimeRange(0, currentTime). If you save currentTime and send it
back with the results to the UI, it can keep sending it to the backend,
thus ensuring you're viewing that point in time. If rows keep being
inserted, their timestamps will be greater and thus they won't be
displayed.
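The snapshot idea in point 2 can also be sketched without a cluster. In real code you would call scan.setTimeRange(0, snapshotTime) on an HBase Scan; here a plain-Java simulation (class, row keys, and timestamps are all made up) filters rows by write timestamp the same way. Note that HBase's time range treats the upper bound as exclusive, which this sketch mirrors:

```java
import java.util.ArrayList;
import java.util.List;

public class SnapshotPaging {
    // A row with an insert timestamp, standing in for an HBase cell.
    record Row(String key, long ts) {}

    // Return only rows written strictly before the snapshot time,
    // mimicking scan.setTimeRange(0, snapshotTime) where the max
    // timestamp is exclusive.
    static List<Row> scanUpTo(List<Row> table, long snapshotTime) {
        List<Row> out = new ArrayList<>();
        for (Row r : table) {
            if (r.ts < snapshotTime) out.add(r);
        }
        return out;
    }

    public static void main(String[] args) {
        long snapshot = 100L; // saved by the backend, echoed back by the UI
        List<Row> table = new ArrayList<>(List.of(
                new Row("a", 50), new Row("b", 90)));
        // A row inserted after the snapshot was taken...
        table.add(new Row("c", 150));
        // ...is invisible to every subsequent page request for that snapshot.
        System.out.println(scanUpTo(table, snapshot).size()); // prints 2
    }
}
```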
On Wed, Jan 30, 2013 at 2:42 PM, Toby Lazar <[EMAIL PROTECTED]> wrote:

> Sounds like if you had 1000 regions, each with 99 rows, and you asked
> for 100, you'd get back 99,000. My guess is that a Filter is
> serialized once and then sent successively to each region, and that
> it isn't updated between regions. I don't think changing that would
> be easy.
>
> Toby
>
> On 1/30/13, Jean-Marc Spaggiari <[EMAIL PROTECTED]> wrote:
> > Hi Anoop,
> >
> > So does it mean the scanner can send back at most LIMIT*2-1 rows?
> > Reading 100 rows from the 2nd region uses extra time and resources.
> > Why not ask for only the number of missing rows?
> >
> > JM
> >
> > 2013/1/30, Anoop Sam John <[EMAIL PROTECTED]>:
> >> @Anil
> >>
> >>>I could not understand why it goes to multiple regionservers in
> >> parallel. Why can it not guarantee results <= page size (my guess: due
> >> to multiple RS scans)? If you have used it then maybe you can explain
> >> the behaviour?
> >>
> >> A scan from the client side never goes to multiple RSs in parallel. A
> >> scan through the HTable API is sequential, one region after the other.
> >> For every region it will open a scanner in the RS and do next() calls.
> >> The filter will be instantiated at the server side per region ...
> >>
> >> When you need 100 rows in the page and you create a Scan at the client
> >> side with the filter, and suppose there are 2 regions: first the
> >> scanner is opened for region1 and the scan happens. It will ensure that
> >> at most 100 rows are retrieved from that region. But when the region
> >> boundary is crossed and the client automatically opens a scanner for
> >> region2, there too it will apply the filter with max 100 rows, so from
> >> there as well up to 100 rows can come. So overall, at the client side,
> >> we cannot guarantee that the scan will return only 100 rows as a whole
> >> from the table.
> >>
> >> I hope I am making it clear. I have not used PageFilter at all; I am
> >> just explaining as per my knowledge of the scan flow and general filter
> >> usage.
> >>
> >> "This is because the filter is applied separately on different region
> >> servers. It does however optimize the scan of individual HRegions by
> >> making
> >> sure that the page size is never exceeded locally. "
> >>
> >> I guess it should instead say: "This is because the filter is applied
> >> separately on different regions".
> >>
> >> -Anoop-
> >>
> >> ________________________________________
> >> From: anil gupta [[EMAIL PROTECTED]]
> >> Sent: Wednesday, January 30, 2013 1:33 PM
> >> To: [EMAIL PROTECTED]
> >> Subject: Re: Pagination with HBase - getting previous page of data
> >>
> >> Hi Mohammad,
> >>
> >> You are most welcome to join the discussion. I have never used
> >> PageFilter so I don't really have concrete input.
> >> I had a look at
> >>
> http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/PageFilter.html
> >> I could not understand why it goes to multiple regionservers in
> >> parallel. Why can it not guarantee results <= page size (my guess: due
> >> to multiple RS scans)? If you have used it then maybe you can explain
> >> the behaviour?
> >>
> >
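The behaviour Anoop and Toby describe — a fresh server-side PageFilter instance per region, each enforcing the limit only locally — can be sketched as a plain-Java simulation (not the HBase API; region contents and class names are made up):

```java
import java.util.ArrayList;
import java.util.List;

public class PageFilterSim {
    // Each region applies the page limit independently, because the
    // server-side PageFilter instance is created per region. No state
    // is carried from one region to the next.
    static List<String> scanWithPageFilter(List<List<String>> regions, int pageSize) {
        List<String> results = new ArrayList<>();
        for (List<String> region : regions) {
            int fromThisRegion = 0; // the per-region filter's counter
            for (String row : region) {
                if (fromThisRegion == pageSize) break; // stops this region only
                results.add(row);
                fromThisRegion++;
            }
        }
        return results;
    }

    public static void main(String[] args) {
        // Toby's scenario scaled down: 2 regions, each holding 99 rows,
        // and the client asks for a page of 100.
        List<List<String>> regions = new ArrayList<>();
        for (int r = 0; r < 2; r++) {
            List<String> region = new ArrayList<>();
            for (int i = 0; i < 99; i++) region.add("r" + r + "-" + i);
            regions.add(region);
        }
        // Each region stays under its local limit, but the total exceeds
        // the page size, so the client must still truncate to 100 itself.
        System.out.println(scanWithPageFilter(regions, 100).size()); // prints 198
    }
}
```

This is why the javadoc-quoted caveat matters: the guarantee is per region, and the client-side truncation (or an explicit row count check between next() calls) remains the application's job.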