HBase >> mail # user >> Pagination with HBase - getting previous page of data


Vijay Ganesan 2013-01-25, 04:58
Mohammad Tariq 2013-01-25, 05:12
Jean-Marc Spaggiari 2013-01-25, 12:38
anil gupta 2013-01-25, 17:07
Jean-Marc Spaggiari 2013-01-25, 17:17
anil gupta 2013-01-25, 17:43
Jean-Marc Spaggiari 2013-01-26, 02:58
anil gupta 2013-01-28, 03:31
Jean-Marc Spaggiari 2013-01-29, 21:08
anil gupta 2013-01-29, 21:16
Jean-Marc Spaggiari 2013-01-29, 21:40
anil gupta 2013-01-30, 07:49
Mohammad Tariq 2013-01-30, 03:32
anil gupta 2013-01-30, 08:03
Anoop Sam John 2013-01-30, 11:31
Jean-Marc Spaggiari 2013-01-30, 12:18
Toby Lazar 2013-01-30, 12:42
Asaf Mesika 2013-02-03, 14:07
Anoop Sam John 2013-01-31, 03:23
anil gupta 2013-02-02, 08:02
Anoop John 2013-02-03, 16:07
Re: Pagination with HBase - getting previous page of data
On Sun, Feb 3, 2013 at 8:07 AM, Anoop John <[EMAIL PROTECTED]> wrote:

> > Let's say that for a scan setCaching is 10 and the scan runs across two
> > regions. 9 Results (satisfying the filter) are in Region1 and 10 Results
> > (satisfying the filter) are in Region2. Then will this scan return 19
> > (9+10) results?
>
> @Anil.
> No, it will return only 10 results, not 19. The client here takes into
> account the number of results already received from the previous region.
> But a filter is different. The filter has no logic that runs at the client
> side; it is executed entirely at the server side. This is the way it is
> designed. Personally I would prefer to do the pagination in the app alone,
> using a plain scan with caching (to avoid so many RPCs) and app-level logic.
>
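
To make the batch accounting described above concrete, here is a toy model in plain Java. It uses no HBase classes; the class and method names are hypothetical, and it only assumes (following Anoop's description) that the client asks each successive region for the rows still missing from the current batch. Note that it models a single batch of `caching` rows, not the whole scan: the remaining rows would still arrive on later fetches.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model only: no HBase classes are used and all names are hypothetical. */
final class CachingBatchSketch {

    /**
     * Fills one client-side batch of at most `caching` rows by visiting the
     * regions in order and taking from each only the rows still missing.
     */
    static List<String> fetchOneBatch(List<List<String>> regions, int caching) {
        List<String> batch = new ArrayList<>();
        int remaining = caching;
        for (List<String> region : regions) {
            if (remaining == 0) {
                break; // batch is full, no need to open the next region
            }
            int take = Math.min(remaining, region.size());
            batch.addAll(region.subList(0, take));
            remaining -= take;
        }
        return batch;
    }

    /** Builds n fake row keys such as "r1-0", "r1-1", ... */
    static List<String> rows(String prefix, int n) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            out.add(prefix + i);
        }
        return out;
    }

    public static void main(String[] args) {
        // 9 matching rows in region 1, 10 in region 2, caching = 10.
        List<List<String>> regions = Arrays.asList(rows("r1-", 9), rows("r2-", 10));
        System.out.println(fetchOneBatch(regions, 10).size()); // prints 10
    }
}
```

In this model the batch holds nine rows from the first region and one from the second, which matches the "10, not 19" answer for a single fetch.
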
@Anoop: Nice, that's why I also try to stick to simple Scans and keep the
pagination logic in the application. :)
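
A minimal sketch of what app-level pagination with a "previous page" could look like, written against a plain `TreeMap` as a stand-in for a table ordered by row key. Nothing here is HBase API: the class `AppLevelPagerSketch` and its methods are hypothetical, and the approach (read page size + 1 keys, remember the start key of every page already shown) is just one common way to do it, not something prescribed in this thread. With the real client the same idea would presumably map to a plain Scan with a start row and caching set close to the page size.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Hypothetical pager over a sorted map that stands in for a row-key-ordered table. */
final class AppLevelPagerSketch {
    private final NavigableMap<String, String> table;
    private final int pageSize;
    private final Deque<String> previousStarts = new ArrayDeque<>(); // start keys of earlier pages
    private String currentStart; // start key of the page currently shown
    private String nextStart;    // start key of the following page, null at the end

    AppLevelPagerSketch(NavigableMap<String, String> table, int pageSize) {
        this.table = table;
        this.pageSize = pageSize;
    }

    /** Reads up to pageSize + 1 keys; the extra key marks where the next page begins. */
    private List<String> load(String startKey) {
        List<String> page = new ArrayList<>();
        nextStart = null;
        for (String key : table.tailMap(startKey, true).keySet()) {
            if (page.size() == pageSize) {
                nextStart = key;
                break;
            }
            page.add(key);
        }
        currentStart = startKey;
        return page;
    }

    List<String> first() {
        previousStarts.clear();
        nextStart = null;
        if (table.isEmpty()) {
            return new ArrayList<>();
        }
        return load(table.firstKey());
    }

    List<String> next() {
        if (nextStart == null) {
            return new ArrayList<>(); // already on the last page
        }
        previousStarts.push(currentStart);
        return load(nextStart);
    }

    List<String> previous() {
        if (previousStarts.isEmpty()) {
            return new ArrayList<>(); // already on the first page
        }
        return load(previousStarts.pop());
    }

    public static void main(String[] args) {
        NavigableMap<String, String> table = new TreeMap<>();
        for (int i = 1; i <= 7; i++) {
            table.put(String.format("row%02d", i), "v" + i);
        }
        AppLevelPagerSketch pager = new AppLevelPagerSketch(table, 3);
        System.out.println(pager.first());    // prints [row01, row02, row03]
        System.out.println(pager.next());     // prints [row04, row05, row06]
        System.out.println(pager.previous()); // prints [row01, row02, row03]
    }
}
```

Keeping the stack of start keys on the application side means going back never requires a reverse scan; the cost is that the app has to hold one key per page already visited.
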

>
> -Anoop-
>
> On Sat, Feb 2, 2013 at 1:32 PM, anil gupta <[EMAIL PROTECTED]> wrote:
>
> > Hi Anoop,
> >
> > Please find my reply inline.
> >
> > Thanks,
> > Anil
> >
> > On Wed, Jan 30, 2013 at 3:31 AM, Anoop Sam John <[EMAIL PROTECTED]>
> > wrote:
> >
> > > @Anil
> > >
> > > > I could not understand why it goes to multiple regionservers in
> > > > parallel. Why can it not guarantee results <= page size (my guess: due
> > > > to multiple RS scans)? If you have used it then maybe you can explain
> > > > the behaviour?
> > >
> > > A scan from the client side never goes to multiple RS in parallel. A scan
> > > through the HTable API is sequential, one region after the other. For
> > > every region it opens a scanner in the RS and makes next() calls. The
> > > filter is instantiated at the server side, per region ...
> > >
> > > Say you need 100 rows in the page, you create a Scan at the client side
> > > with the filter, and there are 2 regions. First the scanner is opened
> > > for region1 and the scan happens there. It will ensure that at most 100
> > > rows are retrieved from that region. But when the scan crosses the
> > > region boundary and the client automatically opens a scanner for
> > > region2, it passes the filter with max 100 rows there as well, so up to
> > > 100 more rows can come from region2. So overall, at the client side we
> > > cannot guarantee that the scan we created will retrieve only 100 rows
> > > as a whole from the table.
> > >
> >
> > I agree with other people on this email chain that the 2nd region should
> > only return (100 - no. of rows returned by Region1), if possible.
> >
> > When the scan crosses the region boundary and the client automatically
> > opens a scanner for region2, why doesn't the scanner in Region2 know that
> > some of the rows were already fetched from region1? Do you mean to say
> > that by default, for a scan spanning multiple regions, every region has
> > its own count of the number of rows that it is going to return? i.e. let's
> > say for a scan setCaching is 10 and the scan is done across two regions.
> > 9 Results (satisfying the filter) are in Region1 and 10 Results
> > (satisfying the filter) are in Region2. Then will this scan return 19
> > (9+10) results?
> >
> > >
> > > I think I am making it clear. I have not used PageFilter at all. I am
> > > just explaining as per my knowledge of the scan flow and general filter
> > > usage.
> > >
> > > "This is because the filter is applied separately on different region
> > > servers. It does however optimize the scan of individual HRegions by
> > > making sure that the page size is never exceeded locally. "
> > >
> > > I guess it needs to say "This is because the filter is applied
> > > separately on different regions".
> > >
> > > -Anoop-
> > >
> > > ________________________________________
> > > From: anil gupta [[EMAIL PROTECTED]]
> > > Sent: Wednesday, January 30, 2013 1:33 PM
> > > To: [EMAIL PROTECTED]
> > > Subject: Re: Pagination with HBase - getting previous page of data
> > >
> > > Hi Mohammad,
> > >
> > > You are most welcome to join the discussion. I have never used
> > > PageFilter so I don't really have concrete input.
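
The per-region behaviour Anoop describes in the quoted text above (a fresh page-limiting filter for every region, regions scanned one after the other) can be sketched with a small simulation. This is plain Java with hypothetical names and no HBase classes; it only illustrates why the total can exceed the page size and how a client-side cut-off restores the limit.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model only: no HBase classes are used and all names are hypothetical. */
final class PerRegionPageLimitSketch {

    /** Scans regions sequentially; every region starts with its own fresh counter. */
    static List<String> scanWithPerRegionLimit(List<List<String>> regions, int pageSize) {
        List<String> out = new ArrayList<>();
        for (List<String> region : regions) {
            int seenInRegion = 0; // the counter is not carried across the region boundary
            for (String row : region) {
                if (seenInRegion == pageSize) {
                    break;
                }
                out.add(row);
                seenInRegion++;
            }
        }
        return out;
    }

    /** Client-side guard: keep only the first pageSize rows of whatever came back. */
    static List<String> firstPage(List<String> rows, int pageSize) {
        return new ArrayList<>(rows.subList(0, Math.min(pageSize, rows.size())));
    }

    /** Builds n fake row keys such as "r1-0", "r1-1", ... */
    static List<String> rows(String prefix, int n) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            out.add(prefix + i);
        }
        return out;
    }

    public static void main(String[] args) {
        // Two regions with 150 and 120 matching rows, page size 100.
        List<List<String>> regions = Arrays.asList(rows("r1-", 150), rows("r2-", 120));
        List<String> all = scanWithPerRegionLimit(regions, 100);
        System.out.println(all.size());                 // prints 200
        System.out.println(firstPage(all, 100).size()); // prints 100
    }
}
```

In practice the client would stop iterating once it has a full page rather than fetch everything and trim, but the trimming step keeps the sketch short.
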

Thanks & Regards,
Anil Gupta
Toby Lazar 2013-02-03, 17:25
anil gupta 2013-02-03, 17:39