HBase >> mail # user >> Pagination with HBase - getting previous page of data


Vijay Ganesan 2013-01-25, 04:58
Mohammad Tariq 2013-01-25, 05:12
Jean-Marc Spaggiari 2013-01-25, 12:38
anil gupta 2013-01-25, 17:07
Jean-Marc Spaggiari 2013-01-25, 17:17
anil gupta 2013-01-25, 17:43
Jean-Marc Spaggiari 2013-01-26, 02:58
anil gupta 2013-01-28, 03:31
Jean-Marc Spaggiari 2013-01-29, 21:08
anil gupta 2013-01-29, 21:16
Jean-Marc Spaggiari 2013-01-29, 21:40
anil gupta 2013-01-30, 07:49
Mohammad Tariq 2013-01-30, 03:32
anil gupta 2013-01-30, 08:03
Anoop Sam John 2013-01-30, 11:31
Jean-Marc Spaggiari 2013-01-30, 12:18
Toby Lazar 2013-01-30, 12:42
Asaf Mesika 2013-02-03, 14:07
Anoop Sam John 2013-01-31, 03:23
Re: Pagination with HBase - getting previous page of data
Hi Anoop,

Please find my reply inline.

Thanks,
Anil

On Wed, Jan 30, 2013 at 3:31 AM, Anoop Sam John <[EMAIL PROTECTED]> wrote:

> @Anil
>
> >I could not understand why it goes to multiple regionservers in
> parallel. Why can it not guarantee results <= page size (my guess: due to
> multiple RS scans)? If you have used it then maybe you can explain the
> behaviour?
>
> Scan from the client side never goes to multiple RS in parallel. A scan
> through the HTable API is sequential, one region after the other. For every
> region it will open a scanner on the RS and do next() calls. The filter
> is instantiated at the server side per region ...
>
> When you need 100 rows in a page and you create a Scan at the client side
> with the filter, and suppose there are 2 regions: first the scanner is
> opened for region1 and the scan happens. It ensures that at most 100 rows
> are retrieved from that region. But when the region boundary is crossed and
> the client automatically opens a scanner for region2, the filter is passed
> there too with a max of 100 rows, so from there as well up to 100 rows can
> come. So overall, at the client side, we cannot guarantee that the scan
> will return only 100 rows as a whole from the table.
>

I agree with other people on this email chain that the 2nd region should
only return (100 - no. of rows returned by Region1), if possible.

When the region boundary is crossed and the client automatically opens a
scanner for region2, why doesn't the scanner in Region2 know that some of
the rows were already fetched by Region1? Do you mean to say that, by
default, for a scan spanning multiple regions, every region has its own
count of the number of rows it is going to return? I.e., let's say a scan's
PageFilter page size is 10 and the scan runs across two regions, with 9
results (satisfying the filter) in Region1 and 10 results (satisfying the
filter) in Region2. Will this scan then return 19 (9+10) results?
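To make the 9 + 10 case concrete, here is a toy simulation (plain Python, not HBase client code) of a sequential scan in which each region applies the page limit independently, the way a per-region PageFilter would:

```python
# Toy model: each "region" is just a sorted list of matching row keys.
def scan_with_page_filter(regions, page_size):
    """Sequential scan over regions; each region enforces page_size
    locally, mirroring a server-side filter instantiated per region."""
    results = []
    for region_rows in regions:
        # The limit caps what THIS region returns, not the whole scan.
        results.extend(region_rows[:page_size])
    return results

region1 = ["row%02d" % i for i in range(9)]        # 9 matching rows
region2 = ["row%02d" % i for i in range(9, 19)]    # 10 matching rows

print(len(scan_with_page_filter([region1, region2], page_size=10)))  # 19
```

So under this model the answer to the question above is yes: 19 results come back, and the client is expected to stop reading once it has the page it needs.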

>
> I think I am making it clear. I have not used PageFilter at all; I am just
> explaining as per my knowledge of the scan flow and general filter usage.
>
> "This is because the filter is applied separately on different region
> servers. It does however optimize the scan of individual HRegions by making
> sure that the page size is never exceeded locally. "
>
> I guess it needs to say: "This is because the filter is applied
> separately on different regions".
>
> -Anoop-
>
> ________________________________________
> From: anil gupta [[EMAIL PROTECTED]]
> Sent: Wednesday, January 30, 2013 1:33 PM
> To: [EMAIL PROTECTED]
> Subject: Re: Pagination with HBase - getting previous page of data
>
> Hi Mohammad,
>
> You are most welcome to join the discussion. I have never used PageFilter
> so I don't really have concrete input.
> I had a look at
>
> http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/PageFilter.html
> I could not understand why it goes to multiple regionservers in
> parallel. Why can it not guarantee results <= page size (my guess: due to
> multiple RS scans)? If you have used it then maybe you can explain the
> behaviour?
>
> Thanks,
> Anil
>
> On Tue, Jan 29, 2013 at 7:32 PM, Mohammad Tariq <[EMAIL PROTECTED]>
> wrote:
>
> > I'm kinda hesitant to put my leg in between the pros ;) But does it
> > sound sane to use PageFilter for both rows and columns, with some
> > additional logic to handle the 'nth' page? It would help us with both
> > kinds of paging.
> >
> > On Wednesday, January 30, 2013, Jean-Marc Spaggiari <
> > [EMAIL PROTECTED]>
> > wrote:
> > > Hi Anil,
> > >
> > > I think it really depends on the way you want to use the pagination.
> > >
> > > Do you need to be able to jump to page X? Are you OK if you miss a
> > > line or two? Is your data growing quickly or slowly? Is it OK if your
> > > page indexes are a day old? Do you need to paginate over 300 columns,
> > > or just 1? Do you need to always have the exact same number of entries
> > > in each page?
>
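The questions above all point at the usual answer to per-region filter behavior: enforce the page size on the client, and remember the last row key of each page as a cursor for the next one. A minimal Python sketch (illustrative only; in real HBase the cursor would become the next Scan's exclusive start row rather than an in-memory filter):

```python
def fetch_page(sorted_keys, page_size, start_after=None):
    """Return one page plus a cursor; sorted_keys stands in for a scan."""
    page = []
    for key in sorted_keys:
        if start_after is not None and key <= start_after:
            continue  # skip everything up to and including the cursor
        page.append(key)
        if len(page) == page_size:
            break  # client-side stop: never hand back more than one page
    return page, (page[-1] if page else None)

keys = ["row%02d" % i for i in range(25)]
page1, cur = fetch_page(keys, 10)        # row00..row09
page2, cur = fetch_page(keys, 10, cur)   # row10..row19
```

This sidesteps the multi-region problem entirely: even if the server hands back more than one page's worth of rows, the client only ever surfaces `page_size` of them.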
Thanks & Regards,
Anil Gupta
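Since the thread's subject is getting the *previous* page, here is one hedged sketch of a common design (my illustration, not something proposed verbatim above): keep a stack of page-end cursors, so "previous" is two pops and a re-fetch. In real HBase each `_page_after` call would be a fresh Scan starting just past the cursor.

```python
class Pager:
    """Stack of end-of-page cursors; self.keys stands in for a scan."""
    def __init__(self, keys, page_size):
        self.keys = keys
        self.page_size = page_size
        self.cursors = [None]  # last key of each visited page; None = start

    def _page_after(self, cursor):
        rows = [k for k in self.keys if cursor is None or k > cursor]
        return rows[:self.page_size]

    def next_page(self):
        page = self._page_after(self.cursors[-1])
        if page:
            self.cursors.append(page[-1])  # remember where this page ended
        return page

    def prev_page(self):
        # Drop the current page's cursor and the previous page's cursor,
        # then re-fetch; from page 1, just re-show page 1.
        if len(self.cursors) > 2:
            self.cursors.pop()
            self.cursors.pop()
        else:
            self.cursors = [None]
        return self.next_page()

keys = ["row%02d" % i for i in range(25)]
pager = Pager(keys, page_size=10)
first = pager.next_page()    # row00..row09
second = pager.next_page()   # row10..row19
back = pager.prev_page()     # row00..row09 again
```

The trade-off, as noted earlier in the thread, is that cursors describe the data as it was when each page was fetched: rows inserted or deleted behind a cursor will shift subsequent pages.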
Anoop John 2013-02-03, 16:07
anil gupta 2013-02-03, 17:21
Toby Lazar 2013-02-03, 17:25
anil gupta 2013-02-03, 17:39