HBase, mail # user - Pagination with HBase - getting previous page of data


Vijay Ganesan 2013-01-25, 04:58
Mohammad Tariq 2013-01-25, 05:12
Jean-Marc Spaggiari 2013-01-25, 12:38
anil gupta 2013-01-25, 17:07
Jean-Marc Spaggiari 2013-01-25, 17:17
anil gupta 2013-01-25, 17:43
Jean-Marc Spaggiari 2013-01-26, 02:58
anil gupta 2013-01-28, 03:31
Jean-Marc Spaggiari 2013-01-29, 21:08
anil gupta 2013-01-29, 21:16
Jean-Marc Spaggiari 2013-01-29, 21:40
anil gupta 2013-01-30, 07:49
Mohammad Tariq 2013-01-30, 03:32
anil gupta 2013-01-30, 08:03
Anoop Sam John 2013-01-30, 11:31
Re: Pagination with HBase - getting previous page of data
Jean-Marc Spaggiari 2013-01-30, 12:18
Hi Anoop,

So does it mean the scanner can send back LIMIT*2-1 lines max? Reading
100 rows from the 2nd region uses extra time and resources. Why not
ask for only the number of missing lines?

JM

2013/1/30, Anoop Sam John <[EMAIL PROTECTED]>:
> @Anil
>
>>I could not understand why it goes to multiple regionservers in
> parallel. Why can it not guarantee results <= page size (my guess: due to
> multiple RS scans)? If you have used it, then maybe you can explain the
> behaviour?
>
> A scan from the client side never goes to multiple RSs in parallel. A scan
> through the HTable API is sequential, one region after the other. For every
> region it will open a scanner on the RS and make next() calls. The filter
> is instantiated at the server side, per region ...
>
> When you need 100 rows in the page and you create a Scan at the client side
> with the filter, and suppose there are 2 regions: first the scanner is
> opened for region1 and the scan happens there. It will ensure that at most
> 100 rows are retrieved from that region. But when the region boundary is
> crossed and the client automatically opens a scanner for region2, the same
> filter, with max 100 rows, is passed there too, so from there also up to
> 100 rows can come. So overall, at the client side, we cannot guarantee that
> the scan will return only 100 rows as a whole from the table.
>
> I hope I am making it clear. I have not used PageFilter at all; I am just
> explaining based on my knowledge of the scan flow and general filter usage.
>
> "This is because the filter is applied separately on different region
> servers. It does however optimize the scan of individual HRegions by making
> sure that the page size is never exceeded locally. "
>
> I guess it needs to say that "This is because the filter is applied
> separately on different regions".
>
> -Anoop-
>
> ________________________________________
> From: anil gupta [[EMAIL PROTECTED]]
> Sent: Wednesday, January 30, 2013 1:33 PM
> To: [EMAIL PROTECTED]
> Subject: Re: Pagination with HBase - getting previous page of data
>
> Hi Mohammad,
>
> You are most welcome to join the discussion. I have never used PageFilter,
> so I don't really have concrete input.
> I had a look at
> http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/PageFilter.html
> I could not understand why it goes to multiple regionservers in
> parallel. Why can it not guarantee results <= page size (my guess: due to
> multiple RS scans)? If you have used it, then maybe you can explain the
> behaviour?
>
> Thanks,
> Anil
>
> On Tue, Jan 29, 2013 at 7:32 PM, Mohammad Tariq <[EMAIL PROTECTED]> wrote:
>
>> I'm kinda hesitant to put my leg in between the pros ;) But does it sound
>> sane to use PageFilter for both rows and columns, with some additional
>> logic to handle the 'nth' page? It'll help us with both kinds of
>> paging.
>>
>> On Wednesday, January 30, 2013, Jean-Marc Spaggiari <
>> [EMAIL PROTECTED]>
>> wrote:
>> > Hi Anil,
>> >
>> > I think it really depends on the way you want to use the pagination.
>> >
>> > Do you need to be able to jump to page X? Are you ok if you miss a
>> > line or 2? Is your data growing quickly? Or slowly? Is it ok if your
>> > page indexes are a day old? Do you need to paginate over 300 columns?
>> > Or just 1? Do you need to always have the exact same number of entries
>> > in each page?
>> >
>> > For my usecase I need to be able to jump to page X and I don't
>> > have any content. I have hundreds of millions of lines. Only the rowkey
>> > matters for me, and I'm fine if sometimes I have 50 entries displayed
>> > and sometimes only 45. So I'm thinking about calculating which row is
>> > the first one for each page, and storing that separately. Then I just
>> > need to run the MR daily.
>> >
>> > It's not a perfect solution, I agree, but this might do the job for me.
>> > I'm totally open to any other idea which might do the job too.
>> >
>> > JM
>> >
>> > 2013/1/29, anil gupta <[EMAIL PROTECTED]>:
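[Editor's note] The behaviour Anoop describes above, and the "remember the first rowkey of each page" idea from earlier in the thread, can be sketched in plain JDK code with no HBase dependency. This is a minimal model, not the HBase API: the class `PaginationSketch` and methods `scanRegion`, `scanPage`, and `rows` are illustrative names, and the region/start-key model is an assumption made only to show why each region can return up to a full page and why "previous page" reduces to re-scanning from a remembered start key.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Plain-JDK model of the per-region PageFilter behaviour discussed above.
public class PaginationSketch {

    // Server side (simulated): each region gets its own PageFilter instance,
    // so every region can return up to pageSize rows. startRow is exclusive
    // here; "" means "from the beginning of the table".
    public static List<String> scanRegion(List<String> regionRows,
                                          String startRow, int pageSize) {
        List<String> out = new ArrayList<>();
        for (String row : regionRows) {
            if (row.compareTo(startRow) <= 0) continue; // skip up to the start key
            if (out.size() == pageSize) break;          // filter applied per region
            out.add(row);
        }
        return out;
    }

    // Client side: regions are scanned sequentially, never in parallel.
    // Without this cut-off the caller could receive up to pageSize rows
    // from *each* region, so the global limit must be enforced here.
    public static List<String> scanPage(List<List<String>> regions,
                                        String startRow, int pageSize) {
        List<String> page = new ArrayList<>();
        for (List<String> region : regions) {
            for (String row : scanRegion(region, startRow, pageSize)) {
                if (page.size() == pageSize) return page;
                page.add(row);
            }
        }
        return page;
    }

    // Build n fake, sorted row keys with a common prefix, e.g. a000..a149.
    public static List<String> rows(String prefix, int n) {
        List<String> r = new ArrayList<>();
        for (int i = 0; i < n; i++) r.add(prefix + String.format("%03d", i));
        return r;
    }

    public static void main(String[] args) {
        // Two fake regions of 150 rows each, page size 100.
        List<List<String>> regions = List.of(rows("a", 150), rows("b", 150));

        // Each region's filter lets through up to 100 rows, so the raw scan
        // can hand the client far more than one page.
        int raw = scanRegion(regions.get(0), "", 100).size()
                + scanRegion(regions.get(1), "", 100).size();
        System.out.println("rows the regions can return: " + raw);  // 200

        // Page forward, remembering the start key of every page shown.
        Deque<String> pageStarts = new ArrayDeque<>();
        String start = "";
        pageStarts.push(start);
        List<String> page1 = scanPage(regions, start, 100);  // a000..a099
        start = page1.get(page1.size() - 1);
        List<String> page2 = scanPage(regions, start, 100);  // crosses regions
        System.out.println("page2 spans: "
                + page2.get(0) + ".." + page2.get(page2.size() - 1));

        // "Previous page" is just a re-scan from the popped start key.
        List<String> previous = scanPage(regions, pageStarts.pop(), 100);
        System.out.println("previous page == page1: " + previous.equals(page1));
    }
}
```

With page size 100 and two 150-row regions, page 2 comes out as a100..b049: the second region contributes rows even though its own filter would have allowed 100, which is exactly why the client, not the filter, must enforce the page boundary. The start-key stack makes backward paging an O(1) bookkeeping step plus one forward scan.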
Toby Lazar 2013-01-30, 12:42
Asaf Mesika 2013-02-03, 14:07
Anoop Sam John 2013-01-31, 03:23
anil gupta 2013-02-02, 08:02
Anoop John 2013-02-03, 16:07
anil gupta 2013-02-03, 17:21
Toby Lazar 2013-02-03, 17:25
anil gupta 2013-02-03, 17:39