HBase >> mail # user >> Pagination with HBase - getting previous page of data


anil gupta 2013-01-30, 07:49
Mohammad Tariq 2013-01-30, 03:32
Re: Pagination with HBase - getting previous page of data
Hi Mohammad,

You are most welcome to join the discussion. I have never used PageFilter,
so I don't really have concrete input.
I had a look at
http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/PageFilter.html
I could not understand why it goes to multiple regionservers in
parallel. Why can it not guarantee results <= page size (my guess: due to
multiple RS scans)? If you have used it, then maybe you can explain the
behaviour?

Thanks,
Anil
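
A note on the PageFilter question above: the linked Javadoc says the filter is applied separately on each region server, so the page limit is only enforced locally and the client can receive more rows than the page size in total. The sketch below imitates that with plain Java collections. It does not use the HBase API: the class `PageFilterSketch`, its methods, and the two-region sample data are all hypothetical stand-ins, and each `TreeMap` merely plays the role of one region.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Plain-Java stand-in for a table split across regions; not the HBase API. */
final class PageFilterSketch {

    /** Hypothetical sample data: two "regions" covering row01..row04 and row05..row08. */
    static List<NavigableMap<String, String>> sampleRegions() {
        NavigableMap<String, String> first = new TreeMap<>();
        NavigableMap<String, String> second = new TreeMap<>();
        for (int i = 1; i <= 4; i++) first.put(String.format("row%02d", i), "v" + i);
        for (int i = 5; i <= 8; i++) second.put(String.format("row%02d", i), "v" + i);
        List<NavigableMap<String, String>> regions = new ArrayList<>();
        regions.add(first);
        regions.add(second);
        return regions;
    }

    /** One region applies the page limit on its own, with no knowledge of other regions. */
    static List<String> scanRegion(NavigableMap<String, String> region, String startRow, int pageSize) {
        List<String> rows = new ArrayList<>();
        for (String key : region.tailMap(startRow, true).keySet()) {
            if (rows.size() >= pageSize) break; // local limit only
            rows.add(key);
        }
        return rows;
    }

    /** Without a client-side cap, the per-region results simply add up. */
    static List<String> scanAllRegions(List<NavigableMap<String, String>> regions, String startRow, int pageSize) {
        List<String> rows = new ArrayList<>();
        for (NavigableMap<String, String> region : regions) {
            rows.addAll(scanRegion(region, startRow, pageSize));
        }
        return rows;
    }

    /** The client stops as soon as it has collected one full page. */
    static List<String> scanPage(List<NavigableMap<String, String>> regions, String startRow, int pageSize) {
        List<String> rows = new ArrayList<>();
        for (NavigableMap<String, String> region : regions) {
            for (String key : scanRegion(region, startRow, pageSize)) {
                if (rows.size() == pageSize) return rows;
                rows.add(key);
            }
        }
        return rows;
    }
}
```

With a page size of 3 and a start row of `row03`, the first region contributes two rows and the second contributes three, so the uncapped scan yields five rows. The capped `scanPage` stops after three.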

On Tue, Jan 29, 2013 at 7:32 PM, Mohammad Tariq <[EMAIL PROTECTED]> wrote:

> I'm kinda hesitant to put my leg in between the pros ;) But does it sound
> sane to use PageFilter for both rows and columns, with some additional
> logic to handle the 'nth' page? It'll help us in both kinds of paging.
>
> On Wednesday, January 30, 2013, Jean-Marc Spaggiari <
> [EMAIL PROTECTED]>
> wrote:
> > Hi Anil,
> >
> > I think it really depends on the way you want to use the pagination.
> >
> > Do you need to be able to jump to page X? Are you ok if you miss a
> > line or 2? Is your data growing fast? Or slowly? Is it ok if your
> > page indexes are a day old? Do you need to paginate over 300 columns?
> > Or just 1? Do you need to always have the exact same number of entries
> > in each page?
> >
> > For my use case I need to be able to jump to page X and I don't
> > have any content. I have hundreds of millions of lines. Only the rowkey
> > matters for me and I'm fine if sometimes I have 50 entries displayed,
> > and sometimes only 45. So I'm thinking about calculating which row is
> > the first one for each page, and storing that separately. Then I just
> > need to run the MR daily.
> >
> > It's not a perfect solution, I agree, but this might do the job for me.
> > I'm totally open to any other ideas which might do the job too.
> >
> > JM
> >
> > 2013/1/29, anil gupta <[EMAIL PROTECTED]>:
> >> Yes, your suggested solution only works on RowKey based pagination. It
> >> will fail when you start filtering on the basis of columns.
> >>
> >> Still, I would say it's comparatively easier to maintain this at
> >> Application level rather than creating tables for pagination.
> >>
> >> What if you have 300 columns in your schema? Will you create 300 tables?
> >> What about handling of pagination when filtering is done based on
> >> multiple columns ("and" and "or" conditions)?
> >>
> >> On Tue, Jan 29, 2013 at 1:08 PM, Jean-Marc Spaggiari <
> >> [EMAIL PROTECTED]> wrote:
> >>
> >>> No, no killer solution here ;)
> >>>
> >>> But I'm still thinking about that because I might have to implement
> >>> some pagination options soon...
> >>>
> >>> As you are saying, it only works on the row-key, but if you want
> >>> to do the same thing on non-rowkey columns, you might have to create a
> >>> secondary index table...
> >>>
> >>> JM
> >>>
> >>> 2013/1/27, anil gupta <[EMAIL PROTECTED]>:
> >>> > That's alright. I thought that you had come up with a killer
> >>> > solution, so I got curious to hear your ideas. ;)
> >>> > It seems like your solution mentioned below will not work when
> >>> > filtering on non row-key columns, since when you are deciding the
> >>> > page numbers you are only considering the rowkey.
> >>> >
> >>> > Thanks,
> >>> > Anil
> >>> >
> >>> > On Fri, Jan 25, 2013 at 6:58 PM, Jean-Marc Spaggiari <
> >>> > [EMAIL PROTECTED]> wrote:
> >>> >
> >>> >> Hi Anil,
> >>> >>
> >>> >> I don't have a solution. I never thought about that ;) But I was
> >>> >> thinking about something like this: you create a 2nd table where you
> >>> >> place the row number (4 bytes) then the row key. To go directly to a
> >>> >> specific page, you query by the number, find the key, and you know
> >>> >> where to start your scan in the main table.
> >>> >>
> >>> >> The issue is probably the number for each line, since with a MR you
> >>> >> don't know where you are from the beginning. But you can build
> >>> >> something where you store the line number from the beginning of the
>
Thanks & Regards,
Anil Gupta
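
For readers following the quoted thread, JM's idea of storing the first row key of each page separately and then jumping to page X can be sketched roughly as below. This is only an illustration under assumptions, not anyone's actual implementation: it uses sorted in-memory maps instead of HBase tables, a single loop stands in for the daily MR job, and the names `PageIndexSketch`, `buildPageIndex`, `readPage`, and `sampleTable` are made up for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Plain-Java stand-in for a main table plus a separate page-index table; not the HBase API. */
final class PageIndexSketch {

    /** Hypothetical sample table with row keys row00, row01, ... in sorted order. */
    static NavigableMap<String, String> sampleTable(int rowCount) {
        NavigableMap<String, String> table = new TreeMap<>();
        for (int i = 0; i < rowCount; i++) table.put(String.format("row%02d", i), "v" + i);
        return table;
    }

    /** Walks the sorted row keys once and records the first key of every page. */
    static NavigableMap<Integer, String> buildPageIndex(NavigableMap<String, String> table, int pageSize) {
        NavigableMap<Integer, String> index = new TreeMap<>();
        int rowNumber = 0;
        for (String rowKey : table.keySet()) {
            if (rowNumber % pageSize == 0) index.put(rowNumber / pageSize, rowKey);
            rowNumber++;
        }
        return index;
    }

    /** Jumps to a page: look up its start key, then scan up to the next page's start key. */
    static List<String> readPage(NavigableMap<String, String> table,
                                 NavigableMap<Integer, String> index,
                                 int page, int pageSize) {
        List<String> rows = new ArrayList<>();
        String startKey = index.get(page);
        if (startKey == null) return rows; // page not present in the index
        String nextStart = index.get(page + 1);
        NavigableMap<String, String> range = (nextStart == null)
                ? table.tailMap(startKey, true)
                : table.subMap(startKey, true, nextStart, false);
        for (String rowKey : range.keySet()) {
            if (rows.size() == pageSize) break; // rows added since the index was built are skipped
            rows.add(rowKey);
        }
        return rows;
    }
}
```

With this layout the previous page is just `readPage(table, index, page - 1, pageSize)`, so no scan in reverse is needed. The trade-off JM mentions shows up directly: if rows are deleted after the index is built, a page comes back short, and if rows are inserted, some are missed until the index is rebuilt.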