HBase, mail # user - Pagination with HBase - getting previous page of data


Re: Pagination with HBase - getting previous page of data
anil gupta 2013-01-30, 07:49
Hi Jean,

Please find my reply inline.

On Tue, Jan 29, 2013 at 1:40 PM, Jean-Marc Spaggiari <
[EMAIL PROTECTED]> wrote:

> Hi Anil,
>
> I think it really depends on how you want to use the pagination.
>
Absolutely true!

>
> Do you need to be able to jump to page X? Are you OK if you miss a
> line or 2? Is your data growing quickly? Or slowly? Is it OK if your
> page indexes are a day old? Do you need to paginate over 300 columns?
> Or just 1? Do you need to always have the exact same number of entries
> in each page?
>
No, I don't need to be able to jump to page X.
I don't think that missing lines will be acceptable. I need to filter the
rows on non-rowkey attributes. It won't be OK if my page indexes are a day
old. I need to paginate on the basis of various filters based on columns
and/or the rowkey, so the number of combinations is quite large.
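
To make these requirements concrete, here is a minimal sketch of one way to
get a "previous page" at the application level when rows are filtered on
columns: remember the start key of every page already shown and re-scan from
the remembered key. It is an illustration under assumptions, not an HBase
API. The table is modelled as an in-memory list sorted by rowkey, and the
names FilteredPager, next_page and prev_page are hypothetical.

```python
from bisect import bisect_left


class FilteredPager:
    """Page through key-sorted rows that match a predicate, forwards and backwards."""

    def __init__(self, rows, predicate, page_size):
        # rows: list of (rowkey, columns) sorted by rowkey; a stand-in for a table scan.
        self.rows = rows
        self.keys = [key for key, _ in rows]
        self.predicate = predicate
        self.page_size = page_size
        self.starts = []   # start key of every page shown so far, oldest first
        self.resume = ""   # where the next page begins; "" sorts before every key

    def _scan(self, start_key):
        """Collect up to page_size matching keys, scanning forward from start_key."""
        i = bisect_left(self.keys, start_key)
        page = []
        while i < len(self.rows) and len(page) < self.page_size:
            key, columns = self.rows[i]
            if self.predicate(key, columns):
                page.append(key)
            i += 1
        resume = self.keys[i] if i < len(self.rows) else None
        return page, resume

    def next_page(self):
        if self.resume is None:          # the previous scan already hit the end
            return []
        start = self.resume
        page, self.resume = self._scan(start)
        if page:
            self.starts.append(start)
        return page

    def prev_page(self):
        if len(self.starts) < 2:         # already on the first page, or nothing shown
            return []
        self.starts.pop()                # forget the current page
        page, self.resume = self._scan(self.starts[-1])
        return page


# Hypothetical data: ten rows, every third one "closed".
rows = [("r%02d" % i, {"status": "closed" if i % 3 == 0 else "open"})
        for i in range(1, 11)]
pager = FilteredPager(rows, lambda key, cols: cols["status"] == "open", 3)
first = pager.next_page()    # ['r01', 'r02', 'r04']
second = pager.next_page()   # ['r05', 'r07', 'r08']
back = pager.prev_page()     # ['r01', 'r02', 'r04']
```

Because each page is re-scanned from its start key, rows added or removed in
the meantime show up on the next visit, and no extra table is needed per
filter combination. The cost is that jumping straight to page X is not
possible.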

>
> For my use case I need to be able to jump to page X and I don't
> have any content. I have hundreds of millions of lines. Only the rowkey
> matters for me and I'm fine if sometimes I have 50 entries displayed,
> and sometimes only 45. So I'm thinking about calculating which row is
> the first one for each page, and storing that separately. Then I just
> need to run the MR job daily.
>
Hmm, yeah, it might work for you.
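
The idea quoted above (compute the first row of each page in a batch job,
store it separately, then jump to page X) could be sketched roughly as
follows. This is only an illustration: the sorted list stands in for the
main table, the dict stands in for the separate index, and build_page_index
and read_page are hypothetical names. It also assumes a page runs from its
indexed start key up to the next page's start key, which is one way to end
up with pages of varying size between index rebuilds.

```python
from bisect import bisect_left


def build_page_index(sorted_keys, page_size):
    """Map page number -> first rowkey of that page (what the batch job would store)."""
    return {page: sorted_keys[i]
            for page, i in enumerate(range(0, len(sorted_keys), page_size))}


def read_page(sorted_keys, page_index, page):
    """Return the keys from this page's start key up to the next page's start key."""
    start = page_index[page]
    stop = page_index.get(page + 1)          # None for the last page
    lo = bisect_left(sorted_keys, start)
    hi = len(sorted_keys) if stop is None else bisect_left(sorted_keys, stop)
    return sorted_keys[lo:hi]


keys = ["row%03d" % i for i in range(10)]
index = build_page_index(keys, 4)            # pages start at row000, row004, row008
page_one = read_page(keys, index, 1)         # four rows, row004 to row007

# A row deleted after the index was built makes the page shorter until the
# next rebuild; the stale index still finds the right neighbourhood.
keys.remove("row005")
stale_page_one = read_page(keys, index, 1)   # three rows now
```

The index only knows about rowkeys, so this does not help once pages depend
on column filters.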

>
> It's not a perfect solution, I agree, but this might do the job for me.
> I'm totally open to any other ideas which might do the job too.
>
There is no such thing as a "perfect" solution. If the implementation is able
to fulfill your business needs, then go for it.

>
> JM
>
> 2013/1/29, anil gupta <[EMAIL PROTECTED]>:
> > Yes, your suggested solution only works for rowkey-based pagination. It
> > will fail when you start filtering on the basis of columns.
> >
> > Still, I would say it's comparatively easier to maintain this at the
> > application level rather than creating tables for pagination.
> >
> > What if you have 300 columns in your schema? Will you create 300 tables?
> > What about handling pagination when filtering is done on multiple
> > columns ("and" and "or" conditions)?
> >
> > On Tue, Jan 29, 2013 at 1:08 PM, Jean-Marc Spaggiari <
> > [EMAIL PROTECTED]> wrote:
> >
> >> No, no killer solution here ;)
> >>
> >> But I'm still thinking about that because I might have to implement
> >> some pagination options soon...
> >>
> >> As you are saying, it only works on the row key, but if you want
> >> to do the same thing on non-rowkey columns, you might have to create a
> >> secondary index table...
> >>
> >> JM
> >>
> >> 2013/1/27, anil gupta <[EMAIL PROTECTED]>:
> >> > That's alright. I thought that you had come up with a killer solution,
> >> > so I got curious to hear your ideas. ;)
> >> > It seems like the solution you mention below will not work when filtering
> >> > on non-rowkey columns, since when you are deciding the page numbers you
> >> > are only considering the rowkey.
> >> >
> >> > Thanks,
> >> > Anil
> >> >
> >> > On Fri, Jan 25, 2013 at 6:58 PM, Jean-Marc Spaggiari <
> >> > [EMAIL PROTECTED]> wrote:
> >> >
> >> >> Hi Anil,
> >> >>
> >> >> I don't have a solution. I never thought about that ;) But I was
> >> >> thinking about something like this: you create a 2nd table where you place
> >> >> the row number (4 bytes), then the row key. To go directly to a
> >> >> specific page, you query by the number, find the key, and you know
> >> >> where to start your scan in the main table.
> >> >>
> >> >> The issue is probably the numbering of each line, since with an MR job you
> >> >> don't know where you are relative to the beginning. But you can build
> >> >> something where you store the line number from the beginning of the
> >> >> region, then when all regions are parsed you can reconstruct the total
> >> >> numbering... That should work...
> >> >>
> >> >> JM
> >> >>
> >> >> 2013/1/25, anil gupta <[EMAIL PROTECTED]>:
> >> >> > Inline...
> >> >> >
> >> >> > On Fri, Jan 25, 2013 at 9:17 AM, Jean-Marc Spaggiari <
> >> >> > [EMAIL PROTECTED]> wrote:
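
For the numbering idea quoted earlier in this message (count lines from the
beginning of each region, then reconstruct the total numbering once all
regions are parsed), a minimal sketch of the arithmetic might look like
this. The function names and the per-region counts are hypothetical, and
nothing here calls HBase or MapReduce; it only shows the running-total step
and a 4-byte encoding for the number.

```python
import struct


def region_offsets(region_counts):
    """Given per-region line counts in key order, return each region's global start number."""
    offsets, total = [], 0
    for count in region_counts:
        offsets.append(total)
        total += count
    return offsets


def global_line_number(offsets, region, local_number):
    """Turn a line number counted from the start of a region into a global one."""
    return offsets[region] + local_number


def encode_line_number(n):
    """Pack the number into 4 big-endian bytes, so byte order matches numeric order."""
    return struct.pack(">I", n)


offsets = region_offsets([3, 5, 2])          # [0, 3, 8]
line = global_line_number(offsets, 1, 2)     # third line of the second region -> 5
key = encode_line_number(line)               # b'\x00\x00\x00\x05'
```

Big-endian packing is a deliberate choice here: keys compared byte by byte
then sort in the same order as the numbers they encode.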

Thanks & Regards,
Anil Gupta