HBase user mailing list: Pagination with HBase - getting previous page of data


Vijay Ganesan 2013-01-25, 04:58
Mohammad Tariq 2013-01-25, 05:12
Jean-Marc Spaggiari 2013-01-25, 12:38
anil gupta 2013-01-25, 17:07
Jean-Marc Spaggiari 2013-01-25, 17:17
anil gupta 2013-01-25, 17:43
Jean-Marc Spaggiari 2013-01-26, 02:58
anil gupta 2013-01-28, 03:31
Jean-Marc Spaggiari 2013-01-29, 21:08
anil gupta 2013-01-29, 21:16

Re: Pagination with HBase - getting previous page of data
Hi Anil,

I think it really depends on how you want to use the pagination.

Do you need to be able to jump to page X? Are you OK if you miss a
line or two? Is your data growing quickly, or slowly? Is it OK if your
page indexes are a day old? Do you need to paginate over 300 columns,
or just 1? Do you always need exactly the same number of entries
in each page?

For my use case I need to be able to jump to page X, and I don't
have any content. I have hundreds of millions of rows. Only the row key
matters to me, and I'm fine if sometimes I have 50 entries displayed
and sometimes only 45. So I'm thinking about calculating which row is
the first one of each page and storing that separately. Then I just
need to run the MR job daily.
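A minimal in-memory sketch of the page-start index JM describes. It assumes the table is modelled as a sorted Python list of row keys. The helpers `build_page_index` and `read_page` are hypothetical names, not HBase API. In HBase the index would be written by the daily MR job. A read would be a scan from one page's start row up to the next page's start row, which is why a page can come back with 45 rows instead of 50 after deletes.

```python
from bisect import bisect_left
from typing import List, Optional


def build_page_index(sorted_row_keys: List[bytes], page_size: int) -> List[bytes]:
    """Hypothetical stand-in for the daily job: keep every page_size-th
    row key as the first key of a page (page 0, page 1, ...)."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    return sorted_row_keys[::page_size]


def read_page(table: List[bytes], page_index: List[bytes], page_no: int) -> List[bytes]:
    """Return rows from this page's start key (inclusive) up to the next
    page's start key (exclusive), like a scan with a start and stop row.
    Rows inserted or deleted since the index was built change the page size."""
    start = page_index[page_no]
    stop: Optional[bytes] = page_index[page_no + 1] if page_no + 1 < len(page_index) else None
    lo = bisect_left(table, start)
    hi = bisect_left(table, stop) if stop is not None else len(table)
    return table[lo:hi]


if __name__ == "__main__":
    table = sorted(b"row%03d" % i for i in range(12))
    index = build_page_index(table, 5)  # what the nightly job would store
    table.remove(b"row006")             # a delete after the index was built
    print(read_page(table, index, 1))   # page 1 now has 4 rows, not 5
```

Bounding each page by the next stored start key keeps any jump to page X a single bounded scan. The cost is that page sizes drift until the next rebuild.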

It's not a perfect solution, I agree, but it might do the job for me.
I'm totally open to any other ideas that might do the job too.

JM
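anil argues in the quoted reply below that pagination state is easier to keep at the application level than in extra tables, and that it must also work with column filters. Here is a minimal sketch of that idea for this thread's subject, getting the previous page. It assumes the table is an in-memory list of (row key, columns) pairs sorted by key. `Paginator`, `next_page` and `previous_page` are hypothetical names, not HBase API. The application remembers the start key of every page it has shown, so "previous" is just a re-scan from a remembered key. The filter is applied during the scan, roughly as a Scan with a start row and a filter would.

```python
from bisect import bisect_left
from typing import Callable, Dict, List, Optional, Tuple

# A row is (row key, {column: value}); the list stands in for an HBase table.
Row = Tuple[bytes, Dict[str, str]]


class Paginator:
    """Hypothetical application-side pager (not an HBase API).

    Remembers the start key of each visited page, so going back is a
    re-scan from a stored key. The filter runs during the scan, so any
    column condition (including and/or combinations) is supported.
    """

    def __init__(self, table: List[Row], page_size: int,
                 row_filter: Callable[[Row], bool] = lambda row: True) -> None:
        if page_size <= 0:
            raise ValueError("page_size must be positive")
        self.table = table                  # assumed sorted by row key
        self.keys = [key for key, _ in table]
        self.page_size = page_size
        self.row_filter = row_filter
        self.starts: List[bytes] = [b""]    # start key of each visited page
        self.rows: List[Row] = []
        self.next_start: Optional[bytes] = None
        self._load()

    def _load(self) -> None:
        # Scan from the current start key and keep one extra matching row:
        # its key is where the next page begins.
        matched: List[Row] = []
        i = bisect_left(self.keys, self.starts[-1])
        while i < len(self.table) and len(matched) <= self.page_size:
            if self.row_filter(self.table[i]):
                matched.append(self.table[i])
            i += 1
        self.rows = matched[:self.page_size]
        self.next_start = matched[self.page_size][0] if len(matched) > self.page_size else None

    def next_page(self) -> List[Row]:
        if self.next_start is None:
            raise IndexError("already on the last page")
        self.starts.append(self.next_start)
        self._load()
        return self.rows

    def previous_page(self) -> List[Row]:
        if len(self.starts) == 1:
            raise IndexError("already on the first page")
        self.starts.pop()
        self._load()
        return self.rows
```

This approach only keeps start keys for pages already visited, so it cannot jump straight to page X the way a precomputed page index can. That is the trade-off debated in the thread.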

2013/1/29, anil gupta <[EMAIL PROTECTED]>:
> Yes, your suggested solution only works for row-key-based pagination. It
> will fail when you start filtering on columns.
>
> Still, I would say it's comparatively easier to maintain this at the
> application level rather than creating tables for pagination.
>
> What if you have 300 columns in your schema? Will you create 300 tables?
> What about handling pagination when filtering is done on multiple
> columns ("and" and "or" conditions)?
>
> On Tue, Jan 29, 2013 at 1:08 PM, Jean-Marc Spaggiari <
> [EMAIL PROTECTED]> wrote:
>
>> No, no killer solution here ;)
>>
>> But I'm still thinking about it because I might have to implement
>> some pagination options soon...
>>
>> As you say, it only works on the row key, but if you want
>> to do the same thing on non-row-key columns, you might have to create a
>> secondary index table...
>>
>> JM
>>
>> 2013/1/27, anil gupta <[EMAIL PROTECTED]>:
>> > That's alright. I thought you had come up with a killer solution,
>> > so I got curious to hear your ideas. ;)
>> > It seems like the solution you mention below will not work when
>> > filtering on non-row-key columns, since you only consider the row key
>> > when deciding the page numbers.
>> >
>> > Thanks,
>> > Anil
>> >
>> > On Fri, Jan 25, 2013 at 6:58 PM, Jean-Marc Spaggiari <
>> > [EMAIL PROTECTED]> wrote:
>> >
>> >> Hi Anil,
>> >>
>> >> I don't have a solution. I never thought about that ;) But I was
>> >> thinking about something like this: you create a second table where
>> >> you put the row number (4 bytes) and then the row key. To go directly
>> >> to a specific page, you query by the number, find the key, and you
>> >> know where to start your scan in the main table.
>> >>
>> >> The issue is probably numbering each line, since with an MR job you
>> >> don't know your position from the beginning. But you can build
>> >> something where you store the line number from the beginning of the
>> >> region, then when all regions are parsed you can reconstruct the total
>> >> numbering... That should work...
>> >>
>> >> JM
>> >>
>> >> 2013/1/25, anil gupta <[EMAIL PROTECTED]>:
>> >> > Inline...
>> >> >
>> >> > On Fri, Jan 25, 2013 at 9:17 AM, Jean-Marc Spaggiari <
>> >> > [EMAIL PROTECTED]> wrote:
>> >> >
>> >> >> Hi Anil,
>> >> >>
>> >> >> The issue is that all the subsequent page starts would have to be
>> >> >> moved too...
>> >> >>
>> >> > Yes, this is a possibility, hence the developer has to take care of
>> >> > this case. It might also be possible that pageSize is not a hard
>> >> > limit on the number of results (more like a hint or suggestion on
>> >> > size). I would say it varies by use case.
>> >> >
>> >> >>
>> >> >> So if you want to jump directly to page n, you might be totally
>> >> >> shifted because of all the data inserted in the meantime...
>> >> >>
>> >> >> If you want a real, complete pagination feature, you might want to
>> >> >> have a coprocessor or an MR job updating another table referring to
>> >> >> the pages...
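JM's quoted idea is a second table keyed by a 4-byte row number, with each region numbered locally and the global numbering reconstructed once all regions are parsed. Here is a minimal sketch of it. It assumes each region is modelled as a list of its sorted row keys, with the regions listed in key order. A Python dict stands in for the second table, and `number_rows` and `page_start_key` are hypothetical names. The 4-byte key uses `struct.pack(">I", n)`, a big-endian encoding whose bytes match what HBase's `Bytes.toBytes(int)` produces for non-negative values.

```python
import struct
from itertools import accumulate
from typing import Dict, List, Optional


def number_rows(regions: List[List[bytes]]) -> Dict[bytes, bytes]:
    """Build a row-number -> row-key index from per-region numbering.

    regions: the row keys of each region, with regions listed in key order.
    """
    # Per-region row counts (in a real job, each map task would report
    # the count for its own region).
    counts = [len(rows) for rows in regions]
    # Once every region is done, region i's global offset is the total
    # row count of all regions before it (an exclusive prefix sum).
    offsets = [0] + list(accumulate(counts))[:-1]
    index: Dict[bytes, bytes] = {}
    for offset, rows in zip(offsets, regions):
        # enumerate numbers rows from 0 within the region; adding the
        # region's offset turns that into the global row number.
        for local_no, key in enumerate(rows):
            # 4-byte big-endian unsigned int: byte order matches numeric
            # order, so the second table's keys sort by row number.
            index[struct.pack(">I", offset + local_no)] = key
    return index


def page_start_key(index: Dict[bytes, bytes], page_no: int, page_size: int) -> Optional[bytes]:
    """Row key where page page_no starts, or None past the last row."""
    return index.get(struct.pack(">I", page_no * page_size))


if __name__ == "__main__":
    regions = [[b"a", b"b", b"c"], [b"d", b"e"], [b"f", b"g", b"h", b"i"]]
    idx = number_rows(regions)
    print(page_start_key(idx, 2, 4))  # b'i': page 2 starts at row 8
```

Because the numbers are only recomputed when the job reruns, rows inserted in the meantime shift the real page boundaries. This is the drift the quoted messages warn about.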
anil gupta 2013-01-30, 07:49
Mohammad Tariq 2013-01-30, 03:32
anil gupta 2013-01-30, 08:03
Anoop Sam John 2013-01-30, 11:31
Jean-Marc Spaggiari 2013-01-30, 12:18
Toby Lazar 2013-01-30, 12:42
Asaf Mesika 2013-02-03, 14:07
Anoop Sam John 2013-01-31, 03:23
anil gupta 2013-02-02, 08:02
Anoop John 2013-02-03, 16:07
anil gupta 2013-02-03, 17:21
Toby Lazar 2013-02-03, 17:25
anil gupta 2013-02-03, 17:39