Vijay Ganesan
2013-01-25, 04:58
Mohammad Tariq
2013-01-25, 05:12
Jean-Marc Spaggiari
2013-01-25, 12:38
anil gupta
2013-01-25, 17:07
Jean-Marc Spaggiari
2013-01-25, 17:17
anil gupta
2013-01-25, 17:43
Jean-Marc Spaggiari
2013-01-26, 02:58
anil gupta
2013-01-28, 03:31
Jean-Marc Spaggiari
2013-01-29, 21:08
anil gupta
2013-01-29, 21:16
Jean-Marc Spaggiari
2013-01-29, 21:40
anil gupta
2013-01-30, 07:49
Mohammad Tariq
2013-01-30, 03:32
anil gupta
2013-01-30, 08:03
Anoop Sam John
2013-01-30, 11:31
Jean-Marc Spaggiari
2013-01-30, 12:18
Toby Lazar
2013-01-30, 12:42
Asaf Mesika
2013-02-03, 14:07
Anoop Sam John
2013-01-31, 03:23
anil gupta
2013-02-02, 08:02
Anoop John
2013-02-03, 16:07
anil gupta
2013-02-03, 17:21
Toby Lazar
2013-02-03, 17:25
anil gupta
2013-02-03, 17:39
Pagination with HBase - getting previous page of data
Vijay Ganesan 2013-01-25, 04:58

I'm displaying rows of data from an HBase table in a data grid UI. The grid shows 25 rows at a time, i.e. it is paginated, and the user can click Next/Previous to page through the data 25 rows at a time.

I can implement Next easily by setting an org.apache.hadoop.hbase.filter.PageFilter and setting startRow on the org.apache.hadoop.hbase.client.Scan to the row id of the next batch's first row, which is sent to the UI along with the previous batch.

However, I can't seem to do the same for Previous. I can set the endRow on the Scan to the row id of the last row of the previous batch, but since HBase Scans always run in the forward direction, there is no way to set a PageFilter that returns the 25 rows ending at a particular row. The only option seems to be to fetch *all* rows up to the end row and discard all but the last 25 in the caller, which seems very inefficient.

Any ideas on how this can be done efficiently?

-Vijay
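The Next behaviour described above can be illustrated with a toy model. This is only a sketch under stated assumptions: a sorted Python list stands in for the HBase table, and `scan` and `next_page` are hypothetical helpers, not HBase client calls. Fetching one row more than the page size mimics sending the next batch's first row id along with the current batch.

```python
from bisect import bisect_left

def scan(rows, start=None, stop=None, limit=None):
    """Toy forward scan over sorted row keys: start inclusive, stop exclusive."""
    lo = 0 if start is None else bisect_left(rows, start)
    hi = len(rows) if stop is None else bisect_left(rows, stop)
    found = rows[lo:hi]
    return found if limit is None else found[:limit]

def next_page(rows, start, page_size):
    """Return (page, start key of the following page, or None at the end)."""
    batch = scan(rows, start=start, limit=page_size + 1)
    page = batch[:page_size]
    following = batch[page_size] if len(batch) > page_size else None
    return page, following

# Hypothetical table with row keys "010", "020", ..., "200".
rows = ["%03d" % n for n in range(10, 210, 10)]

page1, start2 = next_page(rows, None, 10)    # first page, plus where page 2 starts
page2, start3 = next_page(rows, start2, 10)  # second page; no further page exists
```

Because the model, like an HBase Scan, only moves forward, there is no equivalent call that returns "the 10 rows ending at key X" without knowing where that page starts.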
Re: Pagination with HBase - getting previous page of data
Mohammad Tariq 2013-01-25, 05:12

Hello sir,

While paging through, store the start key of the current page of 25 rows in a separate byte[]. Then, if you want to come back to this page from the next page, do a range query where the start key is the row key you stored earlier and the end key is the start row key of the current page. You only have to store one row key each time you show a page, and with it you can return to that page from the next one.

However, this approach fails when the user wants to go to a particular earlier page.

Warm Regards,
Tariq
https://mtariq.jux.com/
cloudfront.blogspot.com
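A minimal sketch of the remembered-start-key idea, again on a toy model rather than the HBase API. The sorted list, the `scan` helper, and the row keys are all assumptions made for illustration. Only one key is stored, so only a single step back is possible, as noted above.

```python
from bisect import bisect_left

def scan(rows, start=None, stop=None, limit=None):
    """Toy forward scan over sorted row keys: start inclusive, stop exclusive."""
    lo = 0 if start is None else bisect_left(rows, start)
    hi = len(rows) if stop is None else bisect_left(rows, stop)
    found = rows[lo:hi]
    return found if limit is None else found[:limit]

rows = ["row%03d" % n for n in range(1, 13)]  # hypothetical keys row001 .. row012
page_size = 5

# Page 1: remember its start key before moving on.
page1 = scan(rows, limit=page_size)
remembered_start = page1[0]

# Page 2 starts right after the last row of page 1. Appending a zero
# character yields the smallest key that sorts after page1[-1].
page2 = scan(rows, start=page1[-1] + "\x00", limit=page_size)
current_start = page2[0]

# "Previous" from page 2: range scan from the remembered key up to
# (but excluding) the start key of the current page.
previous = scan(rows, start=remembered_start, stop=current_start)
```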
Re: Pagination with HBase - getting previous page of data
Jean-Marc Spaggiari 2013-01-25, 12:38

Hi Vijay,

If you store the key of each page while the user is scrolling forward, you will be able to go back to a specific page and then jump forward again to where the user was.

The only issue is that if someone inserts a row between the last row of a page and the first row of the next page while the user is scrolling through the table, you will never see this row.

Let's take this example. You have 10 items per page.

010 020 030 040 050 060 070 080 090 100 is the first page.
110 120 130 140 150 160 170 180 190 200 is the second one.

Now, if someone inserts 101, it will sit just after 100 and before 110. When you display 10 rows starting at 010, you will stop just before 101, and for the next page you will start at 110, so 101 will never be displayed.

HTH

JM
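The 010 to 200 example above can be reproduced with a small simulation. This is a sketch on a toy in-memory model, not HBase code: the sorted list plays the table, and `scan` and `next_page` are hypothetical helpers. It records the start key of every page while scrolling forward, inserts 101 afterwards, and then redisplays both pages with a fixed row limit.

```python
from bisect import bisect_left, insort

def scan(rows, start=None, stop=None, limit=None):
    """Toy forward scan over sorted row keys: start inclusive, stop exclusive."""
    lo = 0 if start is None else bisect_left(rows, start)
    hi = len(rows) if stop is None else bisect_left(rows, stop)
    found = rows[lo:hi]
    return found if limit is None else found[:limit]

def next_page(rows, start, page_size):
    """Return (page, start key of the following page, or None at the end)."""
    batch = scan(rows, start=start, limit=page_size + 1)
    following = batch[page_size] if len(batch) > page_size else None
    return batch[:page_size], following

rows = ["%03d" % n for n in range(10, 210, 10)]  # "010" .. "200"
page_size = 10

# Scroll forward once, remembering the start key of every page.
page_starts = []
start = rows[0]
while start is not None:
    page_starts.append(start)
    _, start = next_page(rows, start, page_size)

# Someone inserts 101 after the boundaries were recorded.
insort(rows, "101")

# Redisplay both pages: a fixed number of rows from each stored start key.
shown = []
for key in page_starts:
    shown += scan(rows, start=key, limit=page_size)

# Rows that exist in the table but appear on no page.
missing = [r for r in rows if r not in shown]
```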
Re: Pagination with HBase - getting previous page of data
anil gupta 2013-01-25, 17:07

Hi Vijay,

I've done paging in HBase using only a Scan (no pagination filter), as Mohammad explained. It was only an experiment, though. It works, but Jean-Marc raised a very good point. My answer to the problem he reported is inline.

On Fri, Jan 25, 2013 at 4:38 AM, Jean-Marc Spaggiari <[EMAIL PROTECTED]> wrote:
> 010 020 030 040 050 060 070 080 090 100 is the first page.
> 110 120 130 140 150 160 170 180 190 200 is the second one.
>
> Now, if someone inserts 101, it will be just after 100 and before 110.

Anil: Instead of scanning from 010 to 100, scan from 010 to 110, and we won't have this problem. In other words, use startRow(firstRowKeyofPage(N)) and stopRow(firstRowKeyofPage(N+1)). Note that in this case the number of results might exceed the pageSize, so you need to handle that in your logic.

--
Thanks & Regards,
Anil Gupta
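A rough sketch of the boundary-to-boundary scan suggested above, on the same kind of toy model (sorted list, hypothetical `scan` and `page_by_boundaries` helpers, not HBase calls). The stop key is the stored first key of page N+1 and no row limit is applied, so the late insert 101 shows up on page 1, which then holds one row more than the page size.

```python
from bisect import bisect_left

def scan(rows, start=None, stop=None, limit=None):
    """Toy forward scan over sorted row keys: start inclusive, stop exclusive."""
    lo = 0 if start is None else bisect_left(rows, start)
    hi = len(rows) if stop is None else bisect_left(rows, stop)
    found = rows[lo:hi]
    return found if limit is None else found[:limit]

def page_by_boundaries(rows, page_starts, n):
    """Scan from the first key of page n up to the first key of page n+1."""
    start = page_starts[n]
    stop = page_starts[n + 1] if n + 1 < len(page_starts) else None
    return scan(rows, start=start, stop=stop)

# Table state after 101 was inserted; boundaries were recorded before that.
rows = sorted(["%03d" % n for n in range(10, 210, 10)] + ["101"])
page_starts = ["010", "110"]
page_size = 10

page1 = page_by_boundaries(rows, page_starts, 0)
page2 = page_by_boundaries(rows, page_starts, 1)

# The caller has to decide what to do with rows beyond the page size.
overflow = len(page1) - page_size
```

How to present the overflow (a longer page, or treating the page size as a soft limit) is left to the application.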
Re: Pagination with HBase - getting previous page of data
Jean-Marc Spaggiari 2013-01-25, 17:17

Hi Anil,

The issue is that the start of every subsequent page would have to move too, so if you want to jump directly to page n, you might be completely shifted because of all the data inserted in the meantime.

If you want a real, complete pagination feature, you might want a coprocessor or a MapReduce (MR) job that updates another table referring to the pages.

JM
Re: Pagination with HBase - getting previous page of data
anil gupta 2013-01-25, 17:43

Inline...

On Fri, Jan 25, 2013 at 9:17 AM, Jean-Marc Spaggiari <[EMAIL PROTECTED]> wrote:
> The issue is that the start of every subsequent page should be moved
> too...

Yes, this is a possibility, so the developer has to take care of this case. It is also possible that the pageSize is not a hard limit on the number of results (more like a hint or suggestion about size). I would say it varies by use case.

> so if you want to jump directly to page n, you might be totally
> shifted because of all the data inserted in the meantime...
>
> If you want a real complete pagination feature, you might want to have
> a coprocessor or an MR job updating another table referring to the
> pages....

Well, the solution depends on the use case. I will be doing pagination in HBase for a RESTful service, and so far I haven't found any reason why this can't be done at the application level. Are you suggesting using MR for paging in HBase? If so, how? How would you use another table for pagination, and what would you store in it?

Thanks & Regards,
Anil Gupta
Re: Pagination with HBase - getting previous page of data
Jean-Marc Spaggiari 2013-01-26, 02:58

Hi Anil,

I don't have a solution. I never thought about that ;) But I was thinking of something like this: you create a second table where you store the row number (4 bytes) and then the row key. To go directly to a specific page, you query by the number, find the key, and you know where to start your scan in the main table.

The issue is probably computing the number for each line, since with an MR job you don't know where you are relative to the beginning. But you could build something that stores the line number counted from the beginning of the region; then, once all regions are parsed, you can reconstruct the total numbering. That should work.

JM
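One way to picture the second-table idea is the sketch below. Everything in it is an assumption made for illustration: the regions are plain Python lists, the index table is a dict, and `page_start_key` is a hypothetical helper. Rows are first numbered relative to the start of their region, and the global numbering is then rebuilt from the running total of region sizes.

```python
import struct
from itertools import accumulate

# Hypothetical regions of the main table, each sorted, listed in key order.
regions = [["a1", "a2", "a3"], ["b1", "b2"], ["c1", "c2", "c3", "c4"]]

# Step 1: number rows relative to the start of their own region. Each
# region can be processed independently, as a parallel job would do.
local = [list(enumerate(region)) for region in regions]

# Step 2: once every region size is known, the offset of a region is the
# total number of rows in all regions before it.
sizes = [len(region) for region in regions]
offsets = [0] + list(accumulate(sizes))[:-1]

# Step 3: build the index table: 4-byte big-endian row number -> row key.
index = {}
for offset, numbered in zip(offsets, local):
    for position, key in numbered:
        index[struct.pack(">I", offset + position)] = key

def page_start_key(index, page_number, page_size):
    """Look up the first row key of a 0-based page, or None past the end."""
    return index.get(struct.pack(">I", page_number * page_size))

start_of_page_1 = page_start_key(index, 1, 4)  # row number 4
start_of_page_2 = page_start_key(index, 2, 4)  # row number 8
```

With the start key in hand, the page itself would be read by a normal forward scan on the main table. The index is only as fresh as its last rebuild, so rows inserted afterwards shift the numbering until the next run.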
Re: Pagination with HBase - getting previous page of data
anil gupta 2013-01-28, 03:31

That's alright. I thought you had come up with a killer solution, so I got curious to hear your ideas. ;)

It seems that the solution you described will not work when filtering on non-row-key columns, since you only consider the row key when deciding the page numbers.

Thanks,
Anil
Re: Pagination with HBase - getting previous page of data
Jean-Marc Spaggiari 2013-01-29, 21:08

No, no killer solution here ;)

But I'm still thinking about it, because I might have to implement some pagination options soon.

As you say, it only works on the row key, but if you want to do the same thing on non-row-key columns, you might have to create a secondary index table.

JM
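A secondary index table for one column could look roughly like the sketch below. The layout is an assumption, not something specified in the thread: index row keys are the column value, a zero separator, and the main-table row key, so that all rows with the same value sort together and can be paged with a forward scan. The table contents and the helper `page_by_city` are hypothetical.

```python
from bisect import bisect_left

# Hypothetical main table: row key -> columns.
table = {
    "r1": {"city": "Paris"},
    "r2": {"city": "Lyon"},
    "r3": {"city": "Paris"},
    "r4": {"city": "Paris"},
    "r5": {"city": "Lyon"},
}

SEP = "\x00"  # assumed separator that never occurs in values or row keys

# Index table: one sorted key per main-table row, "<value><SEP><row key>".
index_rows = sorted(cols["city"] + SEP + key for key, cols in table.items())

def page_by_city(index_rows, city, page_size, after=None):
    """Return up to page_size main-table row keys for a city.

    `after` is the last row key of the previous page, or None for page 1.
    """
    prefix = city + SEP
    # Smallest index key strictly greater than the entry for `after`.
    start = prefix if after is None else prefix + after + SEP
    stop = city + "\x01"  # first key past every entry with this prefix
    lo = bisect_left(index_rows, start)
    hi = bisect_left(index_rows, stop)
    return [k[len(prefix):] for k in index_rows[lo:hi][:page_size]]

first = page_by_city(index_rows, "Paris", 2)
second = page_by_city(index_rows, "Paris", 2, after=first[-1])
```

This covers a single column only; each additional filterable column would need its own index, and combined conditions on several columns are not addressed by this layout.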
Re: Pagination with HBase - getting previous page of data
anil gupta 2013-01-29, 21:16

Yes, your suggested solution only works for row-key-based pagination. It will fail when you start filtering on columns. Still, I would say it's comparatively easier to maintain this at the application level than to create tables for pagination. What if you have 300 columns in your schema? Will you create 300 tables? And how would you handle pagination when filtering on multiple columns ("and" and "or" conditions)?

Thanks & Regards,
Anil Gupta
anil gupta 2013-01-29, 21:16
-
Re: Pagination with HBase - getting previous page of data
Jean-Marc Spaggiari 2013-01-29, 21:40
Hi Anil,

I think it really depends on how you want to use the pagination.

Do you need to be able to jump to page X? Are you OK if you miss a line or two? Is your data growing quickly, or slowly? Is it OK if your page indexes are a day old? Do you need to paginate over 300 columns, or just one? Do you always need exactly the same number of entries on each page?

For my use case I need to be able to jump to page X and I don't have any content. I have hundreds of millions of lines. Only the rowkey matters to me, and I'm fine if sometimes I have 50 entries displayed and sometimes only 45. So I'm thinking about calculating which row is the first one for each page and storing that separately. Then I just need to run the MR job daily.

It's not a perfect solution, I agree, but it might do the job for me. I'm totally open to any other ideas that might do the job too.

JM
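The page-index idea in this message can be sketched roughly as follows. This is a minimal illustration in plain Java, assuming a sorted in-memory key set standing in for the HBase table; it makes no HBase client calls, and the class and method names (PageIndexSketch, buildIndex, readPage) are hypothetical. A periodic job records the first rowkey of every page, and jumping to page X becomes a lookup followed by a short range read.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

class PageIndexSketch {
    /** Returns the first rowkey of every page, as a periodic batch job might record it. */
    static List<String> buildIndex(NavigableSet<String> rowKeys, int pageSize) {
        List<String> pageStarts = new ArrayList<>();
        int i = 0;
        for (String key : rowKeys) {
            if (i % pageSize == 0) {
                pageStarts.add(key); // every pageSize-th key opens a new page
            }
            i++;
        }
        return pageStarts;
    }

    /** Reads 0-based page `page`: keys from its start key up to the next page's start key. */
    static List<String> readPage(NavigableSet<String> rowKeys, List<String> pageStarts, int page) {
        String start = pageStarts.get(page);
        NavigableSet<String> range = (page + 1 < pageStarts.size())
                ? rowKeys.subSet(start, true, pageStarts.get(page + 1), false)
                : rowKeys.tailSet(start, true);
        return new ArrayList<>(range);
    }

    public static void main(String[] args) {
        NavigableSet<String> table = new TreeSet<>();
        for (int i = 0; i < 10; i++) {
            table.add(String.format("row%02d", i * 2));
        }
        List<String> index = buildIndex(table, 4);
        System.out.println(readPage(table, index, 1));

        table.add("row09"); // written after the index was built
        System.out.println(readPage(table, index, 1));
    }
}
```

Because the index is only rebuilt periodically, a row written in between lands inside an existing page, so that page grows by one until the next rebuild. That is the "sometimes 50, sometimes 45" trade-off accepted above.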
-
Re: Pagination with HBase - getting previous page of data
anil gupta 2013-01-30, 07:49
Hi Jean,

Please find my reply inline.

On Tue, Jan 29, 2013 at 1:40 PM, Jean-Marc Spaggiari <[EMAIL PROTECTED]> wrote:

> I think it really depends on how you want to use the pagination.

Absolutely true!

> Do you need to be able to jump to page X? Are you OK if you miss a
> line or two? Is your data growing quickly, or slowly? Is it OK if your
> page indexes are a day old? Do you need to paginate over 300 columns,
> or just one? Do you always need exactly the same number of entries
> on each page?

No, I don't need to be able to jump to page X. I don't think missing lines will be acceptable. I need to filter the rows on non-rowkey attributes. It won't be OK if my page indexes are a day old. I need to paginate on the basis of various filters on columns and/or the rowkey, so the number of combinations is quite large.

> For my use case I need to be able to jump to page X and I don't
> have any content. I have hundreds of millions of lines. Only the rowkey
> matters to me, and I'm fine if sometimes I have 50 entries displayed
> and sometimes only 45. So I'm thinking about calculating which row is
> the first one for each page and storing that separately. Then I just
> need to run the MR job daily.

Hmm, yeah, it might work for you.

> It's not a perfect solution, I agree, but it might do the job for me.
> I'm totally open to any other ideas that might do the job too.

There is no such thing as a "perfect" solution. If the implementation fulfils your business needs, then go for it.

Thanks & Regards,
Anil Gupta
-
Re: Pagination with HBase - getting previous page of data
Mohammad Tariq 2013-01-30, 03:32
I'm kinda hesitant to put my leg in between the pros ;) But does it sound sane to use PageFilter for both rows and columns, with some additional logic to handle the 'nth' page? It would help us with both kinds of paging.

Warm Regards,
Tariq
https://mtariq.jux.com/
cloudfront.blogspot.com
-
Re: Pagination with HBase - getting previous page of data
anil gupta 2013-01-30, 08:03
Hi Mohammad,

You are most welcome to join the discussion. I have never used PageFilter, so I don't really have concrete input. I had a look at http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/PageFilter.html

I could not understand why it goes to multiple regionservers in parallel. Why can't it guarantee results <= page size (my guess: due to multiple RS scans)? If you have used it, maybe you can explain the behaviour?

Thanks,
Anil
-
RE: Pagination with HBase - getting previous page of data
Anoop Sam John 2013-01-30, 11:31
@Anil

> I could not understand why it goes to multiple regionservers in
> parallel. Why can't it guarantee results <= page size (my guess: due to
> multiple RS scans)? If you have used it, maybe you can explain the
> behaviour?

A scan from the client side never goes to multiple RSs in parallel. A scan through the HTable API is sequential, one region after the other. For every region it opens a scanner on the RS and makes next() calls. The filter is instantiated on the server side, per region.

Suppose you need 100 rows in the page, you create a Scan on the client side with the filter, and there are 2 regions. First the scanner is opened for region1 and the scan runs; the filter ensures that at most 100 rows are retrieved from that region. But when the scan crosses the region boundary and the client automatically opens a scanner for region2, it passes the filter with a maximum of 100 rows there as well, so up to 100 rows can come from that region too. So overall, on the client side, we cannot guarantee that the scan will return only 100 rows in total from the table.

I hope that makes it clear. I have not used PageFilter at all; I am just explaining from my knowledge of the scan flow and general filter usage.

The PageFilter documentation says: "This is because the filter is applied separately on different region servers. It does however optimize the scan of individual HRegions by making sure that the page size is never exceeded locally."

I guess it should say "This is because the filter is applied separately on different regions".

-Anoop-
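The per-region behaviour described in this message can be modelled with a few lines of plain Java. This is only a sketch under the thread's own assumption that each region applies a fresh copy of the filter with its own row counter; it does not call HBase, and the class and method names are hypothetical.

```java
class PerRegionPageLimitSketch {
    /** Rows returned when every region applies its own fresh limit of pageSize rows. */
    static long rowsReturned(int[] rowsPerRegion, int pageSize) {
        long total = 0;
        for (int rows : rowsPerRegion) {
            // The counter restarts for each region, so each one may contribute up to pageSize rows.
            total += Math.min(rows, pageSize);
        }
        return total;
    }

    public static void main(String[] args) {
        // Two regions holding 150 and 120 matching rows, page size 100.
        System.out.println(rowsReturned(new int[] {150, 120}, 100));
    }
}
```

With two regions that each hold more than 100 matching rows, the model returns 200 rows for a requested page of 100, which is why the client has to enforce the page size itself.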
-
Re: Pagination with HBase - getting previous page of data
Jean-Marc Spaggiari 2013-01-30, 12:18
Hi Anoop,

So does it mean the scanner can send back LIMIT*2-1 lines max? Reading 100 rows from the 2nd region uses extra time and resources. Why not ask for only the number of missing lines?

JM
-
Re: Pagination with HBase - getting previous page of data
Toby Lazar 2013-01-30, 12:42
Sounds like if you had 1000 regions, each with 99 rows, and you asked for 100, you'd get back 99,000. My guess is that a Filter is serialized once and sent successively to each region, and that it isn't updated between regions. I don't think doing that would be too easy.

Toby
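For illustration, the 1000-region case can be worked through in plain Java, together with one possible remedy: the client walks the regions in order and stops as soon as it has a full page. This is a sketch with lists standing in for regions, not HBase client code, and the names (ClientSideCapSketch, firstPage) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class ClientSideCapSketch {
    /** Walks regions in order and stops as soon as pageSize rows have been collected. */
    static List<String> firstPage(List<List<String>> regions, int pageSize) {
        List<String> page = new ArrayList<>();
        for (List<String> region : regions) {
            for (String row : region) {
                if (page.size() == pageSize) {
                    return page; // page is full: do not touch any further rows or regions
                }
                page.add(row);
            }
        }
        return page;
    }

    public static void main(String[] args) {
        // 1000 regions with 99 rows each, as in the example above.
        List<List<String>> regions = new ArrayList<>();
        for (int r = 0; r < 1000; r++) {
            List<String> rows = new ArrayList<>();
            for (int i = 0; i < 99; i++) {
                rows.add(String.format("r%04d-%02d", r, i));
            }
            regions.add(rows);
        }

        // Per-region limit of 100 only: every region returns all of its 99 rows.
        long uncapped = 0;
        for (List<String> region : regions) {
            uncapped += Math.min(region.size(), 100);
        }
        System.out.println(uncapped);

        // With the client-side cap the page ends one row into the second region.
        System.out.println(firstPage(regions, 100).size());
    }
}
```

In this model the uncapped total is 1000 x 99 = 99,000 rows, while the capped read returns 100 rows and never goes past the second region.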
-
Re: Pagination with HBase - getting previous page of dataAsaf Mesika 2013-02-03, 14:07
Here are my thoughts on this matter:
1. If you call setCaching(numOfRows) on the Scan object, you can check before each call that you haven't passed your page limit, so you never reach the point where you retrieve pageSize results from every region.

2. I think it's OK for the UI to present the database as of a certain point in time and offer paging on that. You can achieve that by taking the current timestamp (System.currentTimeMillis()) and forcing the results to be returned only up to that time with scan.setTimeRange(0, currentTime). If you save currentTime and send it back with the results to the UI, the UI can keep sending it to the backend, which ensures you keep viewing that point in time. If rows keep being inserted, their timestamps will be greater, so they are not displayed.

On Wed, Jan 30, 2013 at 2:42 PM, Toby Lazar <[EMAIL PROTECTED]> wrote:

> Sounds like if you had 1000 regions, each with 99 rows, and you asked
> for 100 that you'd get back 99,000. My guess is that a Filter is
> serialized once and that is sent successively to each region and that
> it isn't updated between regions. Don't think doing that would be too
> easy.
>
> Toby
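To make the point-in-time idea in item 2 concrete, here is a minimal sketch in plain Java. It does not call HBase: a TreeMap stands in for the table, and the class name SnapshotPager, the page method, and its parameters are all made up for illustration. The timestamp check plays the role of scan.setTimeRange(0, currentTime), whose upper bound is exclusive.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Stand-in for a table: row key -> timestamp of the row's write.
// All names are hypothetical; no HBase classes are used.
final class SnapshotPager {

    /** Returns up to pageSize row keys >= startRow that were written before snapshotTs. */
    static List<String> page(TreeMap<String, Long> table, String startRow,
                             int pageSize, long snapshotTs) {
        List<String> rows = new ArrayList<>();
        for (Map.Entry<String, Long> e : table.tailMap(startRow, true).entrySet()) {
            if (rows.size() == pageSize) {
                break; // page limit reached, stop reading
            }
            if (e.getValue() < snapshotTs) { // mirrors the [0, snapshotTs) time range
                rows.add(e.getKey());
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        TreeMap<String, Long> table = new TreeMap<>();
        table.put("row1", 10L);
        table.put("row2", 20L);
        table.put("row4", 30L);
        long snapshotTs = 100L;   // taken once, then sent back with every page
        table.put("row3", 150L);  // written after the snapshot was taken
        System.out.println(page(table, "row1", 2, snapshotTs)); // [row1, row2]
        System.out.println(page(table, "row3", 2, snapshotTs)); // [row4]
    }
}
```

The caller would take snapshotTs once, return it to the UI with the first page, and pass the same value on every later request. This sketch only covers newly inserted rows; how updates and deletes of existing rows should appear is a separate question.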
-
RE: Pagination with HBase - getting previous page of data
Anoop Sam John 2013-01-31, 03:23
JM,

> Reading 100 rows from the 2nd region is using extra time and resources.
> Why not ask for only the number of missing lines?

These are things that need to be controlled by the scanning app. It can well control the pagination without using the PageFilter, I guess. What do you say?

-Anoop-

________________________________________
From: Jean-Marc Spaggiari [[EMAIL PROTECTED]]
Sent: Wednesday, January 30, 2013 5:48 PM
To: [EMAIL PROTECTED]
Subject: Re: Pagination with HBase - getting previous page of data

Hi Anoop,

So does it mean the scanner can send back LIMIT*2-1 lines max? Reading 100 rows from the 2nd region is using extra time and resources. Why not ask for only the number of missing lines?

JM
-
Re: Pagination with HBase - getting previous page of data
anil gupta 2013-02-02, 08:02
Hi Anoop,
Please find my reply inline.

Thanks,
Anil

On Wed, Jan 30, 2013 at 3:31 AM, Anoop Sam John <[EMAIL PROTECTED]> wrote:

> @Anil
>
> > I could not understand why it goes to multiple regionservers in
> > parallel. Why can it not guarantee results <= page size (my guess: due to
> > multiple RS scans)? If you have used it then maybe you can explain the
> > behaviour?
>
> A scan from the client side never goes to multiple RS in parallel. A scan
> from the HTable API is sequential, one region after the other. For every
> region it opens a scanner in the RS and does next() calls. The filter
> is instantiated on the server side, per region.
>
> When you need 100 rows in the page and you create a Scan on the client side
> with the filter, and suppose there are 2 regions: first the scanner is
> opened for region1 and the scan happens. It ensures that at most 100 rows
> are retrieved from that region. But when the region boundary is crossed and
> the client automatically opens a scanner for region2, it again passes the
> filter with max 100 rows, so from there also up to 100 rows can come.
> So overall, on the client side we cannot guarantee that the scan
> created will return only 100 rows as a whole from the table.

I agree with the other people on this email chain that the 2nd region should return only (100 - number of rows returned by Region1), if possible.

When the region boundary is crossed and the client automatically opens a scanner for Region2, why doesn't the scanner in Region2 know that some of the rows were already fetched from Region1? Do you mean to say that by default, for a scan spanning multiple regions, every region keeps its own count of the number of rows it is going to return? For example, say setCaching is 10 for a scan and the scan runs across two regions. 9 results (satisfying the filter) are in Region1 and 10 results (satisfying the filter) are in Region2. Will this scan then return 19 (9+10) results?

> I think I am making it clear. I have not used PageFilter at all. I am just
> explaining as per my knowledge of the scan flow and general filter usage.
>
> "This is because the filter is applied separately on different region
> servers. It does however optimize the scan of individual HRegions by making
> sure that the page size is never exceeded locally."
>
> I guess it needs to say "This is because the filter is applied
> separately on different regions".
>
> -Anoop-

Thanks & Regards,
Anil Gupta
-
Re: Pagination with HBase - getting previous page of data
Anoop John 2013-02-03, 16:07

> Let's say for a scan setCaching is 10 and the scan is done across two
> regions. 9 results (satisfying the filter) are in Region1 and 10 results
> (satisfying the filter) are in Region2. Then will this scan return
> 19 (9+10) results?

@Anil.
No, it will return 10 results only, not 19. The client here takes into account the number of results it got from the previous region. But a filter is different. A filter has no logic that runs on the client side; it is executed entirely on the server side. This is the way it is designed. Personally I would prefer to do the pagination in the app alone, using a plain scan with caching (to avoid so many RPCs) and app-level logic.

-Anoop-
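The difference described in this message can be modelled in a few lines of plain Java. This is a toy model, not HBase code: each inner list stands for the rows of one region, and RegionScanModel and both method names are hypothetical. The first method restarts its counter for every region, as a filter instantiated per region would; the second keeps a single counter in the application.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model: each inner list holds the sorted rows of one region.
// It imitates the behaviour described in the thread; it is not HBase code.
final class RegionScanModel {

    /** A page filter re-created per region: every region may emit up to pageSize rows. */
    static List<String> scanWithPerRegionFilter(List<List<String>> regions, int pageSize) {
        List<String> out = new ArrayList<>();
        for (List<String> region : regions) {
            int seenInRegion = 0; // the counter starts over for every region
            for (String row : region) {
                if (seenInRegion == pageSize) {
                    break;
                }
                out.add(row);
                seenInRegion++;
            }
        }
        return out;
    }

    /** Application-level paging: one counter shared across all regions. */
    static List<String> scanWithAppLimit(List<List<String>> regions, int pageSize) {
        List<String> out = new ArrayList<>();
        for (List<String> region : regions) {
            for (String row : region) {
                if (out.size() == pageSize) {
                    return out; // stop as soon as the page is full
                }
                out.add(row);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<List<String>> regions = Arrays.asList(
                Arrays.asList("a1", "a2", "a3"),
                Arrays.asList("b1", "b2", "b3"));
        System.out.println(scanWithPerRegionFilter(regions, 2)); // [a1, a2, b1, b2]
        System.out.println(scanWithAppLimit(regions, 2));        // [a1, a2]
    }
}
```

With a per-region counter the caller still has to truncate the result to the page size; with the application-level counter the loop simply stops reading once the page is full.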
-
Re: Pagination with HBase - getting previous page of data
anil gupta 2013-02-03, 17:21
On Sun, Feb 3, 2013 at 8:07 AM, Anoop John <[EMAIL PROTECTED]> wrote:

> > Let's say for a scan setCaching is 10 and the scan is done across two
> > regions. 9 results (satisfying the filter) are in Region1 and 10 results
> > (satisfying the filter) are in Region2. Then will this scan return
> > 19 (9+10) results?
>
> @Anil.
> No, it will return 10 results only, not 19. The client here takes into
> account the number of results it got from the previous region. But a filter
> is different. A filter has no logic that runs on the client side; it is
> executed entirely on the server side. This is the way it is designed.
> Personally I would prefer to do the pagination in the app alone, using a
> plain scan with caching (to avoid so many RPCs) and app-level logic.

@Anoop: Nice, that's why even I try to stick to simple Scans and keep the pagination logic in the application. :)

Thanks & Regards,
Anil Gupta
-
Re: Pagination with HBase - getting previous page of data
Toby Lazar 2013-02-03, 17:25
Quick question - if you perform the pagination client-side and just
call scanner.iterator().next() to get to the necessary results, doesn't this add unnecessary network traffic for the unused results? If you want results 100-120, does the client need to first read results 1-100 over the network? Couldn't a filter help prevent some of that unneeded traffic? Or is the data only transferred when inspecting the result object?

Thanks,

Toby
-
Re: Pagination with HBase - getting previous page of data
anil gupta 2013-02-03, 17:39
Inline...
On Sun, Feb 3, 2013 at 9:25 AM, Toby Lazar <[EMAIL PROTECTED]> wrote:

> Quick question - if you perform the pagination client-side and just
> call scanner.iterator().next()
> to get to the necessary results, doesn't this add unnecessary network
> traffic of the unused results?

Anil: It depends on the solution. If 95% of your scans are limited to a single region, then there won't be unnecessary network I/O.

> If you want results 100-120, does the
> client need to first read results 1-100 over the network?

Anil: If you do a simple scan and you want results 100-120, then I would say yes. Maybe you can get only 100-120 by using a pagination filter or by writing a custom filter or coprocessor. As I mentioned earlier in this thread, we won't be allowing the user to jump to 100-120 directly, so the user first needs to go through results 1-100. Hence, I will know the rowkey of the 100th result, and that rowkey will become my startKey for results 100-120. So, no unnecessary network I/O.

> Couldn't a
> filter help prevent some of that unneeded traffic? Or, is the data only
> transferred when inspecting the result object?

Anil: Filters might help reduce unnecessary traffic. It all depends on your use case.

> Thanks,
>
> Toby

Thanks & Regards,
Anil Gupta
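A minimal sketch of the start-key approach described above, in plain Java with a TreeSet standing in for the table. KeysetPager and its methods are hypothetical names, and nothing here is HBase API. Next reads strictly after the last row key shown; Previous reuses the remembered start key of the earlier page, which is the idea suggested near the top of this thread.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;
import java.util.TreeSet;

// Hypothetical pager over a sorted set of row keys standing in for a table.
final class KeysetPager {
    private final TreeSet<String> rows;
    private final int pageSize;
    // Start keys of the pages shown so far; the top entry belongs to the current page.
    private final Deque<String> pageStarts = new ArrayDeque<>();
    private String lastKey; // last row key of the current page, null before the first page

    KeysetPager(TreeSet<String> rows, int pageSize) {
        this.rows = rows;
        this.pageSize = pageSize;
    }

    /** Reads the page that starts strictly after the last row key shown. */
    List<String> next() {
        Iterable<String> source = (lastKey == null) ? rows : rows.tailSet(lastKey, false);
        List<String> page = take(source);
        if (!page.isEmpty()) {
            pageStarts.push(page.get(0));
            lastKey = page.get(page.size() - 1);
        }
        return page;
    }

    /** Re-reads the page before the current one, or returns an empty list on the first page. */
    List<String> previous() {
        if (pageStarts.size() < 2) {
            return new ArrayList<>();
        }
        pageStarts.pop(); // forget the current page's start key
        List<String> page = take(rows.tailSet(pageStarts.peek(), true));
        if (!page.isEmpty()) {
            lastKey = page.get(page.size() - 1);
        }
        return page;
    }

    private List<String> take(Iterable<String> source) {
        List<String> page = new ArrayList<>();
        for (String row : source) {
            if (page.size() == pageSize) {
                break;
            }
            page.add(row);
        }
        return page;
    }

    public static void main(String[] args) {
        TreeSet<String> rows = new TreeSet<>(Arrays.asList("r1", "r2", "r3", "r4", "r5"));
        KeysetPager pager = new KeysetPager(rows, 2);
        System.out.println(pager.next());     // [r1, r2]
        System.out.println(pager.next());     // [r3, r4]
        System.out.println(pager.previous()); // [r1, r2]
    }
}
```

Each page costs one bounded read of pageSize rows, at the price of remembering one start key per page visited. Jumping straight to an arbitrary page is not covered by this sketch.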