Weishung Chung 2011-03-01, 22:03
How do I get the first or last row in an HBase table, like min() and max() in MySQL? Thank you.
+
Weishung Chung 2011-03-01, 22:03
For min, you can write your own filter which extends FilterBase so that the scan stops after seeing the first row.
+
Ted Yu 2011-03-01, 22:44
On Tue, Mar 1, 2011 at 2:44 PM, Ted Yu <[EMAIL PROTECTED]> wrote:
> For min, you can write your own filter which extends FilterBase so that the scan stops after seeing the first row.
Or just start the scan at a row whose name is the empty byte array and kill the scan after the first return; the first return will be the first row in the table.
I don't know how to get the last row in a table, easily.
St.Ack
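A minimal sketch of why starting at the empty byte array finds the first row. It does not call the HBase client; it models a table as a sorted map, assuming row keys are ordered as unsigned bytes, lexicographically. The class name FirstRowSketch and its helper methods are made up for illustration, and Arrays.compareUnsigned needs Java 9 or later.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Comparator;
import java.util.Map;
import java.util.TreeMap;

class FirstRowSketch {
    // Assumed ordering: unsigned, lexicographic comparison of row keys.
    static final Comparator<byte[]> UNSIGNED = Arrays::compareUnsigned;

    // Stand-in for a table (hypothetical, not an HBase type).
    static TreeMap<byte[], String> newTable() {
        return new TreeMap<>(UNSIGNED);
    }

    static byte[] key(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // "Scan" from startRow and stop after the first hit.
    static byte[] firstRowAtOrAfter(TreeMap<byte[], String> table, byte[] startRow) {
        Map.Entry<byte[], String> e = table.ceilingEntry(startRow);
        return e == null ? null : e.getKey();
    }

    public static void main(String[] args) {
        TreeMap<byte[], String> table = newTable();
        table.put(key("row-m"), "m");
        table.put(key("row-a"), "a");
        table.put(key("row-z"), "z");
        // The empty array sorts before every non-empty key,
        // so the first hit is the smallest row key.
        byte[] first = firstRowAtOrAfter(table, new byte[0]);
        System.out.println(new String(first, StandardCharsets.UTF_8)); // prints row-a
    }
}
```

With the real client, the same idea would be a scan whose start row is the empty byte array, closed after the first result.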
+
Stack 2011-03-02, 04:51
Weishung: For max, you can enumerate the regions for your table. Start the scan from the first row in the last region.
+
Ted Yu 2011-03-02, 04:58
Oh, forgot, there is
http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html#getRowOrBefore(byte[], byte[])
See what happens if you pass it the empty byte array; i.e. the last row in a table (last logical row is an empty byte array as is first logical row)
St.Ack
+
Stack 2011-03-02, 05:08
I wonder if that would work. Looking at ServerCallable:
  public void instantiateServer(boolean reload) throws IOException {
    this.location = connection.getRegionLocation(tableName, row, reload);
It seems empty byte array would correspond to the first region, instead of the last.
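The concern about the empty byte array landing in the first region can be illustrated with a simplified model. It is not the ServerCallable or region-location code; it assumes a row belongs to the region with the greatest start key that is less than or equal to the row. RegionLookupSketch and its methods are hypothetical names, and Arrays.compareUnsigned needs Java 9 or later.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Comparator;
import java.util.TreeMap;

class RegionLookupSketch {
    // Assumed ordering: unsigned, lexicographic comparison of keys.
    static final Comparator<byte[]> UNSIGNED = Arrays::compareUnsigned;

    // Regions keyed by start key; the value is only a label for printing.
    static TreeMap<byte[], String> regions(String... startKeys) {
        TreeMap<byte[], String> m = new TreeMap<>(UNSIGNED);
        for (String k : startKeys) {
            m.put(k.getBytes(StandardCharsets.UTF_8), "region[" + k + "..)");
        }
        return m;
    }

    // The region for a row is the one with the greatest start key <= row.
    static String locate(TreeMap<byte[], String> regions, byte[] row) {
        return regions.floorEntry(row).getValue();
    }

    public static void main(String[] args) {
        TreeMap<byte[], String> r = regions("", "g", "p");
        // Used as a row key, the empty array sorts first, so it maps to the first region.
        System.out.println(locate(r, new byte[0]));                                // region[..)
        System.out.println(locate(r, "zzz".getBytes(StandardCharsets.UTF_8)));     // region[p..)
    }
}
```

In this model the empty array only means "end of table" when it appears as a region's end key, not when it is used as a row to look up.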
+
Ted Yu 2011-03-02, 05:35
Yes, I do it like this. But I have another problem: I can't count the rows of a table quickly.
-- Thanks & Best regards jiajun
Weishung Chung 2011-03-02, 14:53
Awesome, thanks a lot. I will try them out and let you guys know the result.
+
Weishung Chung 2011-03-02, 14:53
Weishung Chung 2011-03-03, 04:30
I tried the method as Stack suggested to find the first row, it works :) I have yet to learn about Filter and would like to use it too. I was wondering which method would give a better performance. As for the max, I will try it out tomorrow.
I thought I could use the getEndKeys() method but it doesn't work as I expected. It returns empty byte[]
public byte[][] getEndKeys() throws IOException {
return getStartEndKeys().getSecond();
}
+
Weishung Chung 2011-03-03, 04:30
Weishung Chung 2011-03-03, 15:18
Thanks, Stack!
Got a few more questions.
Does every region start with an empty byte[] and end with one too? Also, if I get all the region infos using
Map<HRegionInfo, HServerAddress> map = table.getRegionsInfo();
would these region infos be sorted according to the keys? If so, I would just get the last region info from the last element in the map. (trying to get last row)
Thank you,
On Wed, Mar 2, 2011 at 10:39 PM, Stack <[EMAIL PROTECTED]> wrote:
> On Wed, Mar 2, 2011 at 8:30 PM, Weishung Chung <[EMAIL PROTECTED]> wrote:
> > I tried the method as Stack suggested to find the first row, it works :) I have yet to learn about Filter and would like to use it too. I was wondering which method would give a better performance.
>
> The non-filter version I'd say (smile).
>
> > As for the max, I will try it out tomorrow.
> > I thought I could use the getEndKeys() method but it doesn't work as I expected. It returns empty byte[]
> > public byte[][] getEndKeys() throws IOException {
> >   return getStartEndKeys().getSecond();
> > }
>
> Yeah, this is the 'endkey' on the last region. You want the one just before that I take it.
>
> St.Ack
+
Weishung Chung 2011-03-03, 15:18
Bill Graham 2011-03-03, 16:50
The first region starts with an empty byte[] and the last region ends with one. Those in between have non-empty byte[]s to specify their boundaries.
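The boundary layout described here can be sketched as follows. This is only an illustration: it uses Strings in place of byte arrays, the empty string stands for the empty byte[], and RegionBoundsSketch with its bounds method is a hypothetical helper, not an HBase API.

```java
import java.util.ArrayList;
import java.util.List;

class RegionBoundsSketch {
    // Given sorted split points, list each region as "[start,end)".
    // An empty start or end marks the open edge of the table.
    static List<String> bounds(List<String> splitKeys) {
        List<String> out = new ArrayList<>();
        String start = "";
        for (String split : splitKeys) {
            out.add("[" + start + "," + split + ")");
            start = split;
        }
        out.add("[" + start + ",)");
        return out;
    }

    public static void main(String[] args) {
        // Two split points give three regions; only the outer edges are empty.
        System.out.println(bounds(List.of("g", "p"))); // prints [[,g), [g,p), [p,)]
    }
}
```

A table with a single region has both keys empty, which matches getEndKeys() returning an empty byte[] earlier in the thread.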
+
Bill Graham 2011-03-03, 16:50
>> Would these region infos be sorted according to the keys?
Yes.
>> If so, I would just get the last region info from the last element in the map. (trying to get last row)
If your table is created with multiple regions, the last region may not contain any row. You can iterate the map backwards.
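Iterating backwards past empty regions can be sketched like this. It is a model only, under stated assumptions: regions are keyed by start key in a sorted map, each region holds a sorted set of row keys, and Strings stand in for byte arrays (Java's String ordering matches unsigned byte ordering only for plain ASCII keys). LastRowSketch is a hypothetical name, not an HBase class.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

class LastRowSketch {
    // regions: start key -> sorted row keys stored in that region (possibly none).
    static String lastRow(TreeMap<String, TreeSet<String>> regions) {
        // Walk regions from the highest start key down and stop at the
        // first region that actually holds a row.
        for (Map.Entry<String, TreeSet<String>> e : regions.descendingMap().entrySet()) {
            if (!e.getValue().isEmpty()) {
                return e.getValue().last();
            }
        }
        return null; // the table has no rows at all
    }

    public static void main(String[] args) {
        TreeMap<String, TreeSet<String>> regions = new TreeMap<>();
        regions.put("", new TreeSet<>(List.of("a", "c")));
        regions.put("g", new TreeSet<>(List.of("h")));
        regions.put("p", new TreeSet<>()); // pre-split region with no data yet
        System.out.println(lastRow(regions)); // prints h
    }
}
```

With the real client, each step of the loop would be a scan starting at that region's start key, keeping the last row returned.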
+
Ted Yu 2011-03-03, 17:22
Weishung Chung 2011-03-03, 18:48
Bill, thank you for the clarification. Ted, good info, I will iterate the map backwards then :)
Another question I have is about unit testing in HBase: is there a recommended way to simulate a cluster? I read about the built-in mini cluster. Also, how can I change the region size to simulate multiple regions so that I can easily test getting the last row?
Thank u guys :)
+
Weishung Chung 2011-03-03, 18:48
src/test/java/org/apache/hadoop/hbase/MiniHBaseCluster.java is used in many tests.
You can call the following HBaseAdmin method to create a table with multiple regions:
public void createTable(HTableDescriptor desc, byte [] startKey, byte [] endKey, int numRegions)
+
Ted Yu 2011-03-03, 18:59
And there is http://blog.sematext.com/2010/08/30/hbase-case-study-using-hbasetestingutility-for-local-testing-development/
St.Ack
+
Stack 2011-03-03, 19:35
Weishung Chung 2011-03-04, 14:55
Thanks a lot guys... I'm learning a lot :) +10 for the active and great community support behind HBase !!!
+
Weishung Chung 2011-03-04, 14:55