Re: Fastest way to read only the keys of a HTable?
Something Something 2011-02-03, 23:09
Awesome! It's instantaneous now. Thanks a bunch. Any such tricks for code
that looks like this...

    Get get = new Get(Bytes.toBytes(code));
    Result result = table.get(get);
    NavigableMap<byte[], byte[]> map = result.getFamilyMap(Bytes.toBytes("Keys"));
    if (map != null) {
        for (Map.Entry<byte[], byte[]> entry : map.entrySet()) {
            String key = Bytes.toString(entry.getValue());
            Get get1 = new Get(Bytes.toBytes(key));
            Result imp = table2.get(get1);
            // Do something with the result...
        }
    }

Basically, I am reading the first table by a key (code). The "Keys" family
contains keys of some other table, so I get each key from that family and
retrieve the row from the other table.

Thanks again.

On Thu, Feb 3, 2011 at 2:17 PM, Jean-Daniel Cryans <[EMAIL PROTECTED]> wrote:

> On the scan, you can setCaching with the number of rows you want to
> pre-fetch per RPC. Setting it to 2 is already 2x better than the
> default.
>
> J-D
>
> On Thu, Feb 3, 2011 at 1:35 PM, Something Something
> <[EMAIL PROTECTED]> wrote:
> > After adding the following line:
> >
> >     scan.addFamily(Bytes.toBytes("Info"));
> >
> > performance improved dramatically (Thank you both!). But now I want it to
> > perform even faster, if possible -:) To read 43 rows, it's taking 2
> > seconds. Eventually, the 'partner' table may have over 500 entries. I
> > guess I will try moving the recently added family to a different table.
> > Do you think that might help?
> >
> > Thanks again.
> >
> >
> > On Thu, Feb 3, 2011 at 12:15 PM, Jonathan Gray <[EMAIL PROTECTED]> wrote:
> >
> >> If you only need to consider a single column family, use Scan.addFamily()
> >> on your scanner. Then there will be no impact of the other column families.
> >>
> >> > -----Original Message-----
> >> > From: Something Something [mailto:[EMAIL PROTECTED]]
> >> > Sent: Thursday, February 03, 2011 11:28 AM
> >> > To: [EMAIL PROTECTED]
> >> > Subject: Re: Fastest way to read only the keys of a HTable?
> >> >
> >> > Hmm.. performance hasn't improved at all. Do you see anything wrong with
> >> > the following code:
> >> >
> >> >
> >> >     public List<Partner> getPartners() {
> >> >         ArrayList<Partner> partners = new ArrayList<Partner>();
> >> >
> >> >         try {
> >> >             HTable table = new HTable("partner");
> >> >             Scan scan = new Scan();
> >> >             scan.setFilter(new FirstKeyOnlyFilter());
> >> >             ResultScanner scanner = table.getScanner(scan);
> >> >             Result result = scanner.next();
> >> >             while (result != null) {
> >> >                 Partner partner = new Partner(Bytes.toString(result.getRow()));
> >> >                 partners.add(partner);
> >> >                 result = scanner.next();
> >> >             }
> >> >         } catch (IOException e) {
> >> >             throw new RuntimeException(e);
> >> >         }
> >> >         return partners;
> >> >     }
> >> >
> >> > Maybe I shouldn't use more than one "column family" in a HTable - but the
> >> > BigTable paper recommends that, doesn't it? Please advise and thanks for
> >> > your help.
> >> >
> >> >
> >> > On Wed, Feb 2, 2011 at 10:55 PM, Stack <[EMAIL PROTECTED]> wrote:
> >> >
> >> > > I don't see a getKey on Result. Use
> >> > > http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Result.html#getRow()
> >> > > .
> >> > >
> >> > > Here is how it's used in the shell table.rb class:
> >> > >
> >> > >     # Count rows in a table
> >> > >     def count(interval = 1000, caching_rows = 10)
> >> > >       # We can safely set scanner caching with the first key only filter
> >> > >       scan = org.apache.hadoop.hbase.client.Scan.new
> >> > >       scan.cache_blocks = false
> >> > >       scan.caching = caching_rows
> >> > >       scan.setFilter(org.apache.hadoop.hbase.filter.FirstKeyOnlyFilter.new)
> >> > >
> >> > >       # Run the scanner