Re: Fastest way to read only the keys of a HTable?
Something Something, 2011-02-03, 21:35
After adding the following line:

scan.addFamily(Bytes.toBytes("Info"));

performance improved dramatically (thank you both!). But now I want it to perform even faster, if possible :-) Reading 43 rows takes 2 seconds, and eventually the 'partner' table may have over 500 entries. I guess I will try moving the recently added family to a different table. Do you think that might help?

Thanks again.

On Thu, Feb 3, 2011 at 12:15 PM, Jonathan Gray <[EMAIL PROTECTED]> wrote:
> If you only need to consider a single column family, use Scan.addFamily()
> on your scanner. Then there will be no impact of the other column families.
>
> > -----Original Message-----
> > From: Something Something [mailto:[EMAIL PROTECTED]]
> > Sent: Thursday, February 03, 2011 11:28 AM
> > To: [EMAIL PROTECTED]
> > Subject: Re: Fastest way to read only the keys of a HTable?
> >
> > Hmm... performance hasn't improved at all. Do you see anything wrong with
> > the following code:
> >
> > public List<Partner> getPartners() {
> >     ArrayList<Partner> partners = new ArrayList<Partner>();
> >
> >     try {
> >         HTable table = new HTable("partner");
> >         Scan scan = new Scan();
> >         scan.setFilter(new FirstKeyOnlyFilter());
> >         ResultScanner scanner = table.getScanner(scan);
> >         Result result = scanner.next();
> >         while (result != null) {
> >             Partner partner = new Partner(Bytes.toString(result.getRow()));
> >             partners.add(partner);
> >             result = scanner.next();
> >         }
> >     } catch (IOException e) {
> >         throw new RuntimeException(e);
> >     }
> >     return partners;
> > }
> >
> > Maybe I shouldn't use more than one "column family" in a HTable, but the
> > BigTable paper recommends that, doesn't it? Please advise, and thanks for
> > your help.
> >
> > On Wed, Feb 2, 2011 at 10:55 PM, Stack <[EMAIL PROTECTED]> wrote:
> >
> > > I don't see a getKey on Result. Use
> > > http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Result.html#getRow()
> > > .
> > >
> > > Here is how it's used in the shell table.rb class:
> > >
> > >   # Count rows in a table
> > >   def count(interval = 1000, caching_rows = 10)
> > >     # We can safely set scanner caching with the first key only filter
> > >     scan = org.apache.hadoop.hbase.client.Scan.new
> > >     scan.cache_blocks = false
> > >     scan.caching = caching_rows
> > >     scan.setFilter(org.apache.hadoop.hbase.filter.FirstKeyOnlyFilter.new)
> > >
> > >     # Run the scanner
> > >     scanner = @table.getScanner(scan)
> > >     count = 0
> > >     iter = scanner.iterator
> > >
> > >     # Iterate results
> > >     while iter.hasNext
> > >       row = iter.next
> > >       count += 1
> > >       next unless (block_given? && count % interval == 0)
> > >       # Allow command modules to visualize counting process
> > >       yield(count, String.from_java_bytes(row.getRow))
> > >     end
> > >
> > >     # Return the counter
> > >     return count
> > >   end
> > >
> > > St.Ack
> > >
> > > On Thu, Feb 3, 2011 at 6:47 AM, Something Something
> > > <[EMAIL PROTECTED]> wrote:
> > > > Thanks. So I will add this...
> > > >
> > > > scan.setFilter(new FirstKeyOnlyFilter());
> > > >
> > > > But after I do this...
> > > >
> > > > Result result = scanner.next();
> > > >
> > > > There's no result.getKey(), so what method would give me the
> > > > key value?
> > > >
> > > > On Wed, Feb 2, 2011 at 10:20 PM, Stack <[EMAIL PROTECTED]> wrote:
> > > >
> > > >> See
> > > >> http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/FirstKeyOnlyFilter.html
> > > >> St.Ack
> > > >>
> > > >> On Thu, Feb 3, 2011 at 6:01 AM, Something Something
> > > >> <[EMAIL PROTECTED]> wrote:
> > > >> > I want to read only the keys in a table. I tried this...
> > > >> >
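One difference between Stack's shell `count` method and the Java `getPartners()` code in this thread is that the shell version sets scanner caching (`scan.caching = caching_rows`, plus `scan.cache_blocks = false`) alongside FirstKeyOnlyFilter, while the Java code leaves caching at its default. With a caching value of N, each `next()` round trip to the region server can carry up to N rows, so the number of round trips shrinks roughly by a factor of N. The sketch below is a simplified, hypothetical model of that effect: `ScanRoundTrips`, `batches` and `roundTrips` are illustrative names, not HBase API. It assumes one RPC to open the scanner, one per batch of rows, and one to close it; real clients can issue extra calls (for example to detect the end of a region). It also assumes the then-default of one row per RPC (`hbase.client.scanner.caching`) when caching is not set. In the Java client the corresponding settings are `scan.setCaching(...)` and `scan.setCacheBlocks(false)`.

```java
// Simplified model of client/region-server round trips for a scan.
// Assumption: 1 RPC to open the scanner, 1 RPC per batch of up to
// `caching` rows, and 1 RPC to close it. Real HBase clients may make
// additional calls, so these numbers are rough estimates only.
public class ScanRoundTrips {

    /** Number of next() batches needed to return all rows. */
    static int batches(int rows, int caching) {
        if (caching < 1) {
            throw new IllegalArgumentException("caching must be >= 1");
        }
        // Ceiling division: a partially filled final batch still costs one trip.
        return (rows + caching - 1) / caching;
    }

    /** Open + batches + close, under the simplified model above. */
    static int roundTrips(int rows, int caching) {
        return 1 + batches(rows, caching) + 1;
    }

    public static void main(String[] args) {
        // Row counts taken from the thread: 43 today, ~500 expected later.
        int[] rowCounts = {43, 500};
        int[] cachingValues = {1, 10, 100};
        for (int rows : rowCounts) {
            for (int caching : cachingValues) {
                System.out.printf("rows=%d caching=%d -> ~%d round trips%n",
                        rows, caching, roundTrips(rows, caching));
            }
        }
    }
}
```

Under these assumptions, 43 rows at a caching of 1 need about 45 round trips, versus about 3 at a caching of 100. Whether that accounts for the 2 seconds reported above depends on the cluster and network, and would need to be measured.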