Re: Fastest way to read only the keys of a HTable?
After adding the following line:

scan.addFamily(Bytes.toBytes("Info"));

performance improved dramatically (thank you both!).  But now I want it to
perform even faster, if possible :-)  Reading 43 rows takes 2 seconds.
Eventually, the 'partner' table may have over 500 entries.  I guess I will
try moving the recently added family to a different table.  Do you think
that might help?

Thanks again.
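
For reference, 2 seconds for 43 rows works out to roughly 47 ms per row. One possible explanation (an assumption, not something confirmed in this thread) is that the scanner makes one round trip per row. The shell snippet quoted further down sets scan.caching for exactly this kind of scan. The sketch below is a rough model only: it does not talk to HBase, and the class and method names (ScanRoundTrips, roundTrips) are made up for illustration. It assumes each round trip returns up to `caching` rows and ignores the calls that open and close the scanner.

```java
// Rough model only: estimates how many scanner "next" round trips a client
// would make for a given row count and scanner caching value.
// Assumption: each round trip returns up to `caching` rows.
// It ignores the calls that open and close the scanner.
public class ScanRoundTrips {

    // Ceiling of rows / caching, i.e. the number of batches needed.
    static int roundTrips(int rows, int caching) {
        if (rows < 0 || caching < 1) {
            throw new IllegalArgumentException("rows must be >= 0 and caching >= 1");
        }
        return (rows + caching - 1) / caching;
    }

    public static void main(String[] args) {
        int[] rowCounts = {43, 500};        // table sizes mentioned in this thread
        int[] cachingValues = {1, 10, 100}; // 10 is caching_rows in the quoted shell code
        for (int rows : rowCounts) {
            for (int caching : cachingValues) {
                System.out.println(rows + " rows, caching=" + caching
                        + " -> " + roundTrips(rows, caching) + " round trips");
            }
        }
    }
}
```

If the model holds, the corresponding change in the client code would be a call to scan.setCaching(n) on the Scan before getScanner(); whether it accounts for the 2 seconds here would need to be measured.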
On Thu, Feb 3, 2011 at 12:15 PM, Jonathan Gray <[EMAIL PROTECTED]> wrote:

> If you only need to consider a single column family, use Scan.addFamily()
> on your scanner.  Then there will be no impact of the other column families.
>
> > -----Original Message-----
> > From: Something Something [mailto:[EMAIL PROTECTED]]
> > Sent: Thursday, February 03, 2011 11:28 AM
> > To: [EMAIL PROTECTED]
> > Subject: Re: Fastest way to read only the keys of a HTable?
> >
> > Hmm.. performance hasn't improved at all.  Do you see anything wrong with
> > the following code:
> >
> >
> >     public List<Partner> getPartners() {
> >       ArrayList<Partner> partners = new ArrayList<Partner>();
> >
> >       try {
> >           HTable table = new HTable("partner");
> >           Scan scan = new Scan();
> >           scan.setFilter(new FirstKeyOnlyFilter());
> >           ResultScanner scanner = table.getScanner(scan);
> >           Result result = scanner.next();
> >           while (result != null) {
> >               Partner partner = new Partner(Bytes.toString(result.getRow()));
> >               partners.add(partner);
> >               result = scanner.next();
> >           }
> >       } catch (IOException e) {
> >           throw new RuntimeException(e);
> >       }
> >       return partners;
> >   }
> >
> > Maybe I shouldn't use more than one "column family" in an HTable, but the
> > BigTable paper recommends that, doesn't it?  Please advise, and thanks for
> > your help.
> >
> >
> >
> >
> > On Wed, Feb 2, 2011 at 10:55 PM, Stack <[EMAIL PROTECTED]> wrote:
> >
> > > I don't see a getKey on Result.  Use
> > >
> > > http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Result.html#getRow()
> > >
> > > Here is how it's used in the shell table.rb class:
> > >
> > >    # Count rows in a table
> > >    def count(interval = 1000, caching_rows = 10)
> > >      # We can safely set scanner caching with the first key only filter
> > >      scan = org.apache.hadoop.hbase.client.Scan.new
> > >      scan.cache_blocks = false
> > >      scan.caching = caching_rows
> > >      scan.setFilter(org.apache.hadoop.hbase.filter.FirstKeyOnlyFilter.new)
> > >
> > >      # Run the scanner
> > >      scanner = @table.getScanner(scan)
> > >      count = 0
> > >      iter = scanner.iterator
> > >
> > >      # Iterate results
> > >      while iter.hasNext
> > >        row = iter.next
> > >        count += 1
> > >        next unless (block_given? && count % interval == 0)
> > >        # Allow command modules to visualize counting process
> > >        yield(count, String.from_java_bytes(row.getRow))
> > >      end
> > >
> > >      # Return the counter
> > >      return count
> > >    end
> > >
> > >
> > > St.Ack
> > >
> > > On Thu, Feb 3, 2011 at 6:47 AM, Something Something
> > > <[EMAIL PROTECTED]> wrote:
> > > > Thanks.  So I will add this...
> > > >
> > > >   scan.setFilter(new FirstKeyOnlyFilter());
> > > >
> > > > But after I do this...
> > > >
> > > >   Result result = scanner.next();
> > > >
> > > > There's no...  result.getKey() - so what method would give me the Key
> > > > value?
> > > >
> > > >
> > > >
> > > > On Wed, Feb 2, 2011 at 10:20 PM, Stack <[EMAIL PROTECTED]> wrote:
> > > >
> > > >> See
> > > >> http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/FirstKeyOnlyFilter.html
> > > >> St.Ack
> > > >>
> > > >> On Thu, Feb 3, 2011 at 6:01 AM, Something Something
> > > >> <[EMAIL PROTECTED]> wrote:
> > > >> > I want to read only the keys in a table. I tried this...
> > > >> >