HBase >> mail # user >> Fastest way to read only the keys of a HTable?


Re: Fastest way to read only the keys of a HTable?
Awesome!  It's instantaneous now.  Thanks a bunch.  Any such tricks for code
that looks like this...

      Get get = new Get(Bytes.toBytes(code));
      Result result = table.get(get);
      NavigableMap<byte[], byte[]> map = result.getFamilyMap(Bytes.toBytes("Keys"));
      if (map != null) {
        for (Map.Entry<byte[], byte[]> entry : map.entrySet()) {
          String key = Bytes.toString(entry.getValue());
          Get get1 = new Get(Bytes.toBytes(key));
          Result imp = table2.get(get1);
          // Do something with the result...
        }
      }

Basically, I am reading the first table by a key (code).  The "Keys" family
contains row keys of another table, so I get each key from that family and
retrieve the corresponding row from the other table.
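One possible trick for the loop above: instead of issuing one RPC per key, the HBase client (0.90+) has a batched HTable.get(List<Get>) that fetches all rows in a single call.  A sketch under that assumption, reusing table, table2, code, and the "Keys" family from the code above:

```java
// Sketch: replace the per-key get() loop with one batched call.
// Assumes HBase 0.90+, where HTable exposes get(List<Get>).
Get get = new Get(Bytes.toBytes(code));
get.addFamily(Bytes.toBytes("Keys"));        // fetch only the family we need
Result result = table.get(get);
NavigableMap<byte[], byte[]> map = result.getFamilyMap(Bytes.toBytes("Keys"));
if (map != null) {
  List<Get> gets = new ArrayList<Get>(map.size());
  for (Map.Entry<byte[], byte[]> entry : map.entrySet()) {
    gets.add(new Get(entry.getValue()));     // value is the other table's row key
  }
  Result[] results = table2.get(gets);       // one round trip instead of N
  for (Result imp : results) {
    // Do something with each result...
  }
}
```

This trades N network round trips for one, which should matter far more than any per-row cost once the "Keys" family grows.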

Thanks again.

On Thu, Feb 3, 2011 at 2:17 PM, Jean-Daniel Cryans <[EMAIL PROTECTED]> wrote:

> On the scan, you can setCaching with the number of rows you want to
> pre-fetch per RPC. Setting it to 2 is already 2x better than the
> default.
>
> J-D
>
> On Thu, Feb 3, 2011 at 1:35 PM, Something Something
> <[EMAIL PROTECTED]> wrote:
> > After adding the following line:
> >
> > scan.addFamily(Bytes.toBytes("Info"));
> >
> > performance improved dramatically (Thank you both!).  But now I want it to
> > perform even faster, if possible -:)  To read 43 rows, it's taking 2
> > seconds.  Eventually, the 'partner' table may have over 500 entries.  I
> > guess I will try moving the recently added family to a different table.
> >  Do you think that might help?
> >
> > Thanks again.
> >
> >
> > On Thu, Feb 3, 2011 at 12:15 PM, Jonathan Gray <[EMAIL PROTECTED]> wrote:
> >
> >> If you only need to consider a single column family, use Scan.addFamily()
> >> on your scanner.  Then there will be no impact of the other column families.
> >>
> >> > -----Original Message-----
> >> > From: Something Something [mailto:[EMAIL PROTECTED]]
> >> > Sent: Thursday, February 03, 2011 11:28 AM
> >> > To: [EMAIL PROTECTED]
> >> > Subject: Re: Fastest way to read only the keys of a HTable?
> >> >
> >> > Hmm.. performance hasn't improved at all.  Do you see anything wrong with
> >> > the following code:
> >> >
> >> >
> >> >     public List<Partner> getPartners() {
> >> >       ArrayList<Partner> partners = new ArrayList<Partner>();
> >> >
> >> >       try {
> >> >           HTable table = new HTable("partner");
> >> >           Scan scan = new Scan();
> >> >           scan.setFilter(new FirstKeyOnlyFilter());
> >> >           ResultScanner scanner = table.getScanner(scan);
> >> >           Result result = scanner.next();
> >> >           while (result != null) {
> >> >               Partner partner = new
> >> > Partner(Bytes.toString(result.getRow()));
> >> >               partners.add(partner);
> >> >               result = scanner.next();
> >> >           }
> >> >       } catch (IOException e) {
> >> >           throw new RuntimeException(e);
> >> >       }
> >> >       return partners;
> >> >   }
> >> >
> >> > Maybe I shouldn't use more than one "column family" in an HTable - but the
> >> > BigTable paper recommends that, doesn't it?  Please advise, and thanks for
> >> > your help.
> >> >
> >> >
> >> >
> >> >
> >> > On Wed, Feb 2, 2011 at 10:55 PM, Stack <[EMAIL PROTECTED]> wrote:
> >> >
> >> > > I don't see a getKey on Result.  Use
> >> > >
> >> > >
> >> > > http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Result.html#getRow()
> >> > >
> >> > > Here is how it's used in the shell table.rb class:
> >> > >
> >> > >    # Count rows in a table
> >> > >    def count(interval = 1000, caching_rows = 10)
> >> > >      # We can safely set scanner caching with the first key only filter
> >> > >      scan = org.apache.hadoop.hbase.client.Scan.new
> >> > >      scan.cache_blocks = false
> >> > >      scan.caching = caching_rows
> >> > >
> >> > >      scan.setFilter(org.apache.hadoop.hbase.filter.FirstKeyOnlyFilter.new)
> >> > >
> >> > >      # Run the scanner
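The counting recipe quoted above translates to the Java client roughly as follows; a sketch assuming the 0.90-era API, with the table name and caching value purely illustrative:

```java
// Count rows by scanning row keys only: FirstKeyOnlyFilter returns just the
// first KeyValue of each row, and caching batches many rows per RPC.
HTable table = new HTable("partner");
Scan scan = new Scan();
scan.setCacheBlocks(false);                // don't pollute the block cache
scan.setCaching(1000);                     // safe here: each row is one tiny KeyValue
scan.setFilter(new FirstKeyOnlyFilter());
ResultScanner scanner = table.getScanner(scan);
long count = 0;
try {
  for (Result r : scanner) {
    count++;
  }
} finally {
  scanner.close();
}
```

A high caching value is normally risky on wide rows, but with FirstKeyOnlyFilter each row contributes a single KeyValue, so the pre-fetch stays small.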