HBase >> mail # user >> Fastest way to read only the keys of a HTable?


Re: Fastest way to read only the keys of a HTable?
Hmm.. performance hasn't improved at all.  Do you see anything wrong with
the following code:
    public List<Partner> getPartners() {
      ArrayList<Partner> partners = new ArrayList<Partner>();

      try {
          HTable table = new HTable("partner");
          Scan scan = new Scan();
          scan.setFilter(new FirstKeyOnlyFilter());
          ResultScanner scanner = table.getScanner(scan);
          Result result = scanner.next();
          while (result != null) {
              Partner partner = new Partner(Bytes.toString(result.getRow()));
              partners.add(partner);
              result = scanner.next();
          }
      } catch (IOException e) {
          throw new RuntimeException(e);
      }
      return partners;
    }

Maybe I shouldn't use more than one "column family" in a HTable, but the
BigTable paper recommends that, doesn't it?  Please advise, and thanks for
your help.
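
One visible difference between the Java code above and the shell's count method quoted below is that the shell sets scanner caching and disables block caching, while the Java scan leaves both at their defaults. In the Java client the corresponding setters are Scan.setCaching(int) and Scan.setCacheBlocks(boolean). As a rough illustration only (plain Java, no HBase calls), the sketch below counts how many batch fetches a scan would need for a given caching value. The class name, the one-million-row figure, and the simplified one-round-trip-per-batch model are assumptions made for illustration, not measurements from this thread; it also assumes an unset caching value means one row per fetch, which may not match your client version.

```java
// Back-of-the-envelope model only: it counts batch fetches, not real HBase calls.
final class ScanRoundTrips {
    private ScanRoundTrips() {}

    /**
     * Number of batch fetches needed to stream `rows` rows when each
     * fetch returns at most `caching` rows. Ignores the calls that open
     * and close the scanner and the final empty fetch.
     */
    static long roundTrips(long rows, int caching) {
        if (rows < 0 || caching < 1) {
            throw new IllegalArgumentException("rows must be >= 0 and caching >= 1");
        }
        return (rows + caching - 1) / caching; // ceiling division
    }

    public static void main(String[] args) {
        long rows = 1_000_000L; // hypothetical table size
        for (int caching : new int[] {1, 10, 1000}) {
            System.out.println("caching=" + caching + " -> "
                    + roundTrips(rows, caching) + " fetches");
        }
    }
}
```

Under this model, a key-only filter shrinks what each fetch carries but does not change how many fetches are made, so it may be worth trying a larger caching value before restructuring the column families.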
On Wed, Feb 2, 2011 at 10:55 PM, Stack <[EMAIL PROTECTED]> wrote:

> I don't see a getKey on Result.  Use getRow():
>
> http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Result.html#getRow()
>
> Here is how it's used in the shell table.rb class:
>
>    # Count rows in a table
>    def count(interval = 1000, caching_rows = 10)
>      # We can safely set scanner caching with the first key only filter
>      scan = org.apache.hadoop.hbase.client.Scan.new
>      scan.cache_blocks = false
>      scan.caching = caching_rows
>      scan.setFilter(org.apache.hadoop.hbase.filter.FirstKeyOnlyFilter.new)
>
>      # Run the scanner
>      scanner = @table.getScanner(scan)
>      count = 0
>      iter = scanner.iterator
>
>      # Iterate results
>      while iter.hasNext
>        row = iter.next
>        count += 1
>        next unless (block_given? && count % interval == 0)
>        # Allow command modules to visualize counting process
>        yield(count, String.from_java_bytes(row.getRow))
>      end
>
>      # Return the counter
>      return count
>    end
>
>
> St.Ack
>
> On Thu, Feb 3, 2011 at 6:47 AM, Something Something
> <[EMAIL PROTECTED]> wrote:
> > Thanks.  So I will add this...
> >
> >   scan.setFilter(new FirstKeyOnlyFilter());
> >
> > But after I do this...
> >
> >   Result result = scanner.next();
> >
> > There's no...  result.getKey() - so what method would give me the Key value?
> >
> >
> >
> > On Wed, Feb 2, 2011 at 10:20 PM, Stack <[EMAIL PROTECTED]> wrote:
> >
> >> See
> >> http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/FirstKeyOnlyFilter.html
> >> St.Ack
> >>
> >> On Thu, Feb 3, 2011 at 6:01 AM, Something Something
> >> <[EMAIL PROTECTED]> wrote:
> >> > I want to read only the keys in a table. I tried this...
> >> >
> >> >    try {
> >> >        HTable table = new HTable("myTable");
> >> >        Scan scan = new Scan();
> >> >        scan.addFamily(Bytes.toBytes("Info"));
> >> >        ResultScanner scanner = table.getScanner(scan);
> >> >        Result result = scanner.next();
> >> >        while (result != null) {
> >> >
> >> > & so on...
> >> >
> >> > This was performing fairly well until I added another Family that contains
> >> > lots of key/value pairs.  My understanding was that adding another family
> >> > wouldn't affect performance of this code because I am explicitly using
> >> > "Info", but it is.
> >> >
> >> > Anyway, in this particular use case, I only care about the "Key" of the row.
> >> > I don't need any values from any of the families.  What's the best way to
> >> > do this?
> >> > Please let me know.  Thanks.
> >> >
> >>
> >
>