HBase >> mail # user >> Fastest way to read only the keys of a HTable?


Re: Fastest way to read only the keys of a HTable?
Hmm.. performance hasn't improved at all.  Do you see anything wrong with
the following code:
    public List<Partner> getPartners() {
      ArrayList<Partner> partners = new ArrayList<Partner>();

      try {
          HTable table = new HTable("partner");
          Scan scan = new Scan();
          scan.setFilter(new FirstKeyOnlyFilter());
          ResultScanner scanner = table.getScanner(scan);
          Result result = scanner.next();
          while (result != null) {
              Partner partner = new Partner(Bytes.toString(result.getRow()));
              partners.add(partner);
              result = scanner.next();
          }
      } catch (IOException e) {
          throw new RuntimeException(e);
      }
      return partners;
  }

Maybe I shouldn't use more than one "column family" in an HTable, but the
BigTable paper recommends that, doesn't it?  Please advise, and thanks for
your help.
On Wed, Feb 2, 2011 at 10:55 PM, Stack <[EMAIL PROTECTED]> wrote:

> I don't see a getKey on Result.  Use
>
> http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Result.html#getRow()
> .
>
> Here is how it's used in the shell table.rb class:
>
>    # Count rows in a table
>    def count(interval = 1000, caching_rows = 10)
>      # We can safely set scanner caching with the first key only filter
>      scan = org.apache.hadoop.hbase.client.Scan.new
>      scan.cache_blocks = false
>      scan.caching = caching_rows
>      scan.setFilter(org.apache.hadoop.hbase.filter.FirstKeyOnlyFilter.new)
>
>      # Run the scanner
>      scanner = @table.getScanner(scan)
>      count = 0
>      iter = scanner.iterator
>
>      # Iterate results
>      while iter.hasNext
>        row = iter.next
>        count += 1
>        next unless (block_given? && count % interval == 0)
>        # Allow command modules to visualize counting process
>        yield(count, String.from_java_bytes(row.getRow))
>      end
>
>      # Return the counter
>      return count
>    end
>
>
> St.Ack
>
> On Thu, Feb 3, 2011 at 6:47 AM, Something Something
> <[EMAIL PROTECTED]> wrote:
> > Thanks.  So I will add this...
> >
> >   scan.setFilter(new FirstKeyOnlyFilter());
> >
> > But after I do this...
> >
> >   Result result = scanner.next();
> >
> > There's no...  result.getKey() - so what method would give me the Key value?
> >
> >
> >
> > On Wed, Feb 2, 2011 at 10:20 PM, Stack <[EMAIL PROTECTED]> wrote:
> >
> >> See
> >> http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/FirstKeyOnlyFilter.html
> >> St.Ack
> >>
> >> On Thu, Feb 3, 2011 at 6:01 AM, Something Something
> >> <[EMAIL PROTECTED]> wrote:
> >> > I want to read only the keys in a table. I tried this...
> >> >
> >> >    try {
> >> >
> >> >  HTable table = new HTable("myTable");
> >> >
> >> >  Scan scan = new Scan();
> >> >
> >> >  scan.addFamily(Bytes.toBytes("Info"));
> >> >
> >> >  ResultScanner scanner = table.getScanner(scan);
> >> >
> >> >   Result result = scanner.next();
> >> >
> >> >  while (result != null) {
> >> >
> >> > & so on...
> >> >
> >> > This was performing fairly well until I added another Family that contains
> >> > lots of key/value pairs.  My understanding was that adding another family
> >> > wouldn't affect performance of this code because I am explicitly using
> >> > "Info", but it is.
> >> >
> >> > Anyway, in this particular use case, I only care about the "Key" of the row.
> >> > I don't need any values from any of the families.  What's the best way to
> >> > do this?
> >> >
> >> > Please let me know.  Thanks.
> >> >
> >>
> >
>
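
A note on the question at the top of the thread, offered as a rough sketch rather than a diagnosis: the shell snippet Stack quoted sets `scan.caching` before running the scanner, while the Java `getPartners()` code leaves scanner caching at its default. In the Java client the corresponding call is `Scan.setCaching(int)`. With a small caching value, each `scanner.next()` can cost a round trip to the region server, so adding the filter alone may not show much gain. The class below is illustrative arithmetic only: the name `ScanRoundTrips` and the row count are hypothetical, and it does not talk to HBase. It estimates how many batch fetches a scan needs for a given caching value.

```java
/** Hypothetical helper: estimates scanner fetches; it does not use the HBase API. */
final class ScanRoundTrips {

    /**
     * Returns ceil(rows / caching): the approximate number of batch fetches needed
     * to stream `rows` rows when each fetch returns at most `caching` rows.
     */
    static long roundTrips(long rows, int caching) {
        if (rows < 0 || caching <= 0) {
            throw new IllegalArgumentException("rows must be >= 0 and caching must be > 0");
        }
        return (rows + caching - 1) / caching;
    }

    public static void main(String[] args) {
        long rows = 1_000_000L; // assumed table size, for illustration only
        for (int caching : new int[] {1, 10, 1000}) {
            System.out.println("caching=" + caching + " -> ~" + roundTrips(rows, caching) + " fetches");
        }
    }
}
```

Under these assumptions, raising the caching value from 1 to 1000 cuts the estimate from 1,000,000 fetches to 1,000. Whether that is the actual bottleneck here would need to be measured against the real table.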