HBase, mail # user - Scan only talks to a single region server


Re: Scan only talks to a single region server
Jean-Marc Spaggiari 2012-07-17, 15:42
Hi,

I'm not 100% sure, but I think getScanner returns a ResultScanner and
not the results themselves.

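What you need to do is something like this:

  ResultScanner scanner = table_work_proposed.getScanner(scan);
  try {
    Result[] results = scanner.next(linesToRead);
    while (results.length > 0) {
      for (Result result : results) {
        // Do something (or nothing) with each row
        byte[] row = result.getRow();
      }
      // Fetch the next batch; an empty array means the scan is done
      results = scanner.next(linesToRead);
    }
  } finally {
    // Release the scanner's resources on the region server
    scanner.close();
  }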

In your example, I think you are counting the result scanners, not the rows.
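
To show the shape of that loop without needing a cluster, here is a minimal, self-contained sketch. It does not use HBase at all: BatchScanner and ListScanner are hypothetical stand-ins that I made up for illustration, assuming only that next(n) returns up to n rows and an empty array once the scan is exhausted. The row count comes from summing the batch lengths.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the batch-reading part of a result scanner:
// next(n) returns up to n row keys, and an empty array once exhausted.
interface BatchScanner {
    String[] next(int nbRows);
}

// In-memory scanner used only to exercise the loop; not an HBase class.
class ListScanner implements BatchScanner {
    private final List<String> rows;
    private int position = 0;

    ListScanner(List<String> rows) {
        this.rows = new ArrayList<>(rows);
    }

    @Override
    public String[] next(int nbRows) {
        int end = Math.min(position + nbRows, rows.size());
        String[] batch = rows.subList(position, end).toArray(new String[0]);
        position = end;
        return batch;
    }
}

class BatchDrain {
    // Counts rows by draining the scanner batch by batch
    // until an empty batch comes back.
    static int countRows(BatchScanner scanner, int batchSize) {
        int count = 0;
        String[] batch = scanner.next(batchSize);
        while (batch.length > 0) {
            count += batch.length;
            batch = scanner.next(batchSize);
        }
        return count;
    }

    public static void main(String[] args) {
        BatchScanner scanner =
            new ListScanner(Arrays.asList("r1", "r2", "r3", "r4", "r5"));
        // Five rows read in batches of two.
        System.out.println(countRows(scanner, 2));
    }
}
```

With a real table you would replace the stand-in with the ResultScanner returned by getScanner(scan) and close it when done.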

JM

2012/7/17, Alex Baranau <[EMAIL PROTECTED]>:
>> this scan is running
>> inside a map task
>
> How do you create your scan(ner)? Could you paste the code here?
>
> You know that when an HBase table is used as the source for a MapReduce job (via
> the standard configuration), each Map task consumes data from one region (among
> other things, this lets it benefit from data locality). I.e., it creates
> one Map task per region. I wonder if this is related.
>
> Sorry for the obvious check...
>
> Alex Baranau
> ------
> Sematext :: http://blog.sematext.com/ :: Hadoop - HBase - ElasticSearch -
> Solr
>
> On Tue, Jul 17, 2012 at 11:11 AM, Whitney Sorenson
> <[EMAIL PROTECTED]> wrote:
>
>> I'm trying to scan across an entire table (using only a specific
>> family or family + qualifier).
>>
>> I've tried various methods but I can only get this scan to touch the
>> first region server. Afterwards, it stops processing. Issuing the same
>> scan in the shell works (returns 50,000 rows) whereas the Scan made
>> from Java only returns ~4000 rows.
>>
>> I've tried adding/removing start/stop rows, using getScanner(family,
>> column) vs getScanner(scan), and restarting the region servers which
>> host the 1st and 2nd regions.
>>
>> The debug output from the scan shows that it knows about locations for
>> each region; however, it calls close after the first region.
>>
>> In the simplest case, the code looks like:
>>
>> ResultScanner rs = table.getScanner(family, qualifier);
>> for (Result r : rs) {
>>   // do something
>> }
>>
>> Any ideas or known issues? (0.90.4-cdh3u2 - this scan is running
>> inside a map task)
>>
>> I figure the next step is to walk through the client scanner code
>> locally in a java main but haven't done this yet.
>>