HBase, mail # user - Scan only talks to a single region server
Re: Scan only talks to a single region server
Jimmy Xiang 2012-07-17, 18:06
Hi Whitney,

The scanner automatically moves on to the next region (and its region
server) once the current region has been fully scanned.

In the client, can HTable.getStartEndKeys() see all the regions and
region servers?
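
For illustration only, here is a toy model in plain Java of how a scan
walks a table region by region. It does not use the HBase client API:
the Region class, the server names rs1 to rs3, and the row keys are all
made up for this sketch. The idea is that the end key of one region is
the start key of the next, so a client that knows every region boundary
should end up visiting every region in key order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

public class RegionWalkSketch {

    /** Hypothetical stand-in for one region: key range [startKey, endKey) plus its rows. */
    static final class Region {
        final String startKey; // "" means "from the start of the table"
        final String endKey;   // "" means "to the end of the table"
        final String server;   // made-up region server name
        final SortedMap<String, String> rows = new TreeMap<>();

        Region(String startKey, String endKey, String server) {
            this.startKey = startKey;
            this.endKey = endKey;
            this.server = server;
        }
    }

    /** Finds the region whose key range contains the given key. */
    static Region locate(List<Region> regions, String key) {
        for (Region r : regions) {
            boolean afterStart = key.compareTo(r.startKey) >= 0;
            boolean beforeEnd = r.endKey.isEmpty() || key.compareTo(r.endKey) < 0;
            if (afterStart && beforeEnd) {
                return r;
            }
        }
        throw new IllegalStateException("no region covers key '" + key + "'");
    }

    /** Scans the whole table in key order and records each server visited. */
    static List<String> scanAll(List<Region> regions, List<String> serversVisited) {
        List<String> rowKeys = new ArrayList<>();
        String nextStart = ""; // begin at the first region
        while (true) {
            Region r = locate(regions, nextStart);
            serversVisited.add(r.server);
            rowKeys.addAll(r.rows.keySet()); // drain the current region
            if (r.endKey.isEmpty()) {
                break; // last region reached, the scan is complete
            }
            nextStart = r.endKey; // the next region starts where this one ends
        }
        return rowKeys;
    }

    /** Builds a three-region demo table with invented keys and servers. */
    static List<Region> demoTable() {
        Region first = new Region("", "m", "rs1");
        first.rows.put("a", "v1");
        first.rows.put("f", "v2");
        Region second = new Region("m", "t", "rs2");
        second.rows.put("m", "v3");
        second.rows.put("p", "v4");
        Region third = new Region("t", "", "rs3");
        third.rows.put("x", "v5");
        List<Region> regions = new ArrayList<>();
        regions.add(first);
        regions.add(second);
        regions.add(third);
        return regions;
    }

    public static void main(String[] args) {
        List<String> servers = new ArrayList<>();
        List<String> keys = scanAll(demoTable(), servers);
        System.out.println("rows: " + keys);
        System.out.println("servers: " + servers);
    }
}
```

If the loop stopped after the first region in this model, it would
return only the rows held by rs1, which is the symptom described in
this thread.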

Thanks,
Jimmy

On Tue, Jul 17, 2012 at 10:47 AM, Whitney Sorenson
<[EMAIL PROTECTED]> wrote:
> The code is pasted above, here it is again:
>
> ResultScanner rs = table.getScanner(family, qualifier);
> for (Result r : rs) {
>     // do something
> }
>
> ResultScanners are iterable, which means you can use a for-each loop
> over them. In addition, the debug logs indicate that the scanner only
> ever retrieves rows from the first region server.
>
> On Tue, Jul 17, 2012 at 12:02 PM, Alex Baranau <[EMAIL PROTECTED]> wrote:
>>> How do you create your scan(ner)? Could you paste the code here?
>>
>> Sorry, I meant to ask how you instantiate the HTable and Configuration objects.
>>
>> Alex Baranau
>> ------
>> Sematext :: http://blog.sematext.com/ :: Hadoop - HBase - ElasticSearch -
>> Solr
>>
>> On Tue, Jul 17, 2012 at 11:37 AM, Alex Baranau <[EMAIL PROTECTED]> wrote:
>>
>>> > this scan is running
>>> > inside a map task
>>>
>>> How do you create your scan(ner)? Could you paste the code here?
>>>
>>> As you probably know, when an HBase table is used as the source for a
>>> MapReduce job (via the standard configuration), each map task consumes
>>> data from one region (among other things, this lets it benefit from data
>>> locality). That is, the job creates one map task per region. I wonder if
>>> this is related.
>>>
>>> Sorry for the obvious check...
>>>
>>> Alex Baranau
>>> ------
>>> Sematext :: http://blog.sematext.com/ :: Hadoop - HBase - ElasticSearch -
>>> Solr
>>>
>>> On Tue, Jul 17, 2012 at 11:11 AM, Whitney Sorenson <[EMAIL PROTECTED]> wrote:
>>>
>>>> I'm trying to scan across an entire table (using only a specific
>>>> family or family + qualifier).
>>>>
>>>> I've tried various methods but I can only get this scan to touch the
>>>> first region server. Afterwards, it stops processing. Issuing the same
>>>> scan in the shell works (returns 50,000 rows), whereas the scan made
>>>> from Java only returns ~4000 rows.
>>>>
>>>> I've tried adding/removing start/stop rows, using getScanner(family,
>>>> column) vs getScanner(scan), and restarting the region servers which
>>>> host the 1st and 2nd regions.
>>>>
>>>> The debug output from the scan shows that it knows about locations for
>>>> each region; however, it calls close after the first region.
>>>>
>>>> In the simplest case, the code looks like:
>>>>
>>>> ResultScanner rs = table.getScanner(family, qualifier);
>>>> for (Result r : rs) {
>>>>     // do something
>>>> }
>>>>
>>>> Any ideas or known issues? (0.90.4-cdh3u2 - this scan is running
>>>> inside a map task)
>>>>
>>>> I figure the next step is to walk through the client scanner code
>>>> locally in a Java main method, but I haven't done this yet.
>>>>