Rohit Kelkar 2011-12-06, 08:50
Stack 2011-12-06, 15:08
Rohit Kelkar 2011-12-07, 05:21
Lars George 2011-12-09, 09:01
Rohit Kelkar 2011-12-09, 09:29
Re: mapper not running on the same region server
Lars George 2011-12-09, 11:00
I do not have the exact code in my head, but I would assume that when you kill the job the correct node is properly up for grabs again. I was referring to the names of the nodes as the JobTracker sees them vs. what HBase reports them as when the getSplits() of the input format is called. These might differ, in which case the framework does not take the locality hint into consideration.
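To illustrate the kind of mismatch I mean, here is a minimal sketch in plain Java. It does not call any Hadoop or HBase API: the class LocalityHintCheck, its methods, and the host names are all made up for illustration. It also assumes that the hint is matched by host name, so that a short name and a fully qualified name for the same machine would not line up.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical helper for illustration only; not part of Hadoop or HBase.
class LocalityHintCheck {

    // True only if the hinted host name appears verbatim in the list of
    // names the scheduler knows (assumption: matching is done by name).
    static boolean exactMatch(String hint, List<String> trackerHosts) {
        return trackerHosts.contains(hint);
    }

    // Lower-cases the name and cuts it at the first dot.
    // Assumes real host names, not IP addresses.
    static String shortName(String host) {
        String h = host.trim().toLowerCase(Locale.ROOT);
        int dot = h.indexOf('.');
        return dot < 0 ? h : h.substring(0, dot);
    }

    // True if some known host shares the hint's short name, i.e. the two
    // names probably refer to the same machine under different spellings.
    static boolean sameShortName(String hint, List<String> trackerHosts) {
        String wanted = shortName(hint);
        for (String t : trackerHosts) {
            if (shortName(t).equals(wanted)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Made-up names: what the task trackers might register as ...
        List<String> trackers = Arrays.asList("node1", "node2", "node3");
        // ... and what the region location might be reported as.
        String hint = "node2.example.com";

        System.out.println("exact match:     " + exactMatch(hint, trackers));
        System.out.println("same short name: " + sameShortName(hint, trackers));
    }
}
```

If the exact match fails while the short names agree, the two sides are likely naming the same machine differently. Comparing the host names on the JobTracker page with the ones in the HBase master UI by hand would tell you the same thing.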
On Dec 9, 2011, at 10:29 AM, Rohit Kelkar wrote:
> Hi Lars, by naming issue, do you mean if the zookeeper nodes and hbase
> nodes are correctly configured?
> I observed that this issue occurs intermittently. Sometimes the mapper
> gets scheduled on the correct node. Would that be because I am killing
> the job frequently and hadoop is prioritizing the nodes based on how
> often (or less often) the scheduled job successfully completes?
> - Rohit Kelkar
> On Fri, Dec 9, 2011 at 2:31 PM, Lars George <[EMAIL PROTECTED]> wrote:
>> Do you maybe have an issue with naming? HBase takes the hostname (as shown in the UI and the ZK dump there) and passes it as a hint to the MR framework. But if that resolves to a different name on the MapReduce side, then no match can be made and the node to run the task on is chosen at random. Could you verify?
>> On Dec 7, 2011, at 6:21 AM, Rohit Kelkar wrote:
>>> My hadoop cluster has 3 nodes in it and hbase too runs on the same 3
>>> nodes. But the table that I am speaking of has only one region and
>>> http://master:50030/jobtracker.jsp shows only one mapper running.
>>> - Rohit Kelkar
>>> On Tue, Dec 6, 2011 at 8:38 PM, Stack <[EMAIL PROTECTED]> wrote:
>>>> On Tue, Dec 6, 2011 at 12:50 AM, Rohit Kelkar <[EMAIL PROTECTED]> wrote:
>>>>> I am running a mapreduce job on a hbase table. I have a 3 node
>>>>> cluster. Currently the table has only a few rows. When I visit the
>>>>> http://master:60010/master.jsp I can see that the table resides on
>>>>> only one region server. When I run my mapreduce job on this table I
>>>>> see the mapper running on a different node of my cluster. Shouldn't
>>>>> the mapper be running on the same node that hosts the table?
>>>>> I am using the TableMapReduceUtil.initTableMapperJob method to
>>>>> initialize the mapreduce job.
>>>> Yes. Mappers should be running by the data.
>>>> You have only one region in your table or more than one region and
>>>> more than one mapper is running?