
Hadoop, mail # user - Re: Distributing the code to multiple nodes


Re: Distributing the code to multiple nodes
Ashish Jain 2014-01-16, 07:09
Voila!! It worked finally :). Thanks a lot for all the support from all the
folks in this forum. Here is a summary, for others, of what I finally did to
solve this:

1) Change the framework to yarn using mapreduce.framework.name in
mapred-site.xml.
2) In yarn-site.xml add the following properties:
<name>yarn.nodemanager.resource.memory-mb</name>
<name>yarn.scheduler.minimum-allocation-mb</name>
3) In mapred-site.xml add the following properties:
<name>mapreduce.map.memory.mb</name>
<name>mapreduce.reduce.memory.mb</name>
<name>mapreduce.map.java.opts</name>
<name>mapreduce.reduce.java.opts</name>
4) Use the capacity scheduler. The fair scheduler may also work, but I used
the capacity scheduler. (An illustrative sketch of these settings follows below.)
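
For anyone who wants a starting point, here is a minimal sketch of the two
files. The numbers are only illustrative assumptions, not the values from my
cluster; the one hard constraint (as the scheduler warning quoted further down
shows) is that the per-container memory request must fit within
yarn.nodemanager.resource.memory-mb on each node.

yarn-site.xml:

<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>2048</value> <!-- illustrative: total memory YARN may use on this node -->
</property>
<property>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>512</value> <!-- illustrative: smallest container the scheduler hands out -->
</property>
<property>
  <name>yarn.resourcemanager.scheduler.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
  <!-- capacity scheduler, as in step 4 -->
</property>

mapred-site.xml:

<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>
<property>
  <name>mapreduce.map.memory.mb</name>
  <value>1024</value> <!-- illustrative: must not exceed the node manager memory above -->
</property>
<property>
  <name>mapreduce.reduce.memory.mb</name>
  <value>1024</value>
</property>
<property>
  <name>mapreduce.map.java.opts</name>
  <value>-Xmx768m</value> <!-- illustrative: JVM heap kept somewhat below the container size -->
</property>
<property>
  <name>mapreduce.reduce.java.opts</name>
  <value>-Xmx768m</value>
</property>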

Start the system and run the jobs, and they will be distributed across all the
nodes. I could see 8 map tasks running because I had 8 data blocks, and all
the nodes were serving requests. However, I still see only 1 reduce task; I
will address that in a separate post.
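
(The single reduce task is most likely just the MapReduce default:
mapreduce.job.reduces is 1 unless overridden. A minimal sketch of raising it
in mapred-site.xml, with 8 purely as an illustrative value; it can also be set
per job with Job.setNumReduceTasks().)

<property>
  <name>mapreduce.job.reduces</name>
  <value>8</value> <!-- illustrative: the default is 1 -->
</property>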

--Ashish
On Wed, Jan 15, 2014 at 7:23 PM, sudhakara st <[EMAIL PROTECTED]> wrote:

> Hello Ashish
>
>
> WARN
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
> Node : l1-DEV05:1004 does not have sufficient resource for request :
> {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1,
> Location: *, Relax Locality: true} node total capability : <memory:1024,
> vCores:8>
>
> The resource manager is trying to allocate 2 GB, but only 1 GB is available on the node.
>
>
> On Wed, Jan 15, 2014 at 7:07 PM, Ashish Jain <[EMAIL PROTECTED]> wrote:
>
>> I tried that, but somehow my MapReduce jobs do not execute at all once I
>> set it to yarn.
>>
>>
>> On Wed, Jan 15, 2014 at 7:00 PM, Nirmal Kumar <[EMAIL PROTECTED]
>> > wrote:
>>
>>>  Surely you don’t have to set *mapreduce.jobtracker.address* in
>>> mapred-site.xml
>>>
>>>
>>>
>>> In mapred-site.xml you just have to mention:
>>>
>>> <property>
>>>
>>> <name>mapreduce.framework.name</name>
>>>
>>> <value>yarn</value>
>>>
>>> </property>
>>>
>>>
>>>
>>> -Nirmal
>>>
>>> *From:* Ashish Jain [mailto:[EMAIL PROTECTED]]
>>> *Sent:* Wednesday, January 15, 2014 6:44 PM
>>>
>>> *To:* [EMAIL PROTECTED]
>>> *Subject:* Re: Distributing the code to multiple nodes
>>>
>>>
>>>
>>> I think this is the problem. I have not set
>>> "mapreduce.jobtracker.address" in my mapred-site.xml, and by default it is
>>> set to local. Now the question is how to point it at a remote host. The
>>> documentation says I need to specify the host:port of the job tracker. As
>>> we know, Hadoop 2.2.0 is completely overhauled and there is no longer a
>>> task tracker or job tracker; instead there is now a resource manager and a
>>> node manager. So in this case what do I set "mapreduce.jobtracker.address"
>>> to? Do I set it to resourceManagerHost:resourceManagerPort?
>>>
>>> --Ashish
>>>
>>>
>>>
>>> On Wed, Jan 15, 2014 at 4:20 PM, Ashish Jain <[EMAIL PROTECTED]> wrote:
>>>
>>>  Hi Sudhakar,
>>>
>>> Indeed there was a typo; the complete command is as follows (minus the
>>> main class, since my manifest has an entry for the main class):
>>> /hadoop jar wordCount.jar  /opt/ApacheHadoop/temp/worker.log
>>> /opt/ApacheHadoop/out/
>>>
>>> Next I killed the datanode on 10.12.11.210, and I see the following
>>> messages in the log files. It looks like the namenode is still trying to
>>> assign the complete task to a single node, and since it does not find the
>>> complete data set on one node, it is complaining.
>>>
>>>
>>> 2014-01-15 16:38:26,894 WARN
>>> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
>>> Node : l1-DEV05:1004 does not have sufficient resource for request :
>>> {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1,
>>> Location: *, Relax Locality: true} node total capability : <memory:1024,
>>> vCores:8>
>>> 2014-01-15 16:38:27,348 WARN
>>> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
>>> Node : l1dev-211:1004 does not have sufficient resource for request :
>>> {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1,