Hadoop >> mail # user >> Re: Distributing the code to multiple nodes


Re: Distributing the code to multiple nodes
Voila!! It finally worked :). Thanks a lot for all the support from all the
folks in this forum. So here is a summary, for others, of what I finally
did to solve this:

1) Change the framework to yarn by setting mapreduce.framework.name in
mapred-site.xml
2) In yarn-site.xml add the following properties:
<name>yarn.nodemanager.resource.memory-mb</name>
<name>yarn.scheduler.minimum-allocation-mb</name>
3) In mapred-site.xml add the following properties:
<name>mapreduce.map.memory.mb</name>
<name>mapreduce.reduce.memory.mb</name>
<name>mapreduce.map.java.opts</name>
<name>mapreduce.reduce.java.opts</name>
4) Use the capacity scheduler. The fair scheduler may also work, but I used
the capacity scheduler.
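For anyone who wants the concrete XML, the additions looked roughly like this. The numeric values below are only illustrative, not the ones from my cluster; tune them to your node's actual memory:

```xml
<!-- yarn-site.xml: example values, adjust to your hardware -->
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>4096</value> <!-- total memory YARN may allocate to containers on this node -->
</property>
<property>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>512</value> <!-- smallest container the scheduler will hand out -->
</property>

<!-- mapred-site.xml: example values -->
<property>
  <name>mapreduce.map.memory.mb</name>
  <value>1024</value> <!-- container size requested per map task -->
</property>
<property>
  <name>mapreduce.reduce.memory.mb</name>
  <value>1024</value> <!-- container size requested per reduce task -->
</property>
<property>
  <name>mapreduce.map.java.opts</name>
  <value>-Xmx768m</value> <!-- JVM heap; keep it below the container size -->
</property>
<property>
  <name>mapreduce.reduce.java.opts</name>
  <value>-Xmx768m</value>
</property>
```

The key constraint is that each task's container request must fit inside yarn.nodemanager.resource.memory-mb, and the java.opts heap must fit inside the container, otherwise you get exactly the "does not have sufficient resource for request" warnings shown below.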

Start the system and run the jobs; they will be distributed across all the
nodes. I could see 8 map tasks running because I had 8 data blocks, and
all the nodes were serving requests. However, I still see only 1 reduce
task; I will address that in a separate post.
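A side note on the single reduce task: I believe it is just the default of mapreduce.job.reduces being 1, so it can be overridden per job. For example (the value 4 here is illustrative):

```xml
<!-- mapred-site.xml: illustrative value -->
<property>
  <name>mapreduce.job.reduces</name>
  <value>4</value> <!-- default is 1, which is why only one reduce task runs -->
</property>
```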

--Ashish
On Wed, Jan 15, 2014 at 7:23 PM, sudhakara st <[EMAIL PROTECTED]> wrote:

> Hello Ashish
>
>
> WARN
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
> Node : l1-DEV05:1004 does not have sufficient resource for request :
> {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1,
> Location: *, Relax Locality: true} node total capability : <memory:1024,
> vCores:8>
>
> The ResourceManager is trying to allocate 2 GB, but the node only has 1 GB
> available.
>
>
> On Wed, Jan 15, 2014 at 7:07 PM, Ashish Jain <[EMAIL PROTECTED]> wrote:
>
>> I tried that, but somehow my MapReduce jobs do not execute at all once I
>> set it to yarn
>>
>>
>> On Wed, Jan 15, 2014 at 7:00 PM, Nirmal Kumar <[EMAIL PROTECTED]
>> > wrote:
>>
>>>  Surely you don’t have to set **mapreduce.jobtracker.address** in
>>> mapred-site.xml
>>>
>>>
>>>
>>> In mapred-site.xml you just have to mention:
>>>
>>> <property>
>>>
>>> <name>mapreduce.framework.name</name>
>>>
>>> <value>yarn</value>
>>>
>>> </property>
>>>
>>>
>>>
>>> -Nirmal
>>>
>>> *From:* Ashish Jain [mailto:[EMAIL PROTECTED]]
>>> *Sent:* Wednesday, January 15, 2014 6:44 PM
>>>
>>> *To:* [EMAIL PROTECTED]
>>> *Subject:* Re: Distributing the code to multiple nodes
>>>
>>>
>>>
>>> I think this is the problem. I have not set
>>> "mapreduce.jobtracker.address" in my mapred-site.xml, and by default it is
>>> set to local. Now the question is how to set it to remote. The documentation
>>> says I need to specify the host:port of the job tracker for this. As we
>>> know, hadoop 2.2.0 is completely overhauled and there is no concept of task
>>> tracker and job tracker; instead there is now a resource manager and node
>>> manager. So in this case what do I set as "mapreduce.jobtracker.address"?
>>> Do I set it to resourceManagerHost:resourceManagerPort?
>>>
>>> --Ashish
>>>
>>>
>>>
>>> On Wed, Jan 15, 2014 at 4:20 PM, Ashish Jain <[EMAIL PROTECTED]> wrote:
>>>
>>>  Hi Sudhakar,
>>>
>>> Indeed there was a typo; the complete command is as follows, minus the
>>> main class, since my manifest has an entry for the main class:
>>> /hadoop jar wordCount.jar /opt/ApacheHadoop/temp/worker.log
>>> /opt/ApacheHadoop/out/
>>>
>>> Next I killed the datanode on 10.12.11.210, and I see the following
>>> messages in the log files. It looks like the namenode is still trying to
>>> assign the complete task to one single node, and since it does not find the
>>> complete data set on one node it is complaining.
>>>
>>>
>>> 2014-01-15 16:38:26,894 WARN
>>> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
>>> Node : l1-DEV05:1004 does not have sufficient resource for request :
>>> {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1,
>>> Location: *, Relax Locality: true} node total capability : <memory:1024,
>>> vCores:8>
>>> 2014-01-15 16:38:27,348 WARN
>>> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
>>> Node : l1dev-211:1004 does not have sufficient resource for request :
>>> {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1,