Hadoop, mail # user - YARN Pi example job stuck at 0%(No MR tasks are started by ResourceManager)


Re: YARN Pi example job stuck at 0%(No MR tasks are started by ResourceManager)
anil gupta 2012-07-30, 23:03
Hi Harsh,

I modified mapred-site.xml and yarn-site.xml so that MR jobs can run in 1.2
GB of memory. Here is the mapred-site.xml: http://pastebin.com/Fxjie6kg and
yarn-site.xml: http://pastebin.com/TCJuDAhe. After updating the conf, the MR
jobs seemingly start map processes, but the job fails at 0%. The web page at
http://data-node:8042/node/containerlogs/container_1343687008058_0003_01_000001/root
says:
Failed redirect for container_1343687008058_0003_01_000001  Failed while
trying to construct the redirect url to the log server. Log Server url may
not be configured. Unknown container. Container either has not started or
has already completed or doesn't belong to this node at all.

Do you have any idea about this problem? I searched the internet and found
this discussion on the CDH forum
(https://groups.google.com/a/cloudera.org/forum/?fromgroups#!topic/cdh-user/AwCRuaPm7e0),
but no resolution was posted there.

Thanks,

Anil

On Fri, Jul 27, 2012 at 4:39 PM, Harsh J <[EMAIL PROTECTED]> wrote:

> I think it's alright if we fail the app when it requests what is
> impossible, rather than log or wait for an admin to come along and fix
> it at runtime. Please do file a JIRA.
>
> The max allocation value can perhaps also be dynamically set to the
> maximum offered RAM value across the NMs that are live, or a fraction
> of it? That is what caused this hang in the first place (by letting it
> go in as a valid request, since default max alloc is about 10 GB).
>
> On Sat, Jul 28, 2012 at 4:52 AM, anil gupta <[EMAIL PROTECTED]> wrote:
> > Hi Harsh,
> >
> > Thanks a lot for your response. I am going to try your suggestions and
> > let you know the outcome.
> > I am running the cluster on a VMware hypervisor. I have 3 physical machines
> > with 16 GB of RAM and 4 TB of disk (2 HDs of 2 TB each). On every machine I
> > am running 4 VMs. Each VM has 3.2 GB of memory. I built this cluster for
> > trying out HA (NN, ZK, HMaster) since we are a little reluctant to deploy
> > anything without HA in prod.
> > This cluster is supposed to be used as an HBase cluster, and MR is going to
> > be used only for bulk loading. Also, my data dump is around 10 GB (which is
> > pretty small for Hadoop). I am going to load this data into 4 different
> > schemas, which will be roughly 150 million records for HBase.
> > So, I think I will lower the memory requirement of YARN for my use
> > case rather than reducing the number of data nodes to increase the memory
> > of the remaining data nodes. Do you think this is the right approach for
> > my cluster environment?
> > Also, on a side note, shouldn't the NodeManager throw an error on this
> > kind of memory problem? Should I file a JIRA for this? It just sat quietly
> > there.
> >
> > Thanks a lot,
> > Anil Gupta
> >
> > On Fri, Jul 27, 2012 at 3:36 PM, Harsh J <[EMAIL PROTECTED]> wrote:
> >
> >> Hi,
> >>
> >> The 'root' doesn't matter. You may run jobs as any username on an
> >> unsecured cluster; it should be just the same.
> >>
> >> The config yarn.nodemanager.resource.memory-mb = 1200 is your issue.
> >> By default, the tasks will execute with a resource demand of 1 GB, and
> >> the AM itself demands, by default, 1.5 GB to run. Hence none of your
> >> nodes is able to start your AM (demand = 1500 MB), and if the AM
> >> doesn't start, your job won't start either.
> >>
> >> You can do a few things:
> >>
> >> 1. Raise yarn.nodemanager.resource.memory-mb to a value close to 4 GB
> >> perhaps, if you have the RAM? Think of it as the new 'slots' divider.
> >> The larger the offering (close to total RAM you can offer for
> >> containers from the machine), the more the tasks that may run on it
> >> (depending on their own demand, of course). Reboot the NM's one by one
> >> and this app will begin to execute.
> >> 2. Lower the AM's requirement, i.e. lower
> >> yarn.app.mapreduce.am.resource.mb in your client's mapred-site.xml or
> >> job config from 1500 to 1000 or less, so it fits in the NM's offering.

Thanks & Regards,
Anil Gupta
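
The memory arithmetic in Harsh's quoted reply can be sketched in a few lines of Python. This is only an illustration, not how the ResourceManager actually schedules: the function name and the three-node list are hypothetical, and the check ignores scheduler rounding and memory already in use. The numbers (1200 MB offered per NodeManager, 1500 MB demanded by the AM, about 1 GB per task) are the ones quoted in the thread.

```python
# Values quoted in the thread.
NM_OFFERED_MB = 1200   # yarn.nodemanager.resource.memory-mb on each node
AM_DEMAND_MB = 1500    # default AM demand (yarn.app.mapreduce.am.resource.mb)
TASK_DEMAND_MB = 1024  # default task demand, about 1 GB


def can_start(demand_mb, node_offers_mb):
    """Hypothetical check: does any single node offer enough memory?

    A container must fit on one node, so offerings are never summed.
    """
    return any(offer >= demand_mb for offer in node_offers_mb)


# Assumption for illustration: three NodeManagers, all configured alike.
nodes = [NM_OFFERED_MB] * 3

# The AM cannot be placed anywhere, so the job never leaves 0%.
print(can_start(AM_DEMAND_MB, nodes))    # False

# Option 2 from the reply: lower the AM demand to 1000 MB.
print(can_start(1000, nodes))            # True

# Option 1 from the reply: raise a node's offering to about 4 GB.
print(can_start(AM_DEMAND_MB, [4096]))   # True

# A default 1 GB task does fit in 1200 MB, which is why only the AM blocked.
print(can_start(TASK_DEMAND_MB, nodes))  # True
```

The same comparison also shows why the request was accepted rather than rejected: it was checked against the default maximum allocation (about 10 GB, per the reply), not against what the live nodes could offer.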