Hadoop >> mail # user >> Re: Yarn HDFS and Yarn Exceptions when processing "larger" datasets.


Re: Yarn HDFS and Yarn Exceptions when processing "larger" datasets.
Also, do you see any exceptions in the RM / NM logs?

Thanks,
Omkar Joshi
*Hortonworks Inc.* <http://www.hortonworks.com>
On Mon, Jul 1, 2013 at 11:19 AM, Omkar Joshi <[EMAIL PROTECTED]> wrote:

> Hi,
>
> As I don't know your complete AM code or how your containers communicate
> with each other, here are a few things that might help you debug. Where are
> you starting your RM? Is it really running on port 8030, and are you sure
> there is no previously started RM still running there? Also, in
> yarn-site.xml, can you try changing the RM address to something like
> "localhost:<free-port-but-not-default>" and configuring the maximum client
> thread count for handling AM requests? Only your AM is expected to
> communicate with the RM over the AM-RM protocol. By any chance, are your
> containers communicating directly with the RM over the AM-RM protocol?
>
>   <property>
>
>     <description>The address of the scheduler interface.</description>
>
>     <name>yarn.resourcemanager.scheduler.address</name>
>
>     <value>${yarn.resourcemanager.hostname}:8030</value>
>
>   </property>
>
>
>   <property>
>
>     <description>Number of threads to handle scheduler interface.</description>
>
>     <name>yarn.resourcemanager.scheduler.client.thread-count</name>
>
>     <value>50</value>
>
>   </property>
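Acting on the two suggestions above (verify nothing is already listening on 8030, and pick a non-default free port) can be done with a few lines of plain Java. This is an illustrative sketch using only java.net.ServerSocket; the class and helper names are hypothetical, not YARN API code:

```java
import java.io.IOException;
import java.net.ServerSocket;

public class PortCheck {
    // Returns true if nothing is currently listening on the port,
    // i.e. we can bind to it ourselves.
    static boolean isFree(int port) {
        try (ServerSocket s = new ServerSocket(port)) {
            return true;
        } catch (IOException e) {
            return false; // already in use, e.g. by a leftover RM
        }
    }

    // Asks the OS for an arbitrary free ephemeral port, usable as the
    // non-default port in yarn.resourcemanager.scheduler.address.
    static int findFreePort() {
        try (ServerSocket s = new ServerSocket(0)) {
            return s.getLocalPort();
        } catch (IOException e) {
            return -1;
        }
    }

    public static void main(String[] args) {
        System.out.println("8030 free: " + isFree(8030));
        System.out.println("suggested free port: " + findFreePort());
    }
}
```

If isFree(8030) returns false while you believe no RM is running, an earlier RM instance has most likely not shut down.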
>
>
> Thanks,
> Omkar Joshi
> *Hortonworks Inc.* <http://www.hortonworks.com>
>
>
> On Fri, Jun 28, 2013 at 5:35 AM, blah blah <[EMAIL PROTECTED]> wrote:
>
>> Hi
>>
>> Sorry to reply so late. I don't have the data you requested (sorry, I have
>> no time; my deadline is within 3 days). However, I have observed that this
>> issue occurs not only for the "larger" datasets (6.8 MB) but for all
>> datasets and all jobs in general. For smaller datasets (1 MB), the AM does
>> not throw the exception; only the containers throw exceptions (same as in
>> my previous e-mail). When these exceptions are thrown, my code (AM and
>> containers) performs no operations on HDFS; it only performs in-memory
>> computation and communication. I have also observed that these exceptions
>> occur at random; I couldn't find any pattern. I can execute a job
>> successfully, then resubmit the job to repeat the experiment, and these
>> exceptions occur (with no change to the source code, input dataset, or
>> execution/input parameters).
>>
>> As for the high network usage: as I said, I don't have the data. But YARN
>> is running on nodes that are exclusive to my experiments; no other
>> software runs on these nodes (only the OS and YARN). Besides, I don't
>> think that 20 containers working on a 1 MB dataset (in total) can be
>> called high network usage.
>>
>> regards
>> tmp
>>
>>
>>
>> 2013/6/26 Devaraj k <[EMAIL PROTECTED]>
>>
>>> Hi,
>>>
>>> Could you check the network usage in the cluster when this problem
>>> occurs? It is probably caused by high network usage.
>>>
>>> Thanks,
>>> Devaraj k
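To follow up on the suggestion to check network usage: the per-interface byte counters can be sampled directly, assuming a Linux node where /proc/net/dev is available. The class and method names below are hypothetical, a minimal sketch rather than a monitoring tool:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;

public class NetDev {
    // Parses the text of /proc/net/dev into interface -> {rxBytes, txBytes}.
    // Header lines contain no ':' and are skipped.
    static Map<String, long[]> parse(String procNetDev) {
        Map<String, long[]> out = new LinkedHashMap<>();
        for (String line : procNetDev.split("\n")) {
            int colon = line.indexOf(':');
            if (colon < 0) continue;
            String iface = line.substring(0, colon).trim();
            String[] f = line.substring(colon + 1).trim().split("\\s+");
            // field 0 = received bytes, field 8 = transmitted bytes
            out.put(iface, new long[] { Long.parseLong(f[0]), Long.parseLong(f[8]) });
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        String snap = new String(Files.readAllBytes(Paths.get("/proc/net/dev")));
        for (Map.Entry<String, long[]> e : parse(snap).entrySet())
            System.out.printf("%s rx=%d tx=%d%n", e.getKey(), e.getValue()[0], e.getValue()[1]);
    }
}
```

Sampling twice a few seconds apart and differencing the counters gives bytes per second per interface, which is enough to tell whether the cluster is actually network-bound when the exceptions appear.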
>>>
>>> *From:* blah blah [mailto:[EMAIL PROTECTED]]
>>> *Sent:* 26 June 2013 05:39
>>> *To:* [EMAIL PROTECTED]
>>> *Subject:* Yarn HDFS and Yarn Exceptions when processing "larger" datasets.
>>>
>>> Hi All,
>>>
>>> First, let me apologize for the poor thread title, but I have no idea how
>>> to express the problem in one sentence.
>>>
>>> I have implemented a new Application Master using Yarn. I am using an old
>>> Yarn development version: revision 1437315, from 2013-01-23 (SNAPSHOT
>>> 3.0.0). I cannot update to the current trunk version, as the prototype
>>> deadline is soon and I don't have time to incorporate the Yarn API
>>> changes.
>>>
>>> Currently I execute experiments in pseudo-distributed mode, with guava
>>> version 14.0-rc1. I have a problem with Yarn and HDFS exceptions for
>>> "larger" datasets. My AM works fine and I can execute it without a
>>> problem on a debug dataset (1 MB), but when I increase the input size to
>>> 6.8 MB, I get the following exceptions:
>>>