Hadoop, mail # user - Problems with timeout when a Hadoop job generates a large number of key-value pairs


Re: Problems with timeout when a Hadoop job generates a large number of key-value pairs
Michael Segel 2012-01-21, 04:15
That's the one...

Sent from my iPhone

On Jan 20, 2012, at 6:28 PM, "Paul Ho" <[EMAIL PROTECTED]> wrote:

> I think the balancing bandwidth property you are looking for is in hdfs-site.xml:
>
>    <property>
>        <name>dfs.balance.bandwidthPerSec</name>
>        <value>402653184</value>
>    </property>
>
> Set the value that makes the most sense for your NIC. But I thought this only applies to balancing.
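
As a quick sanity check on what a node actually has configured, here is a minimal sketch in Java, assuming hdfs-site.xml is on the classpath; the class name is illustrative, and 1048576 (1 MB/s) is the stock default for this property:

    import org.apache.hadoop.conf.Configuration;

    public class ShowBalancerBandwidth {
        public static void main(String[] args) {
            Configuration conf = new Configuration();
            // Pull in the HDFS settings; the default resources load automatically.
            conf.addResource("hdfs-site.xml");
            // Fall back to 1 MB/s, the shipped default, if the property is unset.
            long bw = conf.getLong("dfs.balance.bandwidthPerSec", 1048576L);
            System.out.println("Balancer bandwidth cap: " + bw + " bytes/sec");
        }
    }

Datanodes read this value at startup, so a change in hdfs-site.xml typically takes effect only after a datanode restart.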
>
> On Jan 20, 2012, at 3:43 PM, Michael Segel wrote:
>
>> Steve,
>> OK, first: your client connection to the cluster is a non-issue.
>>
>> If you go into /etc/Hadoop/conf
>> That's supposed to be a little h, but my iPhone knows what's best...
>>
>> Look and see what you have set for your bandwidth... I forget which parameter, but there are only a couple that deal with bandwidth.
>> I think it's set to 1 MB or 10 MB by default. You need to up it to 100-200 MB if you're on a 1 Gb network.
>>
>> That should solve your balancing issue.
>>
>> See if that helps...
>>
>> Sent from my iPhone
>>
>> On Jan 20, 2012, at 4:57 PM, "Steve Lewis" <[EMAIL PROTECTED]> wrote:
>>
>>> On Fri, Jan 20, 2012 at 12:18 PM, Michel Segel <[EMAIL PROTECTED]> wrote:
>>>
>>>> Steve,
>>>> If you want me to debug your code, I'll be glad to set up a billable
>>>> contract... ;-)
>>>>
>>>> What I am willing to do is help you debug your code...
>>>
>>>
>>> The code seems to work well for small input files and is basically a
>>> standard sample.
>>>
>>>>
>>>> Did you time how long it takes in the Mapper.map() method?
>>>> The reason I asked this is to first confirm that you are failing within a
>>>> map() method.
>>>> It could be that you're just not updating your status...
>>>>
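
A minimal sketch of Mike's timing suggestion, done with counters so per-task totals show up in the job UI; the counter group and names are illustrative, and the key/value types stand in for whatever the real mapper uses:

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class TimedMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            long start = System.currentTimeMillis();

            // ... the real per-record work and context.write() calls go here ...

            long elapsed = System.currentTimeMillis() - start;
            // Aggregated timing appears under the job's counters, which makes
            // slow iterations visible without attaching a profiler.
            context.getCounter("timing", "map-millis").increment(elapsed);
            context.getCounter("timing", "map-calls").increment(1);
        }
    }

Dividing map-millis by map-calls after a run (or a failed attempt) gives the average time per iteration, which answers whether the time is really being spent inside map().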
>>>
>>> The map method starts out running very fast - generateSubstrings, the
>>> only interesting part, runs in milliseconds. The only other thing the mapper
>>> does is context.write, which SHOULD update status.
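
Whether or not each write counts as progress, an explicit heartbeat is a cheap way to rule status reporting out as the cause. A sketch, with the 10,000-record interval purely illustrative:

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class HeartbeatMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
        private long written = 0;

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // ... emit output records here via context.write(...) ...
            if (++written % 10000 == 0) {
                context.progress();  // tells the framework the task is alive
                context.setStatus("records written: " + written);
            }
        }
    }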
>>>
>>>>
>>>> You said that you are writing many output records for a single input.
>>>>
>>>> So let's take a look at your code.
>>>> Are all writes of the same length? Meaning that in each iteration of
>>>> Mapper.map() you will always write K rows?
>>>>
>>>
>>> Because in my sample the input strings are the same length, every call to
>>> the mapper will write the same number of records.
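
For a sense of the volume involved: a line of length n has n(n+1)/2 non-empty substrings, so a 1,000-character line already produces 500,500 output records. A hypothetical sketch of such a mapper (the thread's actual generateSubstrings code is not shown, so the types and structure here are assumptions):

    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class SubstringMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text out = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String line = value.toString();
            int n = line.length();
            // Emits all n*(n+1)/2 substrings; equal-length input lines
            // therefore produce exactly the same number of records.
            for (int i = 0; i < n; i++) {
                for (int j = i + 1; j <= n; j++) {
                    out.set(line.substring(i, j));
                    context.write(out, ONE);
                }
            }
        }
    }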
>>>
>>>>
>>>> If so, ask yourself why some iterations are taking longer and longer?
>>>>
>>>
>>> I believe the issue may relate to local storage getting filled and Hadoop
>>> taking a LOT of time to rebalance the output. Assuming the string length is
>>> the same on each map, there is no reason for some iterations to be longer
>>> than others.
>>>
>>>>
>>>> Note: I'm assuming that the time for each iteration is taking longer than
>>>> the previous...
>>>>
>>> I assume so as well since in my cluster the first 50% of mapping goes
>>> pretty fast.
>>>
>>>> Or am I missing something?
>>>>
>>> How do I get timing of map iterations?
>>>
>>>> -Mike
>>>>
>>>> Sent from a remote device. Please excuse any typos...
>>>>
>>>> Mike Segel
>>>>
>>>> On Jan 20, 2012, at 11:16 AM, Steve Lewis <[EMAIL PROTECTED]> wrote:
>>>>
>>>>> We have been having problems with mappers timing out after 600 sec when
>>>>> the mapper writes many more records - say thousands - for every input
>>>>> record, even when the code in the mapper is small and fast. I have no idea
>>>>> what could cause the system to be so slow and am reluctant to raise the
>>>>> 600 sec limit without understanding why there should be a timeout when all
>>>>> MY code is very fast.
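
For reference, the 600 sec limit mentioned here is the mapred.task.timeout property (the pre-YARN name, in milliseconds). If one did decide to raise it, a minimal sketch at job-submission time; the class name and the 20-minute figure are illustrative:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    public class SubmitWithLongerTimeout {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Default is 600000 ms (600 s); a value of 0 disables the timeout.
            conf.setLong("mapred.task.timeout", 1200000L); // 20 minutes
            Job job = new Job(conf, "substring-sample");
            // ... set mapper, input/output paths, then job.waitForCompletion(true)
        }
    }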
>>>>> I am enclosing a small sample which illustrates the problem. It will
>>>>> generate a 4 GB text file on HDFS if the input file does not exist or is
>>>>> not at least that size, and this will take some time (hours in my
>>>>> configuration) - then the code is essentially wordcount, but instead of
>>>>> finding and emitting words, the mapper emits all substrings of the input
>>>>> data - this