It seems that if I put too many records under the same mapper output key, all these records are grouped into one key on one reducer, and then the reducer runs out of memory. But the reducer interface is:

    public void reduce(K key, Iterator<V> values, OutputCollector<K, V> output, Reporter reporter)

So all the values belonging to the key can be iterated, which means that theoretically they can be read from disk and do not have to be in memory at the same time. So why am I getting an out-of-heap error? Is there some parameter I could tune (apart from -Xmx, since my box is ultimately bounded in memory capacity)?

Thanks,
Yang
-
Re: reducer out of memory?
Harsh J 2012-05-10, 05:12
Can you share your job details (or sample reducer code) and also share your exact error?
If you are holding reducer-provided values/keys in memory in your implementation, it can easily cause an OOME if not handled properly. The reducer by itself reads the values off a sorted file on disk and doesn't cache the whole group in memory.
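To illustrate the distinction drawn above, here is a minimal sketch in plain Java. It does not use Hadoop's classes: `ReduceSketch`, `sumStreaming`, and `sumBuffered` are hypothetical names, and a plain `Iterator<Long>` stands in for the values iterator that the framework passes to `reduce`. The first method keeps only a running total, so its heap use stays flat however large the group is. The second copies every value into a list first, which is the kind of pattern that can exhaust the heap when one key has very many values.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for the body of a reduce() call; not Hadoop's API.
final class ReduceSketch {

    // Streaming style: only the running total is kept, so memory use
    // does not grow with the number of values in the group.
    static long sumStreaming(Iterator<Long> values) {
        long total = 0;
        while (values.hasNext()) {
            total += values.next();
        }
        return total;
    }

    // Buffering style: every value is copied into a list before any work
    // is done, so memory use grows linearly with the size of the group.
    static long sumBuffered(Iterator<Long> values) {
        List<Long> all = new ArrayList<>();
        while (values.hasNext()) {
            all.add(values.next());
        }
        long total = 0;
        for (long v : all) {
            total += v;
        }
        return total;
    }
}
```

Both methods return the same result; the difference is only in how much they hold on the heap while iterating. If a real reducer needs several passes over a group, the buffering usually has to be bounded or replaced (for example by a secondary sort), but that depends on the job.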
On Thu, May 10, 2012 at 12:20 AM, Yang <[EMAIL PROTECTED]> wrote:
> it seems that if I put too many records into the same mapper output
> key, all these records are grouped into one key one one reducer,
>
> then the reducer became out of memory.
>
> but the reducer interface is:
>
> public void reduce(K key, Iterator<V> values,
> OutputCollector<K, V> output,
> Reporter reporter)
>
> so all the values belonging to the key can be iterated, so
> theoretically they can be iterated from disk, and does not have to be
> in memory at the same time,
> so why am I getting out of heap error? is there some param I could
> tune (apart from -Xmx since my box is ultimately bounded in memory
> capacity)
>
> thanks
> Yang
-- Harsh J
-
Re: reducer out of memory?
Zizon Qiu 2012-05-10, 06:27
Try setting a lower value for mapred.job.shuffle.input.buffer.percent. The reducer uses it to decide whether to use the in-memory shuffle. The default value is 0.7, meaning 70% of the "memory" (the reducer's maximum heap) is used as the shuffle buffer.
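As a rough illustration of what that percentage means in bytes, the sketch below does the arithmetic in plain Java. It assumes the setting is applied as a simple fraction of the task JVM's maximum heap; `ShuffleBudget` and `shuffleBufferBytes` are hypothetical names, not Hadoop code, and the framework's actual accounting may differ in detail.

```java
// Hypothetical helper for illustration only; not part of Hadoop.
final class ShuffleBudget {

    // Bytes of heap budgeted for in-memory map outputs, assuming the
    // setting is a plain fraction of the maximum heap size.
    static long shuffleBufferBytes(long maxHeapBytes, double bufferPercent) {
        if (bufferPercent < 0.0 || bufferPercent > 1.0) {
            throw new IllegalArgumentException("bufferPercent must be in [0, 1]");
        }
        return (long) (maxHeapBytes * bufferPercent);
    }
}
```

Under that assumption, a reducer with a 1 GB heap (-Xmx1024m) at the default 0.7 would budget about 716 MB for shuffled map outputs, leaving roughly 300 MB for everything else; lowering the value to 0.3 would budget about 307 MB. If the job is launched through ToolRunner, the property can typically be passed on the command line as -Dmapred.job.shuffle.input.buffer.percent=0.3.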
-
Re: reducer out of memory?
Yang 2012-05-10, 18:49
thanks, let me try this
-
Re: reducer out of memory?
Yang 2012-05-10, 18:50
thanks, let me run more of this with the settings provided later in this thread and provide the details