Re: mortbay, huge files and the ulimit
Now that I think about it I wonder about oracle's description.

[...] if more than 98% of the total time is spent in garbage collection [...]

I took that to mean 98% of the application's CPU time, but it could
also mean that the GC is active for 98% of wall-clock time.

If it is the former, it implies the application is no longer doing
useful work (e.g., keeping a connection open). If it is the latter, I
don't know whether it is possible to say anything about how much useful
work the program is still doing.
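
One way to get a feel for the wall-clock reading is to ask the JVM itself. The sketch below is only an illustration and makes no claim about how HotSpot computes its 98% figure internally: it sums the approximate elapsed collection time reported by the standard GarbageCollectorMXBean interface and divides it by the JVM uptime. The class name GcTimeShare and its helper methods are made up for this example.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper, not part of Hadoop or of the reducer in this thread.
public class GcTimeShare {

    /** Sums the approximate elapsed GC time (ms) over all collectors of this JVM. */
    public static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // returns -1 if the collector does not report it
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    /** Fraction of elapsed time spent in GC; 0.0 if no uptime has been recorded. */
    public static double gcShare(long gcMillis, long uptimeMillis) {
        if (uptimeMillis <= 0) {
            return 0.0;
        }
        return (double) gcMillis / uptimeMillis;
    }

    public static void main(String[] args) {
        long uptime = ManagementFactory.getRuntimeMXBean().getUptime();
        long gc = totalGcMillis();
        System.out.printf("GC: %d ms of %d ms uptime (%.1f%%)%n",
                gc, uptime, 100.0 * gcShare(gc, uptime));
    }
}
```

Calling totalGcMillis() periodically from inside the reduce method and logging the difference between two calls would show whether the share climbs towards 100% shortly before the OOM. That measures elapsed time only, so it says nothing about the CPU-time interpretation.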

On Wed, Sep 5, 2012 at 2:48 PM, Björn-Elmar Macek
> Hi Vasco,
> thank you for your help!
> I can try to add the limit again (I currently have it turned off for all
> Java processes spawned by Hadoop). Also, I do not have any persistent
> (member) variables that could store things that use a lot of data: at the
> moment I only have 2 local variables in the reduce method that could
> get a little larger, but those should be GC'ed easily, shouldn't they?
> Thank you for your thoughts again. I will play with the GC a little. If I
> find the reason, I'll let you know!
> Best regards,
> Elmar
> On 05.09.2012 at 14:38, Vasco Visser wrote:
>> Just a guess, but could it simply be a memory issue on the reducer?
>> In your adjusted program, maybe try running without
>> -UseGCOverheadLimit and see if you still get OOM errors.
>> From the Sun website:
>> The parallel collector will throw an OutOfMemoryError if too much time
>> is being spent in garbage collection: if more than 98% of the total
>> time is spent in garbage collection and less than 2% of the heap is
>> recovered, an OutOfMemoryError will be thrown. This feature is
>> designed to prevent applications from running for an extended period
>> of time while making little or no progress because the heap is too
>> small. If necessary, this feature can be disabled by adding the option
>> -XX:-UseGCOverheadLimit to the command line.
>> So basically, if you get a "GC overhead limit exceeded" OOM error, it
>> means that your app is doing nothing but garbage collection: the VM is
>> fighting against running out of memory. If that happens, it might
>> result in a timeout of the connection fetching the map data (I don't
>> know if it can result in a timeout, but I can imagine it could).
>> Also, note that 63 GB on disk is probably going to be inflated in
>> memory. So in general you can't say that 60 GB on disk will need 60 GB
>> of memory. Actually, some people use a rule of thumb of multiplying by 4
>> to get the approximate memory requirement.
>> Just some ideas, not really a solution but maybe it helps you further.
>> On Wed, Sep 5, 2012 at 2:02 PM, Björn-Elmar Macek
>> <[EMAIL PROTECTED]> wrote:
>>> Excuse me: my last code section still included some old code. Here it
>>> is again, stripped of deprecated code:
>>> package uni.kassel.macek.rtprep;
>>> import gnu.trove.iterator.TIntIterator;
>>> import gnu.trove.map.hash.TIntObjectHashMap;
>>> import gnu.trove.set.TIntSet;
>>> import java.io.IOException;
>>> import java.util.ArrayList;
>>> import java.util.Calendar;
>>> import java.util.Iterator;
>>> import java.util.List;
>>> import java.util.StringTokenizer;
>>> import org.apache.hadoop.conf.Configuration;
>>> import org.apache.hadoop.fs.Path;
>>> import org.apache.hadoop.io.IntWritable;
>>> import org.apache.hadoop.io.LongWritable;
>>> import org.apache.hadoop.io.Text;
>>> import org.apache.hadoop.mapred.MapReduceBase;
>>> import org.apache.hadoop.mapred.OutputCollector;
>>> import org.apache.hadoop.mapred.Reporter;
>>> import org.apache.hadoop.mapreduce.Job;
>>> import org.apache.hadoop.mapreduce.Mapper;
>>> import org.apache.hadoop.mapreduce.Reducer;
>>> import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
>>> import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
>>> import org.apache.hadoop.util.GenericOptionsParser;
>>> public class RetweetApplication {
>>>      public static class RetweetMapper1 extends Mapper<Object, Text,