Re: mortbay, huge files and the ulimit
Now that I think about it, I wonder about Oracle's description.

[...] if more than 98% of the total time is spent in garbage collection [...]

I took that to mean 98% of the CPU time used by the application, but it can
also mean that the GC is active for 98% of the wall-clock time.

If it is the former, it implies the application isn't doing useful
stuff anymore (e.g., keeping a connection open). If it is the latter, I
don't know whether it is possible to say anything about how much the program
is still doing useful stuff.
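As an illustrative sketch (not from the thread; the class name GcTimeProbe is invented), the standard java.lang.management API reports cumulative GC time, which can be compared against JVM uptime, e.g. from a reducer's cleanup() or a small monitoring thread:

import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcTimeProbe {
    public static void main(String[] args) {
        long gcMillis = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            gcMillis += gc.getCollectionTime(); // cumulative GC time in ms, -1 if unsupported
        }
        long uptimeMillis = ManagementFactory.getRuntimeMXBean().getUptime();
        // Fraction of wall-clock uptime the collectors report as spent in GC so far.
        System.out.printf("GC: %d ms of %d ms uptime (%.1f%%)%n",
                gcMillis, uptimeMillis, 100.0 * gcMillis / uptimeMillis);
    }
}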

On Wed, Sep 5, 2012 at 2:48 PM, Björn-Elmar Macek
<[EMAIL PROTECTED]> wrote:
> Hi Vasco,
>
> thank you for your help!
>
> I can try to add the limit again (I currently have it turned off for all
> Java processes spawned by Hadoop). Also, I do not have any persistent
> (member) variables that could store things that use a lot of data: at the
> moment I only have two local variables in the reduce method that could
> get a little larger - but those should be GC'ed easily, shouldn't they?
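For illustration only (this is not the poster's code; the class name and types are invented): even purely local variables in reduce() stay reachable until the call returns, so a single very large key group can still fill the heap.

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class LocalVariableReducer extends Reducer<Text, Text, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        // Both locals grow with the number of values for this key; they become
        // eligible for GC only after reduce() returns, so one huge group can
        // exhaust the heap even without any member variables.
        List<String> buffered = new ArrayList<String>();
        StringBuilder joined = new StringBuilder();
        for (Text value : values) {
            buffered.add(value.toString());
            joined.append(value).append(' ');
        }
        context.write(key, new IntWritable(buffered.size()));
    }
}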
>
> Thank you for your thoughts again. I will play with the GC a little. Once I
> know the reason, I'll let you know!
> Best regards,
> Elmar
>
>
> On 05.09.2012 14:38, Vasco Visser wrote:
>
>> Just a guess, but could it simply be a memory issue on the reducer?
>>
>> In your adjusted program, maybe try running without
>> -XX:-UseGCOverheadLimit and see if you still get OOM errors.
>>
>> From the Sun website:
>> The parallel collector will throw an OutOfMemoryError if too much time
>> is being spent in garbage collection: if more than 98% of the total
>> time is spent in garbage collection and less than 2% of the heap is
>> recovered, an OutOfMemoryError will be thrown. This feature is
>> designed to prevent applications from running for an extended period
>> of time while making little or no progress because the heap is too
>> small. If necessary, this feature can be disabled by adding the option
>> -XX:-UseGCOverheadLimit to the command line.
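For experimenting with that flag in a Hadoop job, here is a minimal editorial sketch (not from the thread): mapred.child.java.opts is the Hadoop 1.x property for task JVM options, the heap size shown is arbitrary, and the job name is made up.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class SubmitWithGcFlags {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // JVM options applied to map and reduce task child JVMs (Hadoop 1.x style).
        conf.set("mapred.child.java.opts", "-Xmx2048m -XX:-UseGCOverheadLimit");
        Job job = new Job(conf, "retweet preparation");
        // ... configure mapper, reducer, input and output paths as in the posted code,
        // then submit with job.waitForCompletion(true).
    }
}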
>>
>> So basically, if you get a GC overhead limit exceeded OOM error it
>> means that your app is doing nothing but garbage collection; the VM is
>> fighting against running out of memory. If that happens it might
>> result in a timeout of the connection fetching the map data (I don't
>> know whether it can result in a timeout, but I can imagine it could).
>>
>> Also, note that 63 GB on disk is probably going to be inflated in
>> memory. So in general you can't say that 60 GB on disk will need 60 GB
>> of memory. Actually, some people use a rule of thumb of multiplying by 4
>> to get the approximate memory requirement.
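(For a rough sense of scale, applying that 4x rule of thumb to the 63 GB mentioned above gives about 4 × 63 ≈ 250 GB of in-memory data.)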
>>
>> Just some ideas, not really a solution but maybe it helps you further.
>>
>> On Wed, Sep 5, 2012 at 2:02 PM, Björn-Elmar Macek
>> <[EMAIL PROTECTED]> wrote:
>>>
>>> Excuse me: my last code section included some old code. Here it is
>>> again, stripped of deprecated code:
>>>
>>>
>>> package uni.kassel.macek.rtprep;
>>>
>>>
>>>
>>> import gnu.trove.iterator.TIntIterator;
>>> import gnu.trove.map.hash.TIntObjectHashMap;
>>> import gnu.trove.set.TIntSet;
>>>
>>> import java.io.IOException;
>>> import java.util.ArrayList;
>>> import java.util.Calendar;
>>> import java.util.Iterator;
>>> import java.util.List;
>>> import java.util.StringTokenizer;
>>>
>>> import org.apache.hadoop.conf.Configuration;
>>> import org.apache.hadoop.fs.Path;
>>> import org.apache.hadoop.io.IntWritable;
>>> import org.apache.hadoop.io.LongWritable;
>>> import org.apache.hadoop.io.Text;
>>> import org.apache.hadoop.mapred.MapReduceBase;
>>> import org.apache.hadoop.mapred.OutputCollector;
>>> import org.apache.hadoop.mapred.Reporter;
>>> import org.apache.hadoop.mapreduce.Job;
>>> import org.apache.hadoop.mapreduce.Mapper;
>>> import org.apache.hadoop.mapreduce.Reducer;
>>> import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
>>> import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
>>> import org.apache.hadoop.util.GenericOptionsParser;
>>>
>>> public class RetweetApplication {
>>>
>>>      public static class RetweetMapper1 extends Mapper<Object, Text,