Re: mortbay, huge files and the ulimit
Vasco Visser 2012-09-05, 13:32
Now that I think about it, I wonder about Oracle's description:
"[...] if more than 98% of the total time is spent in garbage collection [...]"

I took that to mean 98% of the application's CPU time, but it could also mean that the GC is active for 98% of wall-clock time. If it is the former, it implies the application isn't doing useful work anymore (e.g., keeping a connection open). If it is the latter, I don't know whether it is possible to say anything about how much useful work the program is still doing.

On Wed, Sep 5, 2012 at 2:48 PM, Björn-Elmar Macek <[EMAIL PROTECTED]> wrote:
> Hi Vasco,
>
> thank you for your help!
>
> I can try to add the limit again (I currently have it turned off for all
> Java processes spawned by Hadoop). Also, I do not have any persistent
> (member) variables that could store things that use a lot of data: at the
> moment I only have 2 local variables in the reduce method that could
> get a little larger, but those should be GC'ed easily, shouldn't they?
>
> Thank you for your thoughts again. I will play with the GC a little. If I
> find out the reason, I'll let you know!
> Best regards,
> Elmar
>
>
> On 05.09.2012 14:38, Vasco Visser wrote:
>
>> Just a guess, but could it simply be a memory issue on the reducer?
>>
>> In your adjusted program, maybe try running without
>> -UseGCOverheadLimit and see if you still get OOM errors.
>>
>> From the Sun website:
>> The parallel collector will throw an OutOfMemoryError if too much time
>> is being spent in garbage collection: if more than 98% of the total
>> time is spent in garbage collection and less than 2% of the heap is
>> recovered, an OutOfMemoryError will be thrown. This feature is
>> designed to prevent applications from running for an extended period
>> of time while making little or no progress because the heap is too
>> small. If necessary, this feature can be disabled by adding the option
>> -XX:-UseGCOverheadLimit to the command line.
>>
>> So basically, if you get a "GC overhead limit exceeded" OOM error, it
>> means that your app is doing nothing but garbage collection: the VM is
>> fighting against running out of memory. If that happens, it might
>> result in a timeout of the connection fetching the map data (I don't
>> know for sure that it can cause a timeout, but I can imagine it could).
>>
>> Also, note that 63 GB on disk is probably going to be inflated in
>> memory. So in general you can't say that 60 GB on disk will need 60 GB
>> of memory. Some people use a rule of thumb of multiplying by 4 to get
>> the approximate memory requirement.
>>
>> Just some ideas, not really a solution, but maybe it helps you further.
>>
>> On Wed, Sep 5, 2012 at 2:02 PM, Björn-Elmar Macek
>> <[EMAIL PROTECTED]> wrote:
>>>
>>> Excuse me: my last code section included some old code. Here it is
>>> again, stripped of deprecated code:
>>>
>>> package uni.kassel.macek.rtprep;
>>>
>>> import gnu.trove.iterator.TIntIterator;
>>> import gnu.trove.map.hash.TIntObjectHashMap;
>>> import gnu.trove.set.TIntSet;
>>>
>>> import java.io.IOException;
>>> import java.util.ArrayList;
>>> import java.util.Calendar;
>>> import java.util.Iterator;
>>> import java.util.List;
>>> import java.util.StringTokenizer;
>>>
>>> import org.apache.hadoop.conf.Configuration;
>>> import org.apache.hadoop.fs.Path;
>>> import org.apache.hadoop.io.IntWritable;
>>> import org.apache.hadoop.io.LongWritable;
>>> import org.apache.hadoop.io.Text;
>>> import org.apache.hadoop.mapred.MapReduceBase;
>>> import org.apache.hadoop.mapred.OutputCollector;
>>> import org.apache.hadoop.mapred.Reporter;
>>> import org.apache.hadoop.mapreduce.Job;
>>> import org.apache.hadoop.mapreduce.Mapper;
>>> import org.apache.hadoop.mapreduce.Reducer;
>>> import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
>>> import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
>>> import org.apache.hadoop.util.GenericOptionsParser;
>>>
>>> public class RetweetApplication {
>>>
>>>     public static class RetweetMapper1 extends Mapper<Object, Text,
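The CPU-time vs. wall-clock question can at least be probed from inside a JVM with the standard java.lang.management API. The sketch below is a minimal illustration under stated assumptions, not how HotSpot implements its 98%/2% check: it samples GarbageCollectorMXBean.getCollectionTime(), which the JDK documents as the approximate accumulated collection *elapsed* time, so the number it prints is a share of wall-clock time, not CPU time. The class name, the method names and the allocation loop are hypothetical, made up for this example.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

public class GcOverheadProbe {

    /**
     * Sums the accumulated collection time (ms) reported by every collector
     * in this JVM. getCollectionTime() returns -1 when the value is undefined
     * for a collector, so such entries are skipped.
     */
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime();
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    /**
     * Share of an observation window spent in GC, given GC-time and
     * wall-clock samples (all in ms) taken at the start and end of the window.
     * Collectors with phases that run concurrently with the application may
     * push this above 1.0, so treat it as a rough indicator only.
     */
    static double gcShare(long gcStartMs, long gcEndMs, long wallStartMs, long wallEndMs) {
        long wallMs = wallEndMs - wallStartMs;
        if (wallMs <= 0) {
            return 0.0;
        }
        return (double) (gcEndMs - gcStartMs) / wallMs;
    }

    public static void main(String[] args) {
        long wall0 = System.nanoTime() / 1_000_000L;
        long gc0 = totalGcMillis();

        // Hypothetical workload: allocate many short-lived 64 KB arrays,
        // retaining only every 100th one, so the collector has work to do.
        List<byte[]> retained = new ArrayList<>();
        for (int i = 0; i < 20_000; i++) {
            byte[] chunk = new byte[64 * 1024];
            if (i % 100 == 0) {
                retained.add(chunk);
            }
        }

        long wall1 = System.nanoTime() / 1_000_000L;
        long gc1 = totalGcMillis();
        System.out.printf("retained %d arrays; GC share of wall-clock time: %.1f%%%n",
                retained.size(), 100.0 * gcShare(gc0, gc1, wall0, wall1));
    }
}
```

Logging such a share periodically from a long-running task would show whether the collector is eating most of the wall-clock time. Note that it only covers the "time spent" half of the rule; it says nothing about how much heap each collection recovered.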
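The "multiply by 4" rule of thumb is simple arithmetic, but writing it down makes the gap obvious: 63 GB on disk becomes roughly 252 GB in memory. A minimal sketch, assuming the factor of 4 quoted in the thread (a rule of thumb, not a measured value); the class and method names are hypothetical.

```java
public class HeapEstimate {

    static final long GB = 1024L * 1024L * 1024L;

    /**
     * Rough in-memory footprint of data that takes diskBytes on disk,
     * scaled by an inflation factor. A rule of thumb, not a measurement.
     */
    static long estimateHeapBytes(long diskBytes, double inflationFactor) {
        return (long) Math.ceil(diskBytes * inflationFactor);
    }

    public static void main(String[] args) {
        long onDisk = 63L * GB;
        long estimate = estimateHeapBytes(onDisk, 4.0);
        // Maximum heap this JVM will attempt to use (Long.MAX_VALUE if unlimited).
        long maxHeap = Runtime.getRuntime().maxMemory();
        System.out.println("estimated in-memory size: " + (estimate / GB) + " GB");
        System.out.println("fits in this JVM's max heap: " + (estimate <= maxHeap));
    }
}
```

Running the same check inside a task JVM compares the estimate with the heap that JVM was actually given.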
Björn-Elmar Macek 2012-09-05, 13:52
Björn-Elmar Macek 2012-08-29, 14:32
Björn-Elmar Macek 2012-08-29, 13:53
Björn-Elmar Macek 2012-08-30, 10:27
Björn-Elmar Macek 2012-08-31, 12:08
Björn-Elmar Macek 2012-09-05, 11:56
Björn-Elmar Macek 2012-09-05, 12:02