That's what I suggested.
You can actually declare tabPattern, etc. as static final fields of your
Mapper class, so each pattern is compiled only once when the class loads.
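
Something along these lines, as a rough sketch only. The class name and the splitAndTrim/splitLines helpers are made up for illustration and leave out the Hadoop Mapper plumbing, and the trim regex is my assumption about the intent (strip all leading and trailing whitespace), not a copy of yours:

```java
import java.util.regex.Pattern;

// Hypothetical stand-in for the Mapper class; only the regex handling is shown.
final class PrecompiledPatterns {
    // static final: compiled once when the class is loaded, then reused by every call.
    private static final Pattern tabPattern = Pattern.compile("\t");
    private static final Pattern eolPattern = Pattern.compile("\n");
    // Assumed intent: remove all leading and all trailing whitespace.
    private static final Pattern spacePattern = Pattern.compile("^\\s+|\\s+$");

    // Splits one record into lines, as the outer loop of map() would.
    static String[] splitLines(String record) {
        return eolPattern.split(record);
    }

    // Splits a line on tabs and trims each field, reusing the compiled patterns.
    static String[] splitAndTrim(String line) {
        String[] values = tabPattern.split(line);
        for (int i = 0; i < values.length; i++) {
            values[i] = spacePattern.matcher(values[i]).replaceAll("");
        }
        return values;
    }
}
```

Inside map() you would then loop over splitLines(value.toString()) and call splitAndTrim(line) on each line, so Pattern.compile() is never reached per record.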
You can also lower -Xmx to give other processes on the same node more
memory.
Cheers
On Sun, Jul 18, 2010 at 4:11 AM, Alan Miller <[EMAIL PROTECTED]> wrote:
> Thanks Ted,
>
> One or both suggestions remedied the problem. I'm not seeing that error
> anymore.
>
> In my Driver class I used:
> config.set("mapred.child.java.opts", "-Xmx2048m -Xincgc");
> But I also altered my mapred-site.xml and set:
> io.file.buffer.size 65536
> io.sort.factor 32
> io.sort.mb 320
>
> As for the 2nd suggestion: I'm a Java novice, so I'm not sure if this
> actually does what you intended.
>
> I moved the 3 Patterns outside my map() and changed the logic to this:
>
> public class MyMapper extends Mapper<Object, Text, Text, Text> {
>
>     Pattern tabPattern = Pattern.compile("\t");
>     Pattern eolPattern = Pattern.compile("\n");
>     Pattern spacePattern = Pattern.compile("(^[\\s]*)|([\\s]$)");
>
>     public void map(Object key, Text value, Context context) {
>         for (String line : eolPattern.split(value.toString())) {
>             ....
>             String[] values = tabPattern.split(line);
>
>             for (int i = 0; i < values.length; i++) {
>                 values[i] = spacePattern.matcher(values[i]).replaceAll("");
>             }
>             parser.setvals(values);
>             ....
>         }
>     }
> }
>
> Alan
>
>
> On 07/17/2010 07:28 AM, Ted Yu wrote:
>
>> Have you tried increasing memory beyond 1GB for your map task?
>>
>> I think you have noticed that both OOMEs came from Pattern.compile().
>>
>> Please take a look at
>>
>> http://www.docjar.com/html/api/java/lang/String.java.html
>>
>> I would suggest pre-compiling the three patterns when setting up your
>> mapper
>> - basically write your own split() and replaceAll().
>>
>> I recently did something similar. You can see the performance
>> improvement from that customization here:
>>
>> https://issues.apache.org/jira/browse/MAPREDUCE-1946
>>
>> Cheers
>>
>> On Fri, Jul 16, 2010 at 6:06 AM, Some Body<[EMAIL PROTECTED]>
>> wrote:
>>
>>
>>
>>> Guess attachments are stripped.
>>>
>>> Here's the memory graph:
>>> http://tinyurl.com/37g3hmu
>>> Here's the VM Summary:
>>> http://tinyurl.com/36wqzjq
>>>
>>> Alan