HBase, mail # user - OutOfMemoryError in MapReduce Job

Re: OutOfMemoryError in MapReduce Job
Asaf Mesika 2013-11-02, 14:37
I would try to compress this bit set.
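A minimal sketch of that idea, using only java.util.zip (the class name BitSetCompressor is made up for illustration; a purpose-built bitmap library would compress better, but even plain GZIP shrinks a sparse BitSet dramatically because the serialized form is mostly zero bytes):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.BitSet;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class BitSetCompressor {

    // GZIP-compress the BitSet's serialized form before writing it to HBase.
    public static byte[] compress(BitSet bits) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(bits.toByteArray());
        }
        return bos.toByteArray();
    }

    // Inverse operation for the read path.
    public static BitSet decompress(byte[] data) throws IOException {
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(data))) {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            byte[] buf = new byte[8192];
            int n;
            while ((n = gz.read(buf)) != -1) {
                bos.write(buf, 0, n);
            }
            return BitSet.valueOf(bos.toByteArray());
        }
    }
}
```

The value handed to Put.add() would then be compress(bitvector) instead of the raw toByteArray(bitvector), which also shrinks what the region server has to hold in its MemStore.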

On Nov 2, 2013, at 2:43 PM, John <[EMAIL PROTECTED]> wrote:

> Hi,
> thanks for your answer! I increased the "Map Task Maximum Heap Size" to
> 2gb and it seems to work; the OutOfMemoryError is gone. But the HBase
> RegionServers are now crashing all the time :-/ I try to store the
> bitvector (120mb in size) for some rows. This seems to be very memory
> intensive; the usedHeapMB increases very fast (up to 2gb). I'm not sure
> whether it is the reading or the writing task that causes this, but I
> think it's the writing task. Any idea how to minimize the memory usage?
> My mapper looks like this:
> public class MyMapper extends TableMapper<ImmutableBytesWritable, Put> {
>     private void storeBitvectorToHBase(byte[] name, byte[] cf,
>             BitSet bitvector, Context context)
>             throws IOException, InterruptedException {
>         Put row = new Put(name);
>         row.setWriteToWAL(false); // skip the WAL for faster bulk writes
>         row.add(cf, Bytes.toBytes("columname"), toByteArray(bitvector));
>         ImmutableBytesWritable key = new ImmutableBytesWritable(name);
>         context.write(key, row);
>     }
> }
> kind regards
> 2013/11/1 Jean-Marc Spaggiari <[EMAIL PROTECTED]>
>> Hi John,
>> You might be better off asking this on the CDH mailing list, since it's
>> more related to Cloudera Manager than to HBase.
>> In the meantime, can you try to update the "Map Task Maximum Heap Size"
>> parameter too?
>> JM
>> 2013/11/1 John <[EMAIL PROTECTED]>
>>> Hi,
>>> I have a problem with the memory. My use case is the following: I've
>>> created a MapReduce job that iterates over every row. If a row has
>>> more than, for example, 10k columns, I create a Bloom filter (a
>>> BitSet) for that row and store it in the HBase structure. This worked
>>> fine so far. BUT now I am trying to store a BitSet with 1000000000
>>> elements = ~120mb in size. In every map() function there exist two
>>> BitSets. If I try to execute the MR job I get this error:
>>> http://pastebin.com/DxFYNuBG
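For scale, the ~120mb figure quoted above checks out (a rough sketch; this ignores JVM object headers and the extra byte[] copy that toByteArray() makes during serialization, so two such BitSets per map() already approach 250 MB of a 1 GB child heap):

```java
public class BitSetSizing {

    // Rough heap footprint of a BitSet's backing long[]: one bit per element.
    public static long bytesFor(long bits) {
        return bits / 8;
    }

    public static void main(String[] args) {
        long perBitSet = bytesFor(1_000_000_000L); // 125,000,000 bytes
        System.out.printf("%.1f MiB per BitSet%n", perBitSet / 1024.0 / 1024.0);
    }
}
```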
>>> Obviously, the TaskTracker does not have enough memory. I tried to
>>> adjust the memory configuration, but I'm not sure which parameter is
>>> the right one. I changed the "MapReduce Child Java Maximum Heap Size"
>>> value from 1GB to 2GB, but I still get the same error.
>>> Which parameters do I have to adjust? BTW, I'm using CDH 4.4.0 with
>>> Cloudera Manager.
>>> kind regards
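For reference, in a plain Hadoop MR1 setup the setting that Cloudera Manager labels "MapReduce Child Java Maximum Heap Size" corresponds to the child JVM options in mapred-site.xml (a sketch under the assumption of MR1 property names; with Cloudera Manager you would set this through the UI rather than editing the file):

```xml
<!-- mapred-site.xml: raise the per-task child JVM heap -->
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx2g</value>
</property>
```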