HBase user mailing list - OutOfMemoryError in MapReduce Job


Re: OutOfMemoryError in MapReduce Job
John 2013-11-02, 15:29
I already use LZO compression in HBase. Or do you mean a compressed Java
object? Do you know of an implementation?

kind regards
2013/11/2 Asaf Mesika <[EMAIL PROTECTED]>

> I would try to compress this bit set.
>
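A minimal sketch of that idea, assuming the bitvector is a java.util.BitSet and gzip is an acceptable codec (the class and method names here are illustrative, not from the original mails):

    import java.io.ByteArrayOutputStream;
    import java.io.IOException;
    import java.util.BitSet;
    import java.util.zip.GZIPOutputStream;

    // Serialize a BitSet and gzip-compress it before it goes into the Put.
    // A sparse bloom filter compresses well; a densely populated one may
    // not shrink much.
    public class BitSetCompression {
        public static byte[] compress(BitSet bitvector) throws IOException {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            GZIPOutputStream gzip = new GZIPOutputStream(bos);
            gzip.write(bitvector.toByteArray()); // BitSet.toByteArray() needs Java 7+
            gzip.close();                        // finishes the gzip stream
            return bos.toByteArray();
        }
    }

The compressed bytes would replace toByteArray(bitvector) in the Put shown further down, at the cost of decompressing the value on every read.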
> On Nov 2, 2013, at 2:43 PM, John <[EMAIL PROTECTED]> wrote:
>
> > Hi,
> >
> > thanks for your answer! I increased the "Map Task Maximum Heap Size" to
> > 2 GB and it seems to work. The OutOfMemoryError is gone. But the HBase
> > RegionServers are now crashing all the time :-/ I try to store the
> > bitvector (120 MB in size) for some rows. This seems to be very memory
> > intensive; the usedHeapMB increases very fast (up to 2 GB). I'm not sure
> > whether it is the reading or the writing task that causes this, but I
> > think it's the writing task. Any idea how to minimize the memory usage?
> > My mapper looks like this:
> >
> > public class MyMapper extends TableMapper<ImmutableBytesWritable, Put> {
> >
> >     private void storeBitvectorToHBase(byte[] name, byte[] cf,
> >             BitSet bitvector, Context context)
> >             throws IOException, InterruptedException {
> >         Put row = new Put(name);
> >         row.setWriteToWAL(false);
> >         row.add(cf, Bytes.toBytes("columname"), toByteArray(bitvector));
> >         ImmutableBytesWritable key = new ImmutableBytesWritable(name);
> >         context.write(key, row);
> >     }
> > }
> >
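The toByteArray helper called above is not shown in the mail; assuming the bloom filter is a java.util.BitSet, a straightforward version living inside MyMapper would be the method below. Note that it allocates a second ~120 MB byte[] next to the BitSet itself, which adds to the heap pressure described above:

    // Illustrative helper matching the toByteArray(bitvector) call above.
    // BitSet.toByteArray() (Java 7+) copies the backing long[] into a new
    // byte[], so the serialized value temporarily doubles the memory footprint.
    private static byte[] toByteArray(java.util.BitSet bitvector) {
        return bitvector.toByteArray();
    }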
> >
> > kind regards
> >
> >
> > 2013/11/1 Jean-Marc Spaggiari <[EMAIL PROTECTED]>
> >
> >> Hi John,
> >>
> >> You might be better off asking this on the CDH mailing list, since it's
> >> more related to Cloudera Manager than to HBase.
> >>
> >> In the meantime, can you try to update the "Map Task Maximum Heap Size"
> >> parameter too?
> >>
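For reference, outside of Cloudera Manager that setting corresponds roughly to the map-task JVM options on the job configuration. A minimal sketch, using the MRv1 property name that CDH4's MRv1 understands (on MRv2/YARN the key is mapreduce.map.java.opts); the class and job names are illustrative:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.mapreduce.Job;

    public class JobSetupExample {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            // Give each map task JVM a 2 GB heap.
            conf.set("mapred.map.child.java.opts", "-Xmx2048m");
            Job job = new Job(conf, "bloomfilter-job");
            // ... TableMapReduceUtil.initTableMapperJob(...) and the rest of the setup
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }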
> >> JM
> >>
> >>
> >> 2013/11/1 John <[EMAIL PROTECTED]>
> >>
> >>> Hi,
> >>>
> >>> I have a problem with memory. My use case is the following: I've created
> >>> a MapReduce job that iterates over every row. If a row has more than, for
> >>> example, 10k columns, I create a bloom filter (a BitSet) for that row and
> >>> store it in the HBase structure. This has worked fine so far.
> >>>
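A rough sketch of the flow described above; the column-count threshold, the column family, and the single-hash "bloom filter" are simplifications for illustration, not the poster's actual code:

    import java.io.IOException;
    import java.util.Arrays;
    import java.util.BitSet;

    import org.apache.hadoop.hbase.KeyValue;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
    import org.apache.hadoop.hbase.mapreduce.TableMapper;
    import org.apache.hadoop.hbase.util.Bytes;

    public class BloomFilterMapper extends TableMapper<ImmutableBytesWritable, Put> {

        private static final int MIN_COLUMNS = 10000;      // "more than 10k columns"
        private static final int FILTER_BITS = 1000000000; // ~120 MB BitSet

        private final byte[] cf = Bytes.toBytes("cf");     // family name is an assumption

        @Override
        protected void map(ImmutableBytesWritable key, Result row, Context context)
                throws IOException, InterruptedException {
            if (row.size() <= MIN_COLUMNS) {
                return;                                    // only wide rows get a filter
            }
            BitSet bitvector = new BitSet(FILTER_BITS);
            for (KeyValue kv : row.raw()) {                // HBase 0.94 API (CDH 4.4)
                int hash = Arrays.hashCode(kv.getQualifier()) & Integer.MAX_VALUE;
                bitvector.set(hash % FILTER_BITS);
            }
            Put put = new Put(key.get());
            put.setWriteToWAL(false);
            put.add(cf, Bytes.toBytes("columname"), bitvector.toByteArray());
            context.write(key, put);
        }
    }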
> >>> BUT now I try to store a BitSet with 1000000000 elements = ~120 MB in
> >>> size. In every map() function there exist two BitSets. If I try to
> >>> execute the MR job I get this error: http://pastebin.com/DxFYNuBG
> >>>
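For a sense of scale, the ~120 MB figure follows directly from the BitSet size (one long per 64 bits), and two such BitSets per map() call roughly double it:

    // Back-of-the-envelope sizing for a java.util.BitSet of 10^9 bits.
    public class BitSetSizing {
        public static void main(String[] args) {
            long bits  = 1000000000L;
            long words = (bits + 63) / 64; // 15,625,000 longs in the backing array
            long bytes = words * 8;        // 125,000,000 bytes, about 119 MiB
            System.out.println("one BitSet ~" + (bytes >> 20) + " MiB, "
                    + "two per map() ~" + ((2 * bytes) >> 20) + " MiB");
        }
    }

Serializing the filter for the Put creates another byte[] copy of roughly the same size, which helps explain why a 1 GB child heap was not enough.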
> >>> Obviously, the tasktracker does not have enough memory. I tried to adjust
> >>> the memory configuration, but I'm not sure which setting is the right one.
> >>> I tried changing the "MapReduce Child Java Maximum Heap Size" value from
> >>> 1 GB to 2 GB, but still got the same error.
> >>>
> >>> Which parameters do I have to adjust? BTW, I'm using CDH 4.4.0 with
> >>> Cloudera Manager.
> >>>
> >>> kind regards
> >>>
> >>
>
>