

Re: OutOfMemoryError in MapReduce Job
@Ted: okay, thanks for the information

@Asaf: It seems to work if I compress the bytes myself. I use Snappy
for that ( https://code.google.com/p/snappy/ ). The 120mb BitSet is
compressed to a 5mb byte array. So far the HBase server has not crashed.
Thanks!
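
For reference, a minimal sketch of that approach, assuming the snappy-java binding (org.xerial.snappy.Snappy) is on the classpath and the bit vector is a java.util.BitSet; the class and method names are illustrative only:

    import java.io.IOException;
    import java.util.BitSet;

    import org.xerial.snappy.Snappy;

    public class BitSetCodec {

        // Serialize the BitSet and Snappy-compress it before using it as a cell value.
        public static byte[] compress(BitSet bits) throws IOException {
            return Snappy.compress(bits.toByteArray());
        }

        // Decompress the cell value and rebuild the BitSet on the reading side.
        public static BitSet decompress(byte[] value) throws IOException {
            return BitSet.valueOf(Snappy.uncompress(value));
        }
    }

A mostly-empty bit set is highly repetitive, which is why a 120mb BitSet can shrink to a few megabytes.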

kind regards
2013/11/2 Ted Yu <[EMAIL PROTECTED]>

> Compression happens on server.
> See src/main/java/org/apache/hadoop/hbase/io/hfile/Compression.java (0.94)
>
> In 0.96 and beyond, see http://hbase.apache.org/book.html#rpc.configs
>
> Cheers
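
As a pointer for 0.96+, a minimal sketch of enabling client/server RPC compression via the hbase.client.rpc.compressor property described in the rpc.configs section linked above; the GzipCodec choice is only an example:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;

    public class RpcCompressionExample {

        // Sketch only: the codec named here must be available on both the
        // client and the region servers (see the rpc.configs section of the book).
        public static Configuration createConf() {
            Configuration conf = HBaseConfiguration.create();
            conf.set("hbase.client.rpc.compressor",
                    "org.apache.hadoop.io.compress.GzipCodec");
            return conf;
        }
    }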
>
> On Sat, Nov 2, 2013 at 9:46 AM, John <[EMAIL PROTECTED]> wrote:
>
> > You mean I should take the BitSet, transform it into bytes and then
> > compress it on my own in the map-function? Hmmm ... I could try it. What
> > is the best way to compress it in Java?
> >
> > BTW, I'm not sure how exactly the HBase compression works. As I mentioned,
> > I have already enabled LZO compression for the column family. The question
> > is: where are the bytes compressed? Directly in the map-function (if not,
> > is it possible to compress them there with LZO?!) or in the region server?
> >
> > kind regards
> >
> >
> > 2013/11/2 Asaf Mesika <[EMAIL PROTECTED]>
> >
> > > I mean, if you take all those bytes of the bit set and zip them, wouldn't
> > > you reduce it significantly? Less traffic on the wire, less memory in
> > > HBase, etc.
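
A minimal sketch of that "zip them" idea using plain java.util.zip, assuming the bit vector is a java.util.BitSet; the class name is illustrative:

    import java.io.ByteArrayOutputStream;
    import java.io.IOException;
    import java.util.BitSet;
    import java.util.zip.GZIPOutputStream;

    public class BitSetGzip {

        // Gzip the raw BitSet bytes; a sparse, mostly-empty bit set compresses very well.
        public static byte[] gzip(BitSet bits) throws IOException {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
                gz.write(bits.toByteArray());
            }
            return bos.toByteArray();
        }
    }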
> > >
> > > On Saturday, November 2, 2013, John wrote:
> > >
> > > > I already use LZO compression in HBase. Or do you mean a compressed
> > > > Java object? Do you know an implementation?
> > > >
> > > > kind regards
> > > >
> > > >
> > > > 2013/11/2 Asaf Mesika <[EMAIL PROTECTED]>
> > > >
> > > > > I would try to compress this bit set.
> > > > >
> > > > > On Nov 2, 2013, at 2:43 PM, John <[EMAIL PROTECTED]> wrote:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > thanks for your answer! I increased the "Map Task Maximum Heap
> > > > > > Size" to 2gb and it seems to work. The OutOfMemoryError is gone.
> > > > > > But the HBase Region Servers are now crashing all the time :-/ I
> > > > > > am trying to store the bitvector (120mb in size) for some rows.
> > > > > > This seems to be very memory intensive; the usedHeapMB increases
> > > > > > very fast (up to 2gb). I'm not sure if it is the reading or the
> > > > > > writing task which causes this, but I think it's the writing task.
> > > > > > Any idea how to minimize the memory usage? My mapper looks like
> > > > > > this:
> > > > > >
> > > > > > public class MyMapper extends TableMapper<ImmutableBytesWritable, Put> {
> > > > > >
> > > > > >     private void storeBitvectorToHBase()
> > > > > >             throws IOException, InterruptedException {
> > > > > >         Put row = new Put(name);
> > > > > >         // Skip the WAL to reduce write pressure on the region server
> > > > > >         row.setWriteToWAL(false);
> > > > > >         row.add(cf, Bytes.toBytes("columname"), toByteArray(bitvector));
> > > > > >         ImmutableBytesWritable key = new ImmutableBytesWritable(name);
> > > > > >         context.write(key, row);
> > > > > >     }
> > > > > > }
> > > > > >
> > > > > >
> > > > > > kind regards
> > > > > >
> > > > > >
> > > > > > 2013/11/1 Jean-Marc Spaggiari <[EMAIL PROTECTED]>
> > > > > >
> > > > > >> Hi John,
> > > > > >>
> > > > > >> You might be better off asking this on the CDH mailing list,
> > > > > >> since it's more related to Cloudera Manager than HBase.
> > > > > >>
> > > > > >> In the meantime, can you try to update the "Map Task Maximum
> > > > > >> Heap Size" parameter too?
> > > > > >>
> > > > > >> JM
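
Outside of Cloudera Manager, roughly the same knob is the map task JVM options in the job configuration; a minimal sketch, where the 2 GB heap value and the job name are only examples:

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    public class JobHeapExample {

        // Sketch: raise the map task JVM heap from the job configuration.
        public static Job createJob() throws IOException {
            Configuration conf = new Configuration();
            conf.set("mapreduce.map.java.opts", "-Xmx2048m");   // MR2/YARN property
            // On MR1 clusters the equivalent is mapred.child.java.opts.
            return Job.getInstance(conf, "bitvector-job");
        }
    }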
> > > > > >>
> > > > > >>
> > > > > >> 2013/11/1 John <[EMAIL PROTECTED]>
> > > > > >>
> > > > > >>> Hi,
> > > > > >>>
> > > > > >>> I have a problem with the memory. My use case is the following:
> > > > > >>> I've created a MapReduce job and iterate over every row in it. If
> > > > > >>> the row has more than, for example, 10k columns, I will create a
> > > > > >>> bloomfilter (a