Hive >> mail # user >> OOM/GC limit Error


OOM/GC limit Error
Hi all,

I have two tables:

tbl1: 81 million rows
tbl2: 4 million rows

tbl1 is partitioned on one column and tbl2 has none.

I'm attempting the following query:

SELECT
tbl1.col_pk,
tbl2.col1,
tbl2.col2,
SUM(tbl1.col4),
SUM(tbl1.col5),
SUM(tbl1.col4 + tbl1.col5)
FROM tbl2
JOIN tbl1 ON (tbl1.col_pk=tbl2.col_pk)
WHERE tbl1.partitioned_col IN ('2011','2012','2013')
GROUP BY
tbl1.col_pk,
tbl2.col1,
tbl2.col2;

I get this error:

OutOfMemoryError: GC overhead limit exceeded

So I followed the suggestion at the end of the error output ("Currently hive.map.aggr.hash.percentmemory is set to 0.5. Try setting it to a lower value. i.e 'set hive.map.aggr.hash.percentmemory = 0.25;'") through several iterations, eventually lowering hive.map.aggr.hash.percentmemory to roughly 0.0165, and the query still failed.

I did some searching and found some convoluted recommendations on what to try next: some suggested increasing the heap size, others rewriting the query, and so on. I raised my Hadoop maximum Java heap size to 4096 MB, re-ran the query, and got the same error.

Currently, some relevant settings are:

NameNode Heap Size: 4096 MB
DataNode maximum Java heap size: 4096 MB
Hadoop maximum Java heap size: 4096 MB
Java Options for MapReduce tasks: 768 MB

I have 16 map slots and 8 reduce slots available (5-node cluster: 4 data nodes and 1 name node).
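
As a rough illustration of how these numbers may interact, here is a minimal sketch for the Hive CLI. It assumes that the 768 figure listed for MapReduce tasks is the task JVM's maximum heap (typically carried by mapred.child.java.opts on Hadoop 1.x), and that hive.map.aggr.hash.percentmemory is applied as a fraction of that task heap rather than of the 4096 MB daemon heaps. Both points are assumptions about this cluster, not something confirmed above.

```sql
-- Sketch only: print the current values. In the Hive CLI, a bare
-- "set <name>;" shows the setting instead of changing it.
set mapred.child.java.opts;
set hive.map.aggr.hash.percentmemory;

-- If the task heap really is 768 MB, the map-side aggregation hash table
-- budget at each setting mentioned in this post would be roughly:
--   0.5    * 768 MB = 384 MB
--   0.25   * 768 MB = 192 MB
--   0.0165 * 768 MB = about 12.7 MB
```

If those assumptions hold, the heap sizes raised to 4096 MB would not change the memory available to the map tasks that run the aggregation.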

Thanks in advance for the help,
Nick