Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Threaded View
Hive >> mail # user >> OOM/GC limit Error


Copy link to this message
-
OOM/GC limit Error
Hi all,

I have two tables:

tbl1: 81m rows
tbl2: 4m rows

tbl1 is partitioned on one column and tbl2 has none.

I'm attempting the following query:

SELECT
tbl1.col_pk,
tbl2.col1,
tbl2.col2,
SUM(tbl1.col4),
SUM(tbl1.col5),
SUM(tbl1.col4+col5)
FROM tbl2
JOIN tbl1 ON (tbl1.col_pk=tbl2.col_pk)
WHERE tbl1.partitioned_col IN ('2011','2012','2013')
GROUP BY
tbl1.col_pk,
tbl2.col1,
tbl2.col2;

I get this error:

OutOfMemoryError: GC overhead limit exceeded

So, I followed the suggestion at the end of the error output (Currently hive.map.aggr.hash.percentmemory is set to 0.5. Try setting it to a lower value. i.e 'set hive.map.aggr.hash.percentmemory = 0.25;') through several iterations, eventually getting my hive.map.aggr.hash.percentmemory setting down to something like .0165 and it still failed.

I did some searching and found some convoluted recommendations of what to try next. Some mentioned upping my heap size, some mentioned re-writing my query, etc. I upped my Hadoop maximum Java heap size to 4096mb ,re-ran, and got the same results.

Currently, some relevant settings are:

NameNode Heap Size: 4096mb
DataNode maximum Java heap size: 4096mb
Hadoop maximum Java heap size: 4096mb
Java Options for MapReduce tasks: 768mb

I have 16 map slots and 8 reduce slots available (5 node cluster, 4 data and one name)

Thanks in advance for the help,
Nick