Hive >> mail # user >> Nested SELECT DISTINCT runs out of memory


Nested SELECT DISTINCT runs out of memory
I have this query that consistently fails with out-of-memory errors. I know
it can be rewritten without the nested subquery (using COUNT(DISTINCT ...)),
and then it runs fine.

Why does this query fail, though? Is this a known Hive issue? The
subquery returns 5M records.

SELECT x, COUNT(1) AS num
FROM (SELECT DISTINCT x, y) t
GROUP BY x;

I am using EMR Hive 0.8.1.
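
For reference, here is a minimal sketch of the count-distinct rewrite mentioned above. The post does not name the source table, so `src` is a hypothetical placeholder, and this may not match the exact form of the rewrite that was tried.

```sql
-- Sketch only: `src` is a hypothetical table name, not taken from the post.
-- Counts the distinct y values per x in a single GROUP BY, without
-- materializing the DISTINCT (x, y) pairs in a nested subquery.
SELECT x, COUNT(DISTINCT y) AS num
FROM src
GROUP BY x;
```

One caveat: the two forms agree only when y is never NULL. COUNT(DISTINCT y) ignores NULL values, whereas the nested form counts an (x, NULL) pair as one row.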