Hive, mail # user - Nested SELECT DISTINCT runs out of memory


Nested SELECT DISTINCT runs out of memory
Igor Tatarinov 2012-07-26, 18:40
I have a query that consistently fails with out-of-memory errors. I know
it can be rewritten without the nested subquery (using COUNT(DISTINCT)),
and then it runs fine.

Why does this query fail, though? Is this a known Hive issue? The
subquery returns 5M records.

SELECT x, COUNT(1) AS num
FROM (SELECT DISTINCT x, y) t
GROUP BY x;
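
For reference, the COUNT(DISTINCT) rewrite mentioned above would look roughly like the sketch below. The table name some_table is a placeholder, since the source table is omitted from the query as posted. Note that the two forms can differ when y contains NULLs: COUNT(DISTINCT y) skips NULL values, while the nested query counts an (x, NULL) pair as a row.

```sql
-- Sketch only: some_table is a hypothetical placeholder for the real source table.
-- Counts the distinct non-NULL y values seen for each x.
SELECT x, COUNT(DISTINCT y) AS num
FROM some_table
GROUP BY x;
```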

I am using EMR Hive 0.8.1.