Nested SELECT DISTINCT runs out of memory
Igor Tatarinov 2012-07-26, 18:40
I have a query that consistently fails with out-of-memory errors. I know it can be rewritten without a nested subquery (using count distinct), and then it runs fine. Why does this query fail, though? Is this a known Hive issue? The subquery returns 5M records.

SELECT x, COUNT(1) AS num FROM (SELECT DISTINCT x, y) t GROUP BY x;

I am using EMR Hive 0.8.1.
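
For reference, the count-distinct rewrite mentioned above would look roughly like the sketch below. The source table name `my_table` is a placeholder, since the query as posted does not show which table the subquery reads from.

```sql
-- Sketch of the count-distinct form of the query.
-- my_table is a hypothetical name; the source table is not shown in the post.
SELECT x, COUNT(DISTINCT y) AS num
FROM my_table
GROUP BY x;
```

One caveat: the two forms agree only when y is never NULL. COUNT(DISTINCT y) skips NULL values, while the nested version counts an (x, NULL) pair as one row.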