Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Threaded View
Hive, mail # user - Difference in number of row observstions from distinct and group by


Copy link to this message
-
Re: Difference in number of row observstions from distinct and group by
David Morel 2013-11-25, 13:46
On 25 Nov 2013, at 9:06, Mayank Bansal wrote:

> Hi,
>
> I was also thinking that this might be the case. For that reason I ran
> this query
>
> Select * from (select col1,col2,col3,count(*)  as val from table_name
> group by col1,col2,col3)a where a.val>1 ;
>
> The output that I receive from this query is blank, then I ended up
> doing count(*) and I got the same number of rows as originally in the
> table. Please help me figure this out.

Instead of going circles, I would put the 2 result sets in 2 tables
(with a concatenated PK made of your 3 columns, with some separator like
'-'), and do a left outer join of table 1 on table 2.

you'd be able to identify quickly what went wrong. Sort the result so
you get unlikely dupes, and all. Just trial and error until you nail it.

David