Home | About | Sematext search-lucene.com search-hadoop.com
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
 Search Hadoop and all its subprojects:

Switch to Plain View
Hive >> mail # user >> Difference in number of row observstions from distinct and group by


+
Mayank Bansal 2013-11-21, 15:13
+
Thejas Nair 2013-11-21, 20:18
+
Mayank Bansal 2013-11-25, 08:06
Copy link to this message
-
Re: Difference in number of row observstions from distinct and group by
On 25 Nov 2013, at 9:06, Mayank Bansal wrote:

> Hi,
>
> I was also thinking that this might be the case. For that reason I ran
> this query
>
> Select * from (select col1,col2,col3,count(*)  as val from table_name
> group by col1,col2,col3)a where a.val>1 ;
>
> The output that I receive from this query is blank, then I ended up
> doing count(*) and I got the same number of rows as originally in the
> table. Please help me figure this out.

Instead of going circles, I would put the 2 result sets in 2 tables
(with a concatenated PK made of your 3 columns, with some separator like
'-'), and do a left outer join of table 1 on table 2.

you'd be able to identify quickly what went wrong. Sort the result so
you get unlikely dupes, and all. Just trial and error until you nail it.

David
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB