Home | About | Sematext search-lucene.com search-hadoop.com
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
 Search Hadoop and all its subprojects:

Switch to Threaded View
Hive >> mail # user >> Re: count of distinct FROM multiple columns


Copy link to this message
-
Re: count of distinct FROM multiple columns
A solution that comes to my mind is to use a union. Something like (untested):

select
   first,
   count(distinct second)
from
   (select
      col1 as first,
      col2 as second
   from
      dummy
   union all
   select
      col1 as first,
      col3 as second
   from
      dummy
   )t
group by
   first;

Mark
----- Original Message -----
From: "Jan Dolinár" <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
Sent: Friday, June 22, 2012 8:52:43 AM
Subject: Re: count of distinct FROM multiple columns
Hi
A quick solution that comes first to my mind is to join the columns you want to combine into an array and then use the explode UDTF:

SELECT col1, COUNT(distinct combined) FROM dummy LATERAL VIEW explode(array(col2, col3)) t AS combined GROUP BY col1;

Although I believe there might be simpler and/or better solutions.
Jan

On Fri, Jun 22, 2012 at 2:32 PM, MIS < [EMAIL PROTECTED] > wrote:
Hi All,

I have a table in Hive as below:

dummy {
col1 STRING,
col2 INT,
col3 INT
}

And in that there is some sample data as :

col1 col2 col3
ABC 4 5
XYZ 1 2
ABC 1 3
ABC 5 1
XYZ 3 1

What should be my query so as to get the below result:

ABC 4
XYZ 3

Basically I'm trying to get a count of distinct elements from col2 and col3 combined and group them against col1.

I tried with a query as:

select col1, count(distinct col2, col3) from dummy group by col1

But didn't get the expected output.

Can anybody point me in the correct direction and suggest a correct query.

Thanks.
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB