Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Threaded View
Hive, mail # user - Re: count of distinct FROM multiple columns


Copy link to this message
-
Re: count of distinct FROM multiple columns
Mark Grover 2012-06-22, 17:29
A solution that comes to my mind is to use a union. Something like (untested):

select
   first,
   count(distinct second)
from
   (select
      col1 as first,
      col2 as second
   from
      dummy
   union all
   select
      col1 as first,
      col3 as second
   from
      dummy
   )t
group by
   first;

Mark
----- Original Message -----
From: "Jan Dolinár" <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
Sent: Friday, June 22, 2012 8:52:43 AM
Subject: Re: count of distinct FROM multiple columns
Hi
A quick solution that comes first to my mind is to join the columns you want to combine into an array and then use the explode UDTF:

SELECT col1, COUNT(distinct combined) FROM dummy LATERAL VIEW explode(array(col2, col3)) t AS combined GROUP BY col1;

Although I believe there might be simpler and/or better solutions.
Jan

On Fri, Jun 22, 2012 at 2:32 PM, MIS < [EMAIL PROTECTED] > wrote:
Hi All,

I have a table in Hive as below:

dummy {
col1 STRING,
col2 INT,
col3 INT
}

And in that there is some sample data as :

col1 col2 col3
ABC 4 5
XYZ 1 2
ABC 1 3
ABC 5 1
XYZ 3 1

What should be my query so as to get the below result:

ABC 4
XYZ 3

Basically I'm trying to get a count of distinct elements from col2 and col3 combined and group them against col1.

I tried with a query as:

select col1, count(distinct col2, col3) from dummy group by col1

But didn't get the expected output.

Can anybody point me in the correct direction and suggest a correct query.

Thanks.