Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Threaded View
Hive, mail # user - Re: count of distinct FROM multiple columns


Copy link to this message
-
Re: count of distinct FROM multiple columns
Edward Capriolo 2012-06-22, 17:42
I think your syntax is wrong. Hive should support multi-column
distinct and at that point counting should work.

You did:
select col1, count(distinct col2, col3) from dummy group by col1

I think the correct syntax is:
select col1, count(distinct (col2, col3)) from dummy group by col1

On Fri, Jun 22, 2012 at 1:29 PM, Mark Grover <[EMAIL PROTECTED]> wrote:
> A solution that comes to my mind is to use a union. Something like (untested):
>
> select
>   first,
>   count(distinct second)
> from
>   (select
>      col1 as first,
>      col2 as second
>   from
>      dummy
>   union all
>   select
>      col1 as first,
>      col3 as second
>   from
>      dummy
>   )t
> group by
>   first;
>
> Mark
> ----- Original Message -----
> From: "Jan Dolinár" <[EMAIL PROTECTED]>
> To: [EMAIL PROTECTED]
> Sent: Friday, June 22, 2012 8:52:43 AM
> Subject: Re: count of distinct FROM multiple columns
>
>
> Hi
>
>
> A quick solution that comes first to my mind is to join the columns you want to combine into an array and then use the explode UDTF:
>
> SELECT col1, COUNT(distinct combined) FROM dummy LATERAL VIEW explode(array(col2, col3)) t AS combined GROUP BY col1;
>
> Although I believe there might be simpler and/or better solutions.
>
>
> Jan
>
>
>
> On Fri, Jun 22, 2012 at 2:32 PM, MIS < [EMAIL PROTECTED] > wrote:
>
>
> Hi All,
>
> I have a table in Hive as below:
>
> dummy {
> col1 STRING,
> col2 INT,
> col3 INT
> }
>
> And in that there is some sample data as :
>
> col1 col2 col3
> ABC 4 5
> XYZ 1 2
> ABC 1 3
> ABC 5 1
> XYZ 3 1
>
> What should be my query so as to get the below result:
>
> ABC 4
> XYZ 3
>
> Basically I'm trying to get a count of distinct elements from col2 and col3 combined and group them against col1.
>
> I tried with a query as:
>
> select col1, count(distinct col2, col3) from dummy group by col1
>
> But didn't get the expected output.
>
> Can anybody point me in the correct direction and suggest a correct query.
>
> Thanks.
>