Home | About | Sematext search-lucene.com search-hadoop.com
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
 Search Hadoop and all its subprojects:

Switch to Plain View
Hive >> mail # user >> Re: count of distinct FROM multiple columns


+
Jan Dolinár 2012-06-22, 12:52
+
Mark Grover 2012-06-22, 17:29
Copy link to this message
-
Re: count of distinct FROM multiple columns
I think your syntax is wrong. Hive should support multi-column
distinct and at that point counting should work.

You did:
select col1, count(distinct col2, col3) from dummy group by col1

I think the correct syntax is:
select col1, count(distinct (col2, col3)) from dummy group by col1

On Fri, Jun 22, 2012 at 1:29 PM, Mark Grover <[EMAIL PROTECTED]> wrote:
> A solution that comes to my mind is to use a union. Something like (untested):
>
> select
>   first,
>   count(distinct second)
> from
>   (select
>      col1 as first,
>      col2 as second
>   from
>      dummy
>   union all
>   select
>      col1 as first,
>      col3 as second
>   from
>      dummy
>   )t
> group by
>   first;
>
> Mark
> ----- Original Message -----
> From: "Jan Dolinár" <[EMAIL PROTECTED]>
> To: [EMAIL PROTECTED]
> Sent: Friday, June 22, 2012 8:52:43 AM
> Subject: Re: count of distinct FROM multiple columns
>
>
> Hi
>
>
> A quick solution that comes first to my mind is to join the columns you want to combine into an array and then use the explode UDTF:
>
> SELECT col1, COUNT(distinct combined) FROM dummy LATERAL VIEW explode(array(col2, col3)) t AS combined GROUP BY col1;
>
> Although I believe there might be simpler and/or better solutions.
>
>
> Jan
>
>
>
> On Fri, Jun 22, 2012 at 2:32 PM, MIS < [EMAIL PROTECTED] > wrote:
>
>
> Hi All,
>
> I have a table in Hive as below:
>
> dummy {
> col1 STRING,
> col2 INT,
> col3 INT
> }
>
> And in that there is some sample data as :
>
> col1 col2 col3
> ABC 4 5
> XYZ 1 2
> ABC 1 3
> ABC 5 1
> XYZ 3 1
>
> What should be my query so as to get the below result:
>
> ABC 4
> XYZ 3
>
> Basically I'm trying to get a count of distinct elements from col2 and col3 combined and group them against col1.
>
> I tried with a query as:
>
> select col1, count(distinct col2, col3) from dummy group by col1
>
> But didn't get the expected output.
>
> Can anybody point me in the correct direction and suggest a correct query.
>
> Thanks.
>
+
MIS 2012-06-23, 07:34
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB