Home | About | Sematext search-lucene.com search-hadoop.com
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
 Search Hadoop and all its subprojects:

Switch to Threaded View
Hive >> mail # user >> Re: count of distinct FROM multiple columns


Copy link to this message
-
Re: count of distinct FROM multiple columns
Hi

A quick solution that comes first to my mind is to join the columns you
want to combine into an array and then use the explode UDTF:

SELECT col1, COUNT(distinct combined) FROM dummy LATERAL VIEW
explode(array(col2, col3)) t AS combined GROUP BY col1;

Although I believe there might be simpler and/or better solutions.

Jan
On Fri, Jun 22, 2012 at 2:32 PM, MIS <[EMAIL PROTECTED]> wrote:

> Hi All,
>
> I have a table in Hive as below:
>
> dummy {
>     col1 STRING,
>     col2 INT,
>     col3 INT
> }
>
> And in that there is some sample data as :
>
> *col1  col2 col3 *
> ABC    4    5
> XYZ    1    2
> ABC    1    3
> ABC    5    1
> XYZ    3    1
>
> What should be my query so as to get the below result:
>
> *ABC 4
> XYZ 3*
>
> Basically I'm trying to get a count of distinct elements from *col2 and
> col3 combined* and group them against col1.
>
> I tried with a query as:
>
> select col1, count(distinct col2, col3) from dummy group by col1
>
> But didn't get the expected output.
>
> Can anybody point me in the correct direction and suggest a correct query.
>
> Thanks.
>
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB