Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Threaded View
Hive >> mail # user >> Re: count of distinct FROM multiple columns


Copy link to this message
-
Re: count of distinct FROM multiple columns
Hi

A quick solution that comes first to my mind is to join the columns you
want to combine into an array and then use the explode UDTF:

SELECT col1, COUNT(distinct combined) FROM dummy LATERAL VIEW
explode(array(col2, col3)) t AS combined GROUP BY col1;

Although I believe there might be simpler and/or better solutions.

Jan
On Fri, Jun 22, 2012 at 2:32 PM, MIS <[EMAIL PROTECTED]> wrote:

> Hi All,
>
> I have a table in Hive as below:
>
> dummy {
>     col1 STRING,
>     col2 INT,
>     col3 INT
> }
>
> And in that there is some sample data as :
>
> *col1  col2 col3 *
> ABC    4    5
> XYZ    1    2
> ABC    1    3
> ABC    5    1
> XYZ    3    1
>
> What should be my query so as to get the below result:
>
> *ABC 4
> XYZ 3*
>
> Basically I'm trying to get a count of distinct elements from *col2 and
> col3 combined* and group them against col1.
>
> I tried with a query as:
>
> select col1, count(distinct col2, col3) from dummy group by col1
>
> But didn't get the expected output.
>
> Can anybody point me in the correct direction and suggest a correct query.
>
> Thanks.
>