Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Threaded View
Hive >> mail # user >> Group By Concatenation


Copy link to this message
-
Re: Group By Concatenation
Collect_set() is built into hive. If you want a version that does not
de-duplicate look here.
https://github.com/edwardcapriolo/hive-collect

Caution both of these functions can produce out of memory if the
results are later then a mapper can store in memory.

On Thu, Nov 1, 2012 at 2:27 PM, Ratner, Alan S (IS) <[EMAIL PROTECTED]> wrote:
> Sorry to ask what is probably a very naïve Hive question but here goes:
>
>
>
> I have a table as follows:
>
> Col1   Col2
>
> K1       V1
>
> K1       V1
>
> K2       V1
>
> K3       V1
>
> K1       V2
>
> K1       V3
>
> K2       V2
>
>
>
> Now I have managed to SELECT Col1,COUNT(DISTINCT Col2) FROM … BY COL1; to
> obtain
>
> K1       3
>
> K2       2
>
> K3       1
>
>
>
> But what I want is a concatenated list of all the distinct Col2 values for
> each Col1 key.
>
> K1       V1       V2       V3
>
> K2       V1       V2
>
> K3       V1
>
>
>
> This is something absolutely trivial in MR but I cannot seem to find
> anything in Hive that will do this for me.  Do I have to write a UDF to
> accomplish this?
>
>
>
>
>
> Alan
>
>