Home | About | Sematext search-lucene.com search-hadoop.com
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
 Search Hadoop and all its subprojects:

Switch to Threaded View
Hive >> mail # user >> Group By Concatenation


Copy link to this message
-
Re: Group By Concatenation
Collect_set() is built into hive. If you want a version that does not
de-duplicate look here.
https://github.com/edwardcapriolo/hive-collect

Caution both of these functions can produce out of memory if the
results are later then a mapper can store in memory.

On Thu, Nov 1, 2012 at 2:27 PM, Ratner, Alan S (IS) <[EMAIL PROTECTED]> wrote:
> Sorry to ask what is probably a very naïve Hive question but here goes:
>
>
>
> I have a table as follows:
>
> Col1   Col2
>
> K1       V1
>
> K1       V1
>
> K2       V1
>
> K3       V1
>
> K1       V2
>
> K1       V3
>
> K2       V2
>
>
>
> Now I have managed to SELECT Col1,COUNT(DISTINCT Col2) FROM … BY COL1; to
> obtain
>
> K1       3
>
> K2       2
>
> K3       1
>
>
>
> But what I want is a concatenated list of all the distinct Col2 values for
> each Col1 key.
>
> K1       V1       V2       V3
>
> K2       V1       V2
>
> K3       V1
>
>
>
> This is something absolutely trivial in MR but I cannot seem to find
> anything in Hive that will do this for me.  Do I have to write a UDF to
> accomplish this?
>
>
>
>
>
> Alan
>
>
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB