Hive >> mail # user >> collect_set UDAF w/ Duplicates


collect_set UDAF w/ Duplicates
Hello,

 

Is there a plan to support aggregating a column into a full list
(duplicates kept) while reading a table in order? (see below)

 

I have a fairly complex (45 line) SELECT script in Hive with Joins,
Unions, etc. to which I have to add a list of aggregated values from a
field.

I'm currently using collect_set to build the list, but it de-duplicates
the values, and I need the duplicates kept.

 

I've posted here on stack overflow (with a +50 bounty):

http://stackoverflow.com/questions/6445339/collect-set-in-hive-keep-duplicates

No hits.

 

Would I need to edit the original collect_set Java file and write my
own function? Or could I use a Python script with TRANSFORM()?
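
For the TRANSFORM() route, a minimal sketch of what I have in mind is below. It is not tested against Hive. It assumes the script receives tab-separated (key, value) rows on stdin with all rows for a key arriving together (for example via CLUSTER BY), and the script name, table name, and column names are all hypothetical placeholders.

```python
#!/usr/bin/env python
"""Collect values per key, keeping duplicates, for use with Hive TRANSFORM().

Assumed invocation (file, table, and column names are placeholders):

    ADD FILE collect_dups.py;
    SELECT TRANSFORM(k, v) USING 'python collect_dups.py' AS (k, vals)
    FROM (SELECT k, v FROM some_table CLUSTER BY k) t;
"""
import sys
from itertools import groupby


def collect_with_duplicates(lines, sep=","):
    """Yield 'key<TAB>v1,v2,...' for each run of consecutive rows sharing a key."""
    # Split each non-empty input line into [key, value] on the first tab.
    rows = (line.rstrip("\n").split("\t", 1) for line in lines if line.strip())
    # groupby only merges adjacent rows, so the input must be clustered by key.
    for key, group in groupby(rows, key=lambda r: r[0]):
        # A list (not a set) keeps every occurrence, in arrival order.
        values = [r[1] if len(r) > 1 else "" for r in group]
        yield key + "\t" + sep.join(values)


if __name__ == "__main__":
    for out in collect_with_duplicates(sys.stdin):
        print(out)
```

CLUSTER BY only guarantees that rows with the same key arrive together, so if the order of values within a key matters, the inner query would presumably need DISTRIBUTE BY on the key plus SORT BY on the key and an ordering column. Values that themselves contain commas would also need a different separator.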

 

I'm aware of, but not entirely up to editing, the collect_set file:

https://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFCollectSet.java

 

Thanks!

 

Travis Powell

 

Travis Powell / [EMAIL PROTECTED]

Tealeaf Technology  /  http://www.tealeaf.com

 
