collect_set UDAF w/ Duplicates
Travis Powell 2011-06-29, 20:27
What's the plan for supporting fully aggregated lists that read a table in
order? (See below.)
I have a fairly complex (45 line) SELECT script in Hive with Joins,
Unions, etc. to which I have to add a list of aggregated values from a
Data aside, I'm using collect_set to build a de-duped list of those
values. But I need the duplicates.
I've posted here on Stack Overflow (with a +50 bounty):
... would I need to edit the original collect_set Java file and make my
own function? Or could I use a Python script with TRANSFORM()?
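For the TRANSFORM() route, a minimal sketch of what such a script might look like is below. It assumes Hive streams rows to the script as tab-separated lines on stdin, that each row is a (key, value) pair, and that rows for the same key arrive together (for example via DISTRIBUTE BY and SORT BY on the key). The function name collect_all, the column layout, and the comma joiner are all hypothetical choices for illustration, not anything taken from collect_set itself.

```python
import sys
from itertools import groupby


def collect_all(lines, sep="\t", joiner=","):
    """Yield 'key<sep>v1,v2,...' for each run of consecutive equal keys.

    Unlike collect_set, duplicate values are kept, in arrival order.
    Assumes the input is already grouped by key.
    """
    # Split each non-blank line into [key, value]; a line with no
    # separator is treated as a key with an empty value.
    rows = (line.rstrip("\n").split(sep, 1) for line in lines if line.strip())
    for key, group in groupby(rows, key=lambda r: r[0]):
        values = [r[1] if len(r) > 1 else "" for r in group]
        yield key + sep + joiner.join(values)


if __name__ == "__main__":
    # Hive TRANSFORM reads the script's stdout as tab-separated columns.
    for out in collect_all(sys.stdin):
        sys.stdout.write(out + "\n")
```

Assuming the script is saved as collect_all.py and registered with ADD FILE, a query along the lines of SELECT TRANSFORM(k, v) USING 'python collect_all.py' AS (k, vals) FROM (SELECT k, v FROM t DISTRIBUTE BY k SORT BY k) s would call it; the table and column names here are placeholders. The output is a delimited string rather than a true Hive array, so it would still need a split() if an array is required.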
I'm aware of, but not entirely up to editing, the collect_set file:
Travis Powell / [EMAIL PROTECTED]
Tealeaf Technology / http://www.tealeaf.com