Hive >> mail # user >> unexplode?


Also, I have a collect UDF:
https://github.com/edwardcapriolo/hive-collect

It keeps duplicates, which collect_set removes.
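The distinction matters whenever the original arrays contain repeated elements. A rough Python sketch of the two aggregation semantics (illustrative only, not Hive code):

```python
def collect_set(values):
    # collect_set-style aggregation: duplicates (and input order) are lost
    return sorted(set(values))

def collect_list(values):
    # a list-based collect UDF keeps duplicates and arrival order
    return list(values)

rows = ["a", "b", "a", "c"]
print(collect_set(rows))   # ['a', 'b', 'c']
print(collect_list(rows))  # ['a', 'b', 'a', 'c']
```

So if the exploded rows ever carried duplicate values for one key, only a list-preserving collect can reconstruct the original array exactly.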

On Thu, Aug 23, 2012 at 1:26 PM, Philip Tromans
<[EMAIL PROTECTED]> wrote:
> insert into table originalTable
> select uniqueId, collect_set(whatever) from explodedTable group by uniqueId
>
> will probably do the trick.
>
> Phil.
>
> On 23 August 2012 17:45, Mike Fleming <[EMAIL PROTECTED]> wrote:
>> I see that Hive has a way to take a table and produce multiple rows.
>>
>> Is there a built in way to do the reverse?
>>
>> Say I have a table with a unique key and an array. I do this:
>>
>>> insert into table explodedTable
>>> select uniqueId, thing
>>> from originalTable lateral view explode(arrayOfThings) t as thing
>>
>> Now I have a table with a row for each (uniqueId, element in arrayOfThings).
>>
>> Is there any way to take the contents of explodedTable and essentially
>> produce the original table, reconstructing the arrayOfThings for each
>> uniqueId?
>>
>> It seems, conceptually, that if I "cluster by uniqueId" then a reducer knows
>> that it will get all rows for each uniqueId bundled together, so it ought to
>> be fairly feasible to simply emit an unexploded row. However, I can't seem
>> to find a built-in way to do this.
>>
>> Mike
>>
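The explode-then-regroup round trip discussed above can be modeled in plain Python to see why grouping on the unique key reconstructs the arrays (the table and column names are the hypothetical ones from the thread, and this is a sketch of the semantics, not Hive code):

```python
from collections import defaultdict

# Original table: uniqueId -> arrayOfThings
original = {1: ["x", "y"], 2: ["z"]}

# "explode": one (uniqueId, element) row per array element
exploded = [(uid, elem) for uid, arr in original.items() for elem in arr]

# "group by uniqueId" + collect: rebuild the array for each key
rebuilt = defaultdict(list)
for uid, elem in exploded:
    rebuilt[uid].append(elem)

assert dict(rebuilt) == original
```

As the thread notes, this reconstruction is only exact up to element order and (with collect_set) duplicates, since the exploded rows carry no record of their original positions.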