Hive, mail # user - Combine multiple row values based upon a condition.


Re: Combine multiple row values based upon a condition.
Dean Wampler 2013-02-03, 14:07
If you really only need to consider adjacent rows, it might be easier to
write a UDF or use streaming, where your code remembers the last record
seen and, whenever that record should be joined with the current one,
emits a new combined record.
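
A minimal sketch of the streaming idea, assuming rows arrive as tab-separated `id`, `name`, `offset` lines (the way Hive's `TRANSFORM ... USING` feeds a script), already sorted by offset and all handled by a single process. The function names (`combine_adjacent`, `group_offsets`, `main`) are made up for illustration. The merge rule, that the gap between the end of one name and the start of the next is at most 1 character, is an assumption read off the sample data below, not something confirmed in the thread.

```python
import sys
from collections import OrderedDict


def combine_adjacent(rows, max_gap=1):
    """Yield (first_id, fullname, start_offset) for each merged pair.

    rows: iterable of (id, name, offset), assumed sorted by offset.
    Assumed rule: two neighbouring rows are merged when the second name
    starts at most max_gap characters after the first name ends.
    """
    prev = None  # the last record seen that has not been merged yet
    for row_id, name, offset in rows:
        if prev is not None:
            prev_id, prev_name, prev_offset = prev
            gap = offset - (prev_offset + len(prev_name))
            if 0 <= gap <= max_gap:
                yield prev_id, prev_name + " " + name, prev_offset
                prev = None  # both rows are consumed by the merge
                continue
        prev = (row_id, name, offset)


def group_offsets(pairs):
    """Collect the start offsets of every merged pair per full name."""
    grouped = OrderedDict()
    for first_id, fullname, offset in pairs:
        # Keep the id of the first occurrence, append every offset.
        grouped.setdefault(fullname, (first_id, []))[1].append(offset)
    return [(fid, name, offs) for name, (fid, offs) in grouped.items()]


def parse(stream_in):
    """Turn tab-separated input lines into (id, name, offset) tuples."""
    for line in stream_in:
        line = line.rstrip("\n")
        if line:
            row_id, name, offset = line.split("\t")
            yield int(row_id), name, int(offset)


def main(stream_in, stream_out):
    for fid, name, offs in group_offsets(combine_adjacent(parse(stream_in))):
        offsets = "[" + ", ".join(str(o) for o in offs) + "]"
        stream_out.write("%d\t%s\t%s\n" % (fid, name, offsets))

# A real streaming script would end with: main(sys.stdin, sys.stdout)
```

The only state carried between rows is `prev`, which is the "remember the last record" part. Grouping the offsets per full name needs all merged pairs in one place, so this sketch only works if the query sends every row through a single sorted stream.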

On Sat, Feb 2, 2013 at 1:21 PM, Martijn van Leeuwen <[EMAIL PROTECTED]> wrote:

> Hi all,
>
> I am new to Apache Hive and am running some tests to see if it fits my
> needs. One of my questions is whether it is possible to "peek" at the next
> row in order to find out if the values should be combined. Let me explain
> with an example.
>
> Let's say my data looks like this:
>
> Id name offset
> 1 Jan 100
> 2 Janssen 104
> 3 Klaas 150
> 4 Jan 160
> 5 Janssen 164
>
> And my output to another table should be this:
>
> Id fullname offsets
> 1 Jan Janssen [ 100, 160 ]
>
> I would like to combine the name values from two rows where the offsets of
> the two rows are no more than 1 character apart.
>
> Is this type of data manipulation possible, and if it is, could someone
> point me in the right direction, hopefully with some explanation?
>
> Kind regards
> Martijn
--
*Dean Wampler, Ph.D.*
thinkbiganalytics.com
+1-312-339-1330