Pig, mail # user - How to calculate delta in a column?

Eric Yang 2010-12-31, 06:12
Hi,

What is the most efficient way to calculate the delta of each column between consecutive rows?  Consider this:

(key1, 1, 2, 3)
(key1, 2, 4, 5)
(key2, 1, 2, 4)
(key1, 3, 6, 9)
(key2, 2, 4, 6)

The expected transformation output should look like this:

(key1, 1, 2, 2)
(key1, 1, 2, 4)
(key2, 1, 2, 2)

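The idea is to group by f0 and, for each of f1, f2 and f3, compute the
current value minus the previous value.  The first row for each key has
no predecessor, so it produces no output row.  How do I write this in Pig?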
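For reference, here is a minimal plain-Python sketch of the transformation described above. It is not Pig. It assumes "previous" means the preceding row for the same key in input order. The function name `column_deltas` is hypothetical. In Pig, this per-group logic would most likely live in a UDF applied after `GROUP`. Pig does not guarantee tuple order inside a grouped bag, so the real data would probably need an explicit ordering column.

```python
from collections import defaultdict


def column_deltas(rows):
    """Group rows by their first field (f0) and, within each group, emit
    the field-by-field difference between each row and the row before it.

    Assumes rows for the same key appear in chronological (input) order.
    """
    groups = defaultdict(list)  # key -> list of value lists, in input order
    for key, *values in rows:
        groups[key].append(values)

    result = []
    for key, series in groups.items():
        # Pair each row with its predecessor; the first row has none,
        # so it contributes no output.
        for prev, curr in zip(series, series[1:]):
            result.append((key, *(c - p for c, p in zip(curr, prev))))
    return result


rows = [
    ("key1", 1, 2, 3),
    ("key1", 2, 4, 5),
    ("key2", 1, 2, 4),
    ("key1", 3, 6, 9),
    ("key2", 2, 4, 6),
]
for row in column_deltas(rows):
    print(row)
```

On the sample input this yields `('key1', 1, 2, 2)`, `('key1', 1, 2, 4)` and `('key2', 1, 2, 2)`, matching the expected output.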

If a delta underflows (goes negative), it should be reset to 0.  For example:

(key1, 1, 2, 3)
(key1, 0, 0, 0)
(key1, 2, 3, 4)

The output should be:

(key1, 0, 0, 0)
(key1, 2, 3, 4)
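Here is a plain-Python sketch of the underflow rule. It assumes "reset to 0" means each negative delta is clamped to 0 field by field. The example above cannot distinguish this from other readings, such as emitting the current row after a counter reset. The name `clamped_deltas` is hypothetical. This variant makes a single pass and remembers the last row seen for each key, so output follows input order rather than being grouped by key.

```python
def clamped_deltas(rows):
    """Per-key deltas between consecutive rows, with any negative
    (underflowed) delta clamped to 0, one field at a time.

    Assumes rows for the same key appear in chronological (input) order.
    """
    previous = {}  # key -> last value list seen for that key
    result = []
    for key, *values in rows:
        if key in previous:
            # max(0, ...) resets an underflowed field delta to 0.
            deltas = (max(0, c - p) for c, p in zip(values, previous[key]))
            result.append((key, *deltas))
        previous[key] = values
    return result


rows = [
    ("key1", 1, 2, 3),
    ("key1", 0, 0, 0),
    ("key1", 2, 3, 4),
]
for row in clamped_deltas(rows):
    print(row)
```

On this input it yields `('key1', 0, 0, 0)` followed by `('key1', 2, 3, 4)`, matching the expected output.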

I haven't been able to find a solution on Google.  Anyone?

regards,
Eric