Pig >> mail # user >> Taking advantage of structure when doing UDFs and whatnot?


Re: Taking advantage of structure when doing UDFs and whatnot?
On Tue, Jan 04, 2011 at 02:10:52PM -0500, Jonathan Coveney wrote:
> I wasn't quite sure what to title this, but hopefully it'll make sense. I have
> a couple of questions relating to a query that ultimately seeks to do this:
>
> You have
>
> 1 10
> 1 12
> 1 15
> 1 16
> 2 1
> 2 2
> 2 3
> 2 6
>
> You want your output to be the difference between the successive numbers in
> the second column, i.e.
>
> 1 (10,0)
> 1 (12,2)
> 1 (15,3)
> 1 (16,1)
> 2 (1,0)
> 2 (2,1)
> 2 (3,1)
> 2 (6,3)
>
> Obviously, I need to write a UDF to do this, but I have a couple of questions.
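In Pig, the natural shape for this is to GROUP by the first column and hand each group's bag to a UDF. The function below is a plain-Python sketch of what such a UDF would compute, not code registered with Pig; the names `successive_differences` and `apply_per_key` are hypothetical. It assumes the differences are taken in ascending order of the second column, and it sorts explicitly because the tuples in a Pig bag carry no guaranteed order.

```python
from itertools import groupby
from operator import itemgetter


def successive_differences(values):
    """Logic a hypothetical UDF could apply to one group's bag of numbers.

    Values are sorted, then each is paired with its difference from the
    previous value; the first value in the group gets a difference of 0.
    """
    result = []
    previous = None
    for value in sorted(values):
        diff = 0 if previous is None else value - previous
        result.append((value, diff))
        previous = value
    return result


def apply_per_key(rows):
    """Emulate GROUP BY the first column, then the UDF on each group."""
    out = []
    # itertools.groupby only merges adjacent rows, so sort by key first.
    for key, group in groupby(sorted(rows, key=itemgetter(0)), key=itemgetter(0)):
        for pair in successive_differences(v for _, v in group):
            out.append((key, pair))
    return out


rows = [(1, 10), (1, 12), (1, 15), (1, 16), (2, 1), (2, 2), (2, 3), (2, 6)]
for key, pair in apply_per_key(rows):
    print(key, pair)
```

On the sample input this yields the pairs listed above, with the first row of each key carrying a difference of 0.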

If you were to have some sort of row counter, then I suspect that you
could do something along the lines of

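  -- Pig won't join an alias to itself, so build a distinct copy and
  -- precompute the shifted key in it.
  relCopy = FOREACH relName GENERATE counter - 1 AS prev_counter, thing;
  newRel = JOIN relName BY counter, relCopy BY prev_counter;
  diff = FOREACH newRel GENERATE relName::stuff AS [...], relCopy::thing - relName::thing AS difference;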

if you really want to avoid writing an extra UDF. But in the absence of such a
counter, yeah, I think a UDF would be necessary.
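The self-join idea can be checked outside Pig with a small plain-Python emulation. This is a sketch under stated assumptions: the per-key row counter (`number_rows`) is hypothetical, since the data has no such column, and the join key here includes the first column so that pairs never straddle two groups. It also shows that an inner join has no predecessor for the first value of each group, so that value's `(value, 0)` row never comes out of the join and would have to be added separately.

```python
from collections import defaultdict


def number_rows(rows):
    """Attach a hypothetical per-key counter (0, 1, 2, ...) in value order."""
    by_key = defaultdict(list)
    for key, value in rows:
        by_key[key].append(value)
    numbered = []
    for key in sorted(by_key):
        for counter, value in enumerate(sorted(by_key[key])):
            numbered.append((key, counter, value))
    return numbered


def self_join_differences(numbered):
    """Inner-join each row to the row with the same key and counter + 1."""
    # The "copy" relation, indexed by (key, counter - 1), mirrors
    # JOIN relName BY counter, relCopy BY counter-1 (plus the key column).
    copy_index = {(key, counter - 1): value for key, counter, value in numbered}
    diffs = []
    for key, counter, value in numbered:
        successor = copy_index.get((key, counter))
        if successor is not None:  # inner join: no match, no output row
            diffs.append((key, successor, successor - value))
    return diffs


rows = [(1, 10), (1, 12), (1, 15), (1, 16), (2, 1), (2, 2), (2, 3), (2, 6)]
for row in self_join_differences(number_rows(rows)):
    print(row)
```

Comparing this output with the expected result above shows the missing `(10,0)` and `(1,0)` rows, which is the main extra bookkeeping the join approach needs over a per-group UDF.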

Cheers,
Kris

--
Kris Coward http://unripe.melon.org/
GPG Fingerprint: 2BF3 957D 310A FEEC 4733  830E 21A4 05C7 1FEB 12B3