Pig >> mail # user >> Taking advantage of structure when doing UDFs and whatnot?


Re: Taking advantage of structure when doing UDFs and whatnot?
On Tue, Jan 04, 2011 at 02:10:52PM -0500, Jonathan Coveney wrote:
> I wasn't quite sure what to title this, but hopefully it'll make sense. I have
> a couple of questions relating to a query that ultimately seeks to do this
>
> You have
>
> 1 10
> 1 12
> 1 15
> 1 16
> 2 1
> 2 2
> 2 3
> 2 6
>
> You want your output to be the difference between the successive numbers in
> the second column, ie
>
> 1 (10,0)
> 1 (12,2)
> 1 (15,3)
> 1 (16,1)
> 2 (1,0)
> 2 (2,1)
> 2 (3,1)
> 2 (6,3)
>
> Obviously, I need to write a udf to do this, but I have a couple questions..

If you were to have some sort of row counter, then I suspect that you
could do something along the lines of

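  -- Second alias for the self-join (a bare "relCopy = relName;" may be rejected by some Pig versions)
  relCopy = FOREACH relName GENERATE *;
  -- Pair each row with the row whose counter is one higher
  newRel = JOIN relName BY counter, relCopy BY counter-1;
  diff = FOREACH newRel GENERATE relName::stuff AS [...],
         relCopy::thing-relName::thing AS difference;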

if you really want to avoid writing an extra UDF. But in the absence of such a
counter, yeah, I think a UDF would be necessary.
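
To make the counter idea concrete, here is a rough sketch in plain Python rather than Pig. It only illustrates the logic of matching each row against the row one counter earlier; the function name successive_differences and the rows list are made up for the illustration, and resetting the difference to 0 at the start of each key is an assumption taken from the desired output in the question, not something the join above does by itself.

```python
# Hypothetical input: (key, value) rows in the order shown in the question.
rows = [(1, 10), (1, 12), (1, 15), (1, 16), (2, 1), (2, 2), (2, 3), (2, 6)]


def successive_differences(rows):
    """Return (key, (value, diff)) pairs, where diff is value minus the previous value."""
    # Attach a row counter, the extra column the join idea depends on.
    counted = [(counter, key, value) for counter, (key, value) in enumerate(rows)]
    # Index the copy by counter so each row can look up its predecessor,
    # which mirrors joining one alias on counter against the other on counter - 1.
    by_counter = {counter: (key, value) for counter, key, value in counted}
    result = []
    for counter, key, value in counted:
        prev = by_counter.get(counter - 1)
        # Assumption: the first row of each key has no predecessor, so its diff is 0.
        if prev is None or prev[0] != key:
            diff = 0
        else:
            diff = value - prev[1]
        result.append((key, (value, diff)))
    return result


for key, pair in successive_differences(rows):
    print(key, pair)
```

The dictionary lookup stands in for the join, so the sketch depends on the counter following the row order you care about, same as the Pig version would.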

Cheers,
Kris

--
Kris Coward http://unripe.melon.org/
GPG Fingerprint: 2BF3 957D 310A FEEC 4733  830E 21A4 05C7 1FEB 12B3