Pig, mail # user - easiest way to get loops in PIG?


Re: easiest way to get loops in PIG?
Yang 2012-06-21, 04:33
Well, I tried to write a pure Pig Latin version without using a UDF, but it
now seems too cumbersome without one.
Two of the problems I can't solve are:
for a bag [(id: int, extra_info: chararray)], how to generate the tuple
with the smallest id.

Another is: is there a version of MIN() for chararray?

Thanks
yang
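
Editor's sketch of the two operations asked about above, written in plain Python only to pin down the intended semantics. It is not Pig Latin and not an answer given in the thread; the sample bag and the variable names are made up for illustration.

```python
# Plain-Python model of the two operations (not Pig Latin).
# Hypothetical bag of (id, extra_info) tuples.
bag = [(7, "g"), (3, "c"), (5, "e")]

# Tuple with the smallest id: compare on the first field only,
# and keep the whole tuple rather than just the id.
smallest = min(bag, key=lambda t: t[0])

# Minimum over strings: Python compares str values lexicographically.
min_info = min(info for _, info in bag)

print(smallest)
print(min_info)
```

The point of the first call is that the selection key (the id) and the returned value (the full tuple) differ, which is what makes a plain MIN() over the id column insufficient.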

On Wed, Jun 20, 2012 at 7:22 PM, Yang <[EMAIL PROTECTED]> wrote:

> hehehe, thanks, I'll post my version slightly later :)
>
>
> On Wed, Jun 20, 2012 at 7:19 PM, Norbert Burger <[EMAIL PROTECTED]> wrote:
>
>> Yang -- have you seen Hortonworks' blogpost on this?
>>
>> http://hortonworks.com/blog/transitive-closure-in-apache-pig/
>>
>> Norbert
>>
>> On Wed, Jun 20, 2012 at 10:15 PM, Prashant Kommireddi <[EMAIL PROTECTED]> wrote:
>>
>> > Would embedding Pig in Java or another language work?
>> >
>> > http://pig.apache.org/docs/r0.10.0/cont.html#embed-java
>> >
>> >
>> > On Jun 20, 2012, at 7:12 PM, Yang <[EMAIL PROTECTED]> wrote:
>> >
>> > > I agree that Pig probably lacks loops for a good reason.
>> > >
>> > > But currently I need to write code to find the transitive closure of
>> > > many edges in a graph,
>> > > so I need to iterate a code snippet several times, so that after N
>> > > iterations I can find a connected component of size 2^N.
>> > >
>> > > Right now I just copy-paste the snippet several times.
>> > >
>> > > I guess I could pull the snippet out into a separate Pig
>> > > script that loads and stores the intermediate data
>> > > at the beginning and end, but reloading the data each time is kind of a waste.
>> > >
>> > > any suggestions?
>> > >
>> > > Thanks
>> > > Yang
>> >
>>
>
>
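
Editor's sketch of the iteration described in the original question, in plain Python rather than Pig Latin. It models the repeated snippet as a self-join of the current set of pairs, run until nothing new appears. The function name and the sample edges are hypothetical, and in Pig the loop itself would still need an outside driver such as the embedding approach suggested in the thread.

```python
def transitive_closure(edges):
    """Return (closure, rounds): all pairs reachable via one or more
    edges, and the number of self-join passes that added new pairs."""
    closure = set(edges)
    rounds = 0
    while True:
        # Index current pairs by source node so the self-join is a lookup.
        by_src = {}
        for a, b in closure:
            by_src.setdefault(a, set()).add(b)
        # Join the relation with itself: (a, b) and (b, c) yield (a, c).
        joined = {(a, c) for a, b in closure for c in by_src.get(b, ())}
        merged = closure | joined
        if merged == closure:  # fixed point: the last pass found nothing new
            return closure, rounds
        closure = merged
        rounds += 1


# Hypothetical chain 1 -> 2 -> 3 -> 4 -> 5.
pairs, rounds = transitive_closure([(1, 2), (2, 3), (3, 4), (4, 5)])
print(len(pairs), rounds)
```

Each pass doubles the longest path length covered (1, 2, 4, ...), which is why N passes are enough for paths of up to 2^N edges. Testing for a fixed point replaces the fixed number of copy-pasted snippets with a stopping condition.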