RE: Bag of tuples
Hi Alan,
Thanks for your reply.
I am trying to understand how Pig processes these relations. As I mentioned, my UDF returns the result in the following format:

 {(id1,x,y,z), (id2, a, b, c), (id3,x,a)}  /* User 1 info */
 {(id10,x,y,z), (id9, a, b, c), (id1,x,a)} /* User 2 info */
 {(id8,x,y,z), (id4, a, b, c), (id2,x,a)} /* User 3 info */
 {(id6,x,y,z), (id6, a, b, c), (id9,x,a)} /* User 4 info */

B = foreach A { /* Each element in A is a bag. Will this apply the following statements to each element of A, that is, to each bag? Is this correct? */
B1 = order A by $0; -- order on the id /* What does this A refer to? Does it refer to each bag of relation A? I get the following error: expression is not a project expression: */
/* rest of the code */

Thanks for your help.
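
A note on the error above: inside a nested foreach block, operators such as order and limit act on a bag field of the current record, not on the outer relation. The message is therefore consistent with `order A by $0` naming the relation A itself; with the schema `(b:bag)` from the quoted script, the nested statement would presumably read `order b by $0`. The Python sketch below is only a model of that per-record behaviour, not Pig code. The function name `top_n_per_bag` and the sample values are illustrative assumptions, and ids are assumed to compare as strings, so "id10" sorts before "id9".

```python
from typing import List, Tuple

Bag = List[Tuple[str, ...]]


def top_n_per_bag(records: List[Bag], n: int = 5) -> List[Tuple[str, ...]]:
    """Model of: foreach A { B1 = order b by $0; B2 = limit B1 n; generate flatten(B2); }"""
    out: List[Tuple[str, ...]] = []
    for bag in records:                             # one iteration per record of A
        ordered = sorted(bag, key=lambda t: t[0])   # order the bag field by its first column
        limited = ordered[:n]                       # keep at most n tuples of this bag
        out.extend(limited)                         # flatten: emit each tuple as its own row
    return out


# Two of the user bags from the message above (hypothetical literal values).
records = [
    [("id1", "x", "y", "z"), ("id2", "a", "b", "c"), ("id3", "x", "a")],
    [("id10", "x", "y", "z"), ("id9", "a", "b", "c"), ("id1", "x", "a")],
]

result = top_n_per_bag(records, n=2)
for row in result:
    print(row)
```

The sort and limit run once per bag, so the output holds up to n tuples for every user rather than n tuples overall.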
> Subject: Re: Bag of tuples
> Date: Wed, 6 Nov 2013 09:36:04 -0800
> Do you mean you want to find the top 5 per input record?  Also, what are your ordering criteria?  Just sort by id?  Something like this should order all tuples in each bag by id and then produce the top 5.  My syntax may be a little off as I'm working offline and don't have the manual in front of me, but this should be the general idea.
> A = load 'yourinput' as (b:bag);
> B = foreach A {
> B1 = order A by $0; -- order on the id
> B2 = limit B1 5;
> generate flatten(B2);
> }
> Alan.
> On Nov 5, 2013, at 9:52 AM, Sameer Tilak wrote:
> > Hi Pig experts,
> > Sorry to post so many questions. I have one more question on doing some analytics on a bag of tuples.
> >
> > My input has the following format:
> >
> > {(id1,x,y,z), (id2, a, b, c), (id3,x,a)}  /* User 1 info */
> > {(id10,x,y,z), (id9, a, b, c), (id1,x,a)} /* User 2 info */
> > {(id8,x,y,z), (id4, a, b, c), (id2,x,a)} /* User 3 info */
> > {(id6,x,y,z), (id6, a, b, c), (id9,x,a)} /* User 4 info */
> >
> > I can change my UDF to give simpler output. However, I want to find out if something like this can be done easily:
> > I would like to find the top 5 ids (field 1 in a tuple) among all the users. Note that each user has a bag and the first field of each tuple in that bag is the id.
> >
> > How difficult would it be to filter based on fields of tuples and do analytics across the entire user base?
> >    
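
The question of a top 5 "among all the users" can be read in more than one way. The sketch below assumes it means the five ids that occur most often across every user's bag; that reading is an assumption, since the thread does not state the ranking criterion. It is a Python model, not Pig code: in Pig the same idea would roughly be a flatten of the bags, a group by id with a count, an order on the count, and a limit. The name `top_ids_overall` is hypothetical, and ties are broken by id in string order only to keep the output deterministic.

```python
from collections import Counter
from typing import List, Tuple

Bag = List[Tuple[str, ...]]


def top_ids_overall(records: List[Bag], n: int = 5) -> List[Tuple[str, int]]:
    """Count how often each id (first field) appears across all bags, keep the n most frequent."""
    # Flatten every bag into a single stream of ids and count occurrences.
    counts = Counter(t[0] for bag in records for t in bag)
    # Highest count first; ties broken by id (string order) for a stable result.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:n]


# The four user bags from the original question.
records = [
    [("id1", "x", "y", "z"), ("id2", "a", "b", "c"), ("id3", "x", "a")],
    [("id10", "x", "y", "z"), ("id9", "a", "b", "c"), ("id1", "x", "a")],
    [("id8", "x", "y", "z"), ("id4", "a", "b", "c"), ("id2", "x", "a")],
    [("id6", "x", "y", "z"), ("id6", "a", "b", "c"), ("id9", "x", "a")],
]

top5 = top_ids_overall(records)
print(top5)
```

Unlike a per-bag sort and limit, this version aggregates over the whole input first, so it returns at most n ids in total.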