
drd_ 20091120, 18:18

Re: PIG bin/labeling relation
Dmitriy Ryaboy 20091121, 20:04
Unless you actually need the ordinal numbers, you can do it all in one step:

    B = ORDER A BY x PARALLEL 100;
    STORE B INTO ......

This will create 100 ordered part files, with the first part file containing the first 100th of the data, the second the next 100th, and so on. The fragments are approximate in size, so some may be slightly bigger than others, but for a big enough dataset, they should be roughly equal.

D

On Fri, Nov 20, 2009 at 1:18 PM, drd_ <[EMAIL PROTECTED]> wrote:
>
> I am using PIG and this is what I am trying to do:
>
> 1) Sort a relation A into B by a field x. The smallest value of x is first.
> Just use SORT.
>
> 2) Label each tuple in B with a number denoting its order in the sorted
> relation. So the first tuple would be labeled with a 1, the second tuple
> with a 2, the third with a 3 and so on. Not certain how to do this.
>
> 3) Derive a relation C where each row is a bag of tuples. The first row
> contains the first n1 tuples from relation B, the second row contains the
> tuples from B labeled (n1 + 1) to n2, the third row contains the tuples
> from B labeled (n2 + 1) to n3 and so on to n100. This step is simple (just
> use filter) once we've labeled each tuple in B with a number.
>
> The question: how do I do step 2).
>
> thanks
>
> View this message in context: http://old.nabble.com/PIGbinlabelingrelationtp26443615p26443615.html
> Sent from the Hadoop coreuser mailing list archive at Nabble.com.
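
For reference, the one-step approach from the reply could be written out as a complete script along the following lines. This is only a minimal sketch: the input and output paths, the tab delimiter, and the `payload` field are hypothetical placeholders that do not come from the thread; only the relation names A and B, the sort field x, and `PARALLEL 100` are taken from the messages above.

```pig
-- Hypothetical input path and schema; only the field x appears in the thread.
A = LOAD 'input/data' USING PigStorage('\t') AS (x:int, payload:chararray);

-- Sort ascending by x (smallest first) using 100 reducers.
-- Each reducer handles one contiguous range of x values.
B = ORDER A BY x PARALLEL 100;

-- Hypothetical output path; the job writes one part file per reducer,
-- so the part files taken in order hold successive slices of the sorted data.
STORE B INTO 'output/sorted_bins';
```

As the reply notes, the part files are only approximately equal in size, so this suits cases where exact bin boundaries n1, n2, ... are not required.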