Re: Flatten a Bag on One Line?
Eli Finkelshteyn 2012-02-13, 06:36
Hey Folks,
Sorry it took so long to get back on this. The function I wound up using
is really simple:

    @outputSchema("t:tuple()")
    def bagToTuple(bag):
        t = tuple([item[0] for item in bag])
        return t

You would use this in Pig to get what I wanted by just running that
function on a bag and then flattening the result, for example:

    flattened_line = FOREACH line_with_bag GENERATE something, something_else, flatten(myfuncs.bagToTuple(some_bag));

Thejas, I created a JIRA for this here
<https://issues.apache.org/jira/browse/PIG-2529>. This is the first one
I've ever made, so please excuse me if I messed anything up in the format.

Cheers,
Eli

On 2/10/12 7:07 PM, Thejas Nair wrote:
> Pig doesn't have a piggybank for python udfs, but it makes sense to
> create one.
> Please attach your udf to a new jira, and we can figure where to put
> it.
>
> -Thejas
>
>
> On 2/10/12 1:14 PM, Eli Finkelshteyn wrote:
>> I was going to do this as a python udf, but haven't had a chance yet
>> since other stuff I was working on took priority. As soon as I do write
>> it, I'll be sure to upload it here. On a related note: is there a
>> piggybank for python udfs I could contribute it to for posterity?
>>
>> Eli
>>
>> On 2/10/12 11:09 AM, pablomar wrote:
>>> what about something like this?
>>> (typing on the phone, forgive any mistake)
>>>
>>> public class Flat extends EvalFunc<Tuple>
>>> {
>>>     public Tuple exec(Tuple input) throws IOException
>>>     {
>>>         try
>>>         {
>>>             List<Object> list = new LinkedList<Object>();
>>>             DataBag bag = (DataBag)input.get(0);
>>>             Iterator it = bag.iterator();
>>>             while(it.hasNext())
>>>             {
>>>                 Tuple t = (Tuple)it.next();
>>>                 if(t != null && t.size() > 0)
>>>                     list.add(t.get(0));
>>>             }
>>>
>>>             TupleFactory fac = TupleFactory.getInstance();
>>>             return fac.newTuple(list);
>>>         }
>>>         catch....
>>>
>>> On 2/10/12, Brendan Gill <[EMAIL PROTECTED]> wrote:
>>>> Eli,
>>>>
>>>> I'm trying to do exactly this, but am pretty new to Pig. Any chance
>>>> you would share what the UDF would look like? Then I can tailor it
>>>> to our needs.
>>>>
>>>> Much appreciated if possible,
>>>>
>>>> Brendan
>>>>
>>>>
>>>> On Thu, Feb 9, 2012 at 9:20 PM, Eli Finkelshteyn <[EMAIL PROTECTED]>
>>>> wrote:
>>>>
>>>>> Thanks. Was hoping/assuming there was a built-in, but I guess udf it
>>>>> is.
>>>>>
>>>>> Eli
>>>>>
>>>>>
>>>>> On 2/9/12 2:14 PM, Yulia Tolskaya wrote:
>>>>>
>>>>>> I actually can't think of an easy way to do this without it
>>>>>> becoming a cross product. You could just write a really simple udf
>>>>>> that takes a bag and spits out just the members.
>>>>>>
>>>>>> Yulia
>>>>>>
>>>>>> On 2/9/12 1:26 PM, "Eli Finkelshteyn" <[EMAIL PROTECTED]>
>>>>>> wrote:
>>>>>>
>>>>>>> This is probably easy, but my PigLatin is rusty, and I don't seem
>>>>>>> to be able to find an answer on Google. If I have a record of the
>>>>>>> form:
>>>>>>>
>>>>>>> 98812 3 {(48567859),(15996334),(15897772)}
>>>>>>>
>>>>>>> How can I flatten that bag to leave all members on a single row,
>>>>>>> ie:
>>>>>>>
>>>>>>> 98812 3 48567859 15996334 15897772
>>>>>>>
>>>>>>> Cheers,
>>>>>>> Eli
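
For anyone who wants to try the Python UDF from this thread outside of Pig, here is a rough sketch that runs as plain Python. It assumes a bag arrives as a list of tuples, which is how I understand Pig hands bags to Jython UDFs. The fallback outputSchema stub is hypothetical and only there so the file loads without Pig (Pig supplies the real decorator when it registers the script). The None and empty-tuple checks are my addition, mirroring the guard in pablomar's Java version, and are not part of Eli's original function.

```python
# Hypothetical stand-in: Pig provides outputSchema when it registers a
# Jython UDF file. This stub only exists so the sketch runs without Pig.
try:
    outputSchema
except NameError:
    def outputSchema(schema):
        def wrap(func):
            func.outputSchema = schema
            return func
        return wrap


@outputSchema("t:tuple()")
def bagToTuple(bag):
    """Collect the first field of every tuple in the bag into one tuple."""
    if bag is None:
        return None
    # Assumption: skip null or empty tuples, as the Java version does.
    return tuple(item[0] for item in bag
                 if item is not None and len(item) > 0)


if __name__ == "__main__":
    # The bag from the original question, modelled as a list of 1-tuples.
    sample_bag = [(48567859,), (15996334,), (15897772,)]
    print(bagToTuple(sample_bag))
```

In Pig the result would still need to go through flatten, as in the FOREACH statement above, to end up as separate columns on one row.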