Re: Flatten a Bag on One Line?
Thejas Nair 2012-02-11, 00:07
Pig doesn't have a piggybank for python udfs, but it makes sense to
create one. Please attach your udf to a new jira, and we can figure out
where to put it.

-Thejas

On 2/10/12 1:14 PM, Eli Finkelshteyn wrote:
> I was going to do this as a python udf, but haven't had a chance yet
> since other stuff I was working on took priority. As soon as I do write
> it, I'll be sure to upload it here. On a related note: is there a
> piggybank for python udfs I could contribute it to for posterity?
>
> Eli
>
> On 2/10/12 11:09 AM, pablomar wrote:
>> what about something like this?
>> (typing on the phone, forgive any mistake)
>>
>> public class Flat extends EvalFunc<Tuple>
>> {
>>     public Tuple exec(Tuple input) throws IOException
>>     {
>>         try
>>         {
>>             List<Object> list = new LinkedList<Object>();
>>             DataBag bag = (DataBag)input.get(0);
>>             Iterator it = bag.iterator();
>>             while(it.hasNext())
>>             {
>>                 Tuple t = (Tuple)it.next();
>>                 if(t != null && t.size()>0)
>>                     list.add(t.get(0));
>>             }
>>
>>             TupleFactory fac = TupleFactory.getInstance();
>>             return fac.newTuple(list);
>>         }
>>         catch....
>>
>> On 2/10/12, Brendan Gill<[EMAIL PROTECTED]> wrote:
>>> Eli,
>>>
>>> I'm trying to do exactly this, but am pretty new to Pig. Any chance you
>>> would share what the UDF would look like? Then I can tailor it to our
>>> needs.
>>>
>>> Much appreciated if possible,
>>>
>>> Brendan
>>>
>>> On Thu, Feb 9, 2012 at 9:20 PM, Eli Finkelshteyn<[EMAIL PROTECTED]>
>>> wrote:
>>>
>>>> Thanks. Was hoping/assuming there was a built-in, but I guess udf it
>>>> is.
>>>>
>>>> Eli
>>>>
>>>> On 2/9/12 2:14 PM, Yulia Tolskaya wrote:
>>>>
>>>>> I actually can't think of an easy way to do this without it becoming a
>>>>> cross product. You could just write a really simple udf that takes
>>>>> a bag and spits out just the members.
>>>>>
>>>>> Yulia
>>>>>
>>>>> On 2/9/12 1:26 PM, "Eli Finkelshteyn"<[EMAIL PROTECTED]> wrote:
>>>>>
>>>>>> This is probably easy, but my PigLatin is rusty, and I don't seem
>>>>>> to be able to find an answer on Google. If I have a record of the
>>>>>> form:
>>>>>>
>>>>>> 98812 3 {(48567859),(15996334),(15897772)}
>>>>>>
>>>>>> How can I flatten that bag to leave all members on a single row, ie:
>>>>>>
>>>>>> 98812 3 48567859 15996334 15897772
>>>>>>
>>>>>> Cheers,
>>>>>> Eli
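
For anyone who wants to try the loop from pablomar's message without a Pig install, here is a rough sketch of just the flattening step. It is not the UDF itself: plain java.util lists stand in for Pig's DataBag and Tuple, and the class name FlatSketch and the method flatten are made up for illustration. It only shows the "take the first field of each non-empty tuple" logic on the bag from Eli's original question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in: a List<List<Object>> plays the role of a Pig bag of
// tuples, so this runs without Pig on the classpath. Not a real EvalFunc.
class FlatSketch {

    // Collects the first field of every non-null, non-empty inner tuple,
    // mirroring the while loop in the quoted exec() method.
    static List<Object> flatten(List<List<Object>> bag) {
        List<Object> out = new ArrayList<>();
        for (List<Object> t : bag) {
            if (t != null && !t.isEmpty()) {
                out.add(t.get(0));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // The bag from the original question: {(48567859),(15996334),(15897772)}
        List<List<Object>> bag = new ArrayList<>();
        bag.add(Arrays.<Object>asList(48567859));
        bag.add(Arrays.<Object>asList(15996334));
        bag.add(Arrays.<Object>asList(15897772));

        // prints [48567859, 15996334, 15897772]
        System.out.println(flatten(bag));
    }
}
```

In the actual UDF the result would be wrapped in a Tuple via TupleFactory, as in the quoted code, so that the members end up as fields of a single row.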