|
Eli Finkelshteyn
2012-02-09, 18:26
Yulia Tolskaya
2012-02-09, 19:14
Eli Finkelshteyn
2012-02-09, 21:20
Brendan Gill
2012-02-10, 12:56
pablomar
2012-02-10, 16:09
Eli Finkelshteyn
2012-02-10, 21:14
Thejas Nair
2012-02-11, 00:07
Eli Finkelshteyn
2012-02-13, 06:36
|
-
Flatten a Bag on One Line? Eli Finkelshteyn 2012-02-09, 18:26
This is probably easy, but my Pig Latin is rusty, and I don't seem to be able to find an answer on Google. If I have a record of the form:

    98812 3 {(48567859),(15996334),(15897772)}

How can I flatten that bag to leave all members on a single row, i.e.:

    98812 3 48567859 15996334 15897772

Cheers,
Eli
-
Re: Flatten a Bag on One Line? Yulia Tolskaya 2012-02-09, 19:14
I actually can't think of an easy way to do this without it becoming a cross product. You could just write a really simple UDF that takes a bag and spits out just the members.

Yulia
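For readers following along, the cross-product concern can be illustrated with a small sketch in plain Python (not Pig). It assumes that applying FLATTEN directly to a bag yields one output row per bag member, each repeating the other fields, whereas the goal here is a single row. The helper names `flatten_bag` and `flatten_as_tuple` are hypothetical and exist only for this illustration.

```python
def flatten_bag(record):
    """Mimic FLATTEN on a bag: one output row per bag member."""
    *prefix, bag = record
    return [tuple(prefix) + member for member in bag]


def flatten_as_tuple(record):
    """Mimic the desired result: every bag member appended to a single row."""
    *prefix, bag = record
    return tuple(prefix) + tuple(member[0] for member in bag)


# The sample record from the question, with the bag modeled as a list of 1-tuples.
record = (98812, 3, [(48567859,), (15996334,), (15897772,)])

rows = flatten_bag(record)         # three rows, one per bag member
single = flatten_as_tuple(record)  # one row holding all members

print(rows)
print(single)
```

The first result has three rows that each repeat `98812 3`, which is the cross-product shape; the second is the single row asked for in the original question.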
-
Re: Flatten a Bag on One Line? Eli Finkelshteyn 2012-02-09, 21:20
Thanks. Was hoping/assuming there was a built-in, but I guess UDF it is.

Eli
-
Re: Flatten a Bag on One Line? Brendan Gill 2012-02-10, 12:56
Eli,

I'm trying to do exactly this, but am pretty new to Pig. Any chance you would share what the UDF would look like? Then I can tailor it to our needs.

Much appreciated if possible,

Brendan
-
Re: Flatten a Bag on One Line? pablomar 2012-02-10, 16:09
What about something like this?
(typing on the phone, forgive any mistake)

    import java.io.IOException;
    import java.util.Iterator;
    import java.util.LinkedList;
    import java.util.List;

    import org.apache.pig.EvalFunc;
    import org.apache.pig.data.DataBag;
    import org.apache.pig.data.Tuple;
    import org.apache.pig.data.TupleFactory;

    public class Flat extends EvalFunc<Tuple>
    {
        public Tuple exec(Tuple input) throws IOException
        {
            try
            {
                // Collect the first field of every tuple in the bag.
                List<Object> list = new LinkedList<Object>();
                DataBag bag = (DataBag) input.get(0);
                Iterator<Tuple> it = bag.iterator();
                while (it.hasNext())
                {
                    Tuple t = it.next();
                    if (t != null && t.size() > 0)
                        list.add(t.get(0));
                }

                // Return all members as the fields of a single tuple.
                TupleFactory fac = TupleFactory.getInstance();
                return fac.newTuple(list);
            }
            catch (Exception e)
            {
                // The catch was left as "catch...." in the post; rethrowing
                // as IOException is an assumed, not original, completion.
                throw new IOException("Flat: could not convert bag to tuple", e);
            }
        }
    }
-
Re: Flatten a Bag on One Line? Eli Finkelshteyn 2012-02-10, 21:14
I was going to do this as a Python UDF, but haven't had a chance yet since other stuff I was working on took priority. As soon as I do write it, I'll be sure to upload it here.

On a related note: is there a piggybank for Python UDFs I could contribute it to for posterity?

Eli
-
Re: Flatten a Bag on One Line? Thejas Nair 2012-02-11, 00:07
Pig doesn't have a piggybank for Python UDFs, but it makes sense to create one. Please attach your UDF to a new JIRA, and we can figure out where to put it.

-Thejas
-
Re: Flatten a Bag on One Line? Eli Finkelshteyn 2012-02-13, 06:36
Hey Folks,

Sorry it took so long to get back on this. The function I wound up using is really simple:

    @outputSchema("t:tuple()")
    def bagToTuple(bag):
        t = tuple([item[0] for item in bag])
        return t

You would use this in Pig to get what I wanted by just running that function on a bag and then flattening the result, for example:

    flattened_line = FOREACH line_with_bag GENERATE something, something_else, flatten(myfuncs.bagToTuple(some_bag));

Thejas, I created a JIRA for this here <https://issues.apache.org/jira/browse/PIG-2529>. This is the first one I've ever made, so please excuse me if I messed anything up in the format.

Cheers,
Eli
|
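For anyone who wants to sanity-check the Python function outside of Pig, here is a minimal sketch. It assumes the bag reaches the UDF as a list of tuples, and it defines a do-nothing stand-in for `outputSchema`, since that decorator is normally supplied by Pig when it loads the script. `bagToTupleSafe` is a hypothetical variant, not part of the posted UDF, that skips null or empty tuples the way the Java version earlier in the thread does.

```python
def outputSchema(schema):
    """Stand-in for the decorator Pig provides to Python UDFs; a no-op here."""
    def wrap(func):
        return func
    return wrap


@outputSchema("t:tuple()")
def bagToTuple(bag):
    # Take the first field of each tuple in the bag and pack them into one tuple.
    return tuple([item[0] for item in bag])


@outputSchema("t:tuple()")
def bagToTupleSafe(bag):
    # Hypothetical variant: ignore None entries and empty tuples.
    return tuple(item[0] for item in bag if item is not None and len(item) > 0)


# The bag from the original question, modeled as a list of 1-tuples.
some_bag = [(48567859,), (15996334,), (15897772,)]
result = bagToTuple(some_bag)
print(result)
```

Flattening the returned tuple in Pig, rather than the bag itself, is what keeps all members on one row.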