Pig, mail # user - Flatten a Bag on One Line?


Re: Flatten a Bag on One Line?
Eli Finkelshteyn 2012-02-10, 21:14
I was going to do this as a python udf, but haven't had a chance yet
since other stuff I was working on took priority. As soon as I do write
it, I'll be sure to upload it here. On a related note: is there a
piggybank for python udfs I could contribute it to for posterity?

Eli

On 2/10/12 11:09 AM, pablomar wrote:
> what about something like this?
> (typing on the phone, forgive any mistake)
>
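> import java.io.IOException;
> import java.util.Iterator;
> import java.util.LinkedList;
> import java.util.List;
>
> import org.apache.pig.EvalFunc;
> import org.apache.pig.backend.executionengine.ExecException;
> import org.apache.pig.data.DataBag;
> import org.apache.pig.data.Tuple;
> import org.apache.pig.data.TupleFactory;
>
> // Collects the first field of every tuple in the input bag into one tuple.
> public class Flat extends EvalFunc<Tuple>
> {
>     public Tuple exec(Tuple input) throws IOException
>     {
>         try
>         {
>             List<Object> list = new LinkedList<Object>();
>             DataBag bag = (DataBag) input.get(0);
>             Iterator<Tuple> it = bag.iterator();
>             while (it.hasNext())
>             {
>                 Tuple t = it.next();
>                 // skip null or empty tuples
>                 if (t != null && t.size() > 0)
>                     list.add(t.get(0));
>             }
>
>             TupleFactory fac = TupleFactory.getInstance();
>             return fac.newTuple(list);
>         }
>         catch (ExecException e)
>         {
>             // assumption: wrap and rethrow (handler was left as "catch....")
>             throw new IOException("Flat: could not read the input bag", e);
>         }
>     }
> }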
>
> On 2/10/12, Brendan Gill <[EMAIL PROTECTED]> wrote:
>> Eli,
>>
>> I'm trying to do exactly this, but am pretty new to Pig.  Any chance you
>> would share what the UDF would look like?  Then I can tailor it to our
>> needs.
>>
>> Much appreciated if possible,
>>
>> Brendan
>>
>>
>>
>> On Thu, Feb 9, 2012 at 9:20 PM, Eli Finkelshteyn <[EMAIL PROTECTED]> wrote:
>>
>>> Thanks. Was hoping/assuming there was a built-in, but I guess udf it is.
>>>
>>> Eli
>>>
>>>
>>> On 2/9/12 2:14 PM, Yulia Tolskaya wrote:
>>>
>>>> I actually can't think of an easy way to do this without it becoming a
>>>> cross product. You could just write a really simple UDF that takes a bag
>>>> and spits out just the members.
>>>>
>>>> Yulia
>>>>
>>>> On 2/9/12 1:26 PM, "Eli Finkelshteyn" <[EMAIL PROTECTED]> wrote:
>>>>
>>>>> This is probably easy, but my PigLatin is rusty, and I don't seem to be
>>>>> able to find an answer on Google. If I have a record of the form:
>>>>>
>>>>>      98812   3       {(48567859),(15996334),(15897772)}
>>>>>
>>>>> How can I flatten that bag to leave all members on a single row, i.e.:
>>>>>
>>>>>      98812    3    48567859    15996334    15897772
>>>>>
>>>>> Cheers,
>>>>> Eli
>>>>>
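
For the Python UDF route mentioned at the top of the thread, here is a
minimal sketch of the same idea: take a bag and return the first field of
each member as one tuple. The file name flat.py and the function name
flatten_bag are made up for illustration, and the sketch assumes that
Pig's Jython support hands the bag to the function as a list of tuples
and accepts a Python tuple back. It has not been run under Pig.

```python
# flat.py (hypothetical file name; flatten_bag is also an illustrative name)
#
# Assumption: under Pig's Jython support a bag arrives as a list of tuples
# and a returned Python tuple is converted back into a Pig tuple.
# No @outputSchema decorator is declared here; add one if typed fields
# are needed downstream.

def flatten_bag(bag):
    """Return the first field of every non-empty tuple in bag as one tuple."""
    if bag is None:
        return None
    # Skip null or empty members, mirroring the Java version in the thread.
    return tuple(t[0] for t in bag if t is not None and len(t) > 0)


if __name__ == "__main__":
    # The bag from the record in the original question.
    print(flatten_bag([(48567859,), (15996334,), (15897772,)]))
```

Under Pig the script would presumably be registered with something like
REGISTER 'flat.py' USING jython AS flat; and then called as
flat.flatten_bag(...) on the bag column. Without an output schema Pig
treats the result as a bytearray, so a schema is likely needed before the
fields can be used individually.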