Pig, mail # user - Flatten a Bag on One Line?


Re: Flatten a Bag on One Line?
pablomar 2012-02-10, 16:09
What about something like this?
(Typing on the phone, so forgive any mistakes.)

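import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.pig.EvalFunc;
import org.apache.pig.data.DataBag;
import org.apache.pig.data.Tuple;
import org.apache.pig.data.TupleFactory;

// Collapses a bag of one-field tuples into a single tuple:
// {(a),(b),(c)} -> (a,b,c)
public class Flat extends EvalFunc<Tuple> {
    private static final TupleFactory FACTORY = TupleFactory.getInstance();

    @Override
    public Tuple exec(Tuple input) throws IOException {
        if (input == null || input.size() == 0 || input.get(0) == null) {
            return null;
        }
        try {
            DataBag bag = (DataBag) input.get(0);
            List<Object> fields = new ArrayList<Object>();
            for (Tuple t : bag) {
                // keep only the first field of each non-empty inner tuple
                if (t != null && t.size() > 0) {
                    fields.add(t.get(0));
                }
            }
            return FACTORY.newTuple(fields);
        } catch (ClassCastException e) {
            throw new IOException("Flat expects a bag as its first argument", e);
        }
    }
}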
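The heart of Flat is one loop: take the first field of each inner tuple and append it to a single list. To check that behaviour without a Pig installation, here is a dependency-free sketch that models a tuple as a List<Object> and a bag as a list of tuples (an assumption for illustration; real UDF code uses Pig's Tuple and DataBag). The class name FlatSketch and the oneLine helper are hypothetical. oneLine splices the flattened bag after the leading fields, which is roughly what a statement like B = FOREACH A GENERATE $0, $1, FLATTEN(Flat($2)); would do in Pig, since FLATTEN applied to a tuple substitutes its fields in place (the relation name and field positions are made up to match Eli's record).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Dependency-free model of the Flat UDF: a "tuple" is a List<Object> and a
// "bag" is a List of tuples. (Assumption for illustration only; real Pig code
// works with org.apache.pig.data.Tuple and DataBag.)
public class FlatSketch {

    // Same loop as Flat.exec: keep the first field of each non-empty tuple.
    static List<Object> flattenBag(List<List<Object>> bag) {
        List<Object> fields = new ArrayList<>();
        for (List<Object> t : bag) {
            if (t != null && !t.isEmpty()) {
                fields.add(t.get(0));
            }
        }
        return fields;
    }

    // Hypothetical helper: put the flattened bag after the leading fields,
    // mimicking FLATTEN(Flat(bag)) inside FOREACH ... GENERATE.
    static List<Object> oneLine(List<Object> leading, List<List<Object>> bag) {
        List<Object> row = new ArrayList<>(leading);
        row.addAll(flattenBag(bag));
        return row;
    }

    public static void main(String[] args) {
        List<List<Object>> bag = Arrays.asList(
                Arrays.<Object>asList(48567859),
                Arrays.<Object>asList(15996334),
                Arrays.<Object>asList(15897772));
        List<Object> row = oneLine(Arrays.<Object>asList(98812, 3), bag);
        // prints [98812, 3, 48567859, 15996334, 15897772]
        System.out.println(row);
    }
}
```

Running main prints the single wide row from the original question rather than three separate rows.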

On 2/10/12, Brendan Gill <[EMAIL PROTECTED]> wrote:
> Eli,
>
> I'm trying to do exactly this, but am pretty new to Pig.  Any chance you
> would share what the UDF would look like?  Then I can tailor it to our
> needs.
>
> Much appreciated if possible,
>
> Brendan
>
>
>
> On Thu, Feb 9, 2012 at 9:20 PM, Eli Finkelshteyn <[EMAIL PROTECTED]> wrote:
>
>> Thanks. I was hoping (and assuming) there was a built-in, but I guess a UDF it is.
>>
>> Eli
>>
>>
>> On 2/9/12 2:14 PM, Yulia Tolskaya wrote:
>>
>>> I actually can't think of an easy way to do this without it becoming a
>>> cross product. You could just write a really simple UDF that takes a bag
>>> and spits out just the members.
>>>
>>> Yulia
>>>
>>> On 2/9/12 1:26 PM, "Eli Finkelshteyn" <[EMAIL PROTECTED]> wrote:
>>>
>>>> This is probably easy, but my Pig Latin is rusty, and I can't seem to
>>>> find an answer on Google. If I have a record of the form:
>>>>
>>>>     98812   3       {(48567859),(15996334),(15897772)}
>>>>
>>>> How can I flatten that bag to leave all members on a single row, i.e.:
>>>>
>>>>     98812    3    48567859    15996334    15897772
>>>>
>>>> Cheers,
>>>> Eli
>>>>
>>>
>>
>
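Yulia's cross-product remark refers to how Pig's built-in FLATTEN treats a bag inside FOREACH ... GENERATE: it emits one output row per bag element, each repeating the other fields, instead of one wide row. The plain-Java model below (hypothetical class and method names; tuples as List<Object>, a bag as a list of them) shows that behaviour on Eli's record, which is why a small UDF such as Flat is needed to keep everything on one line.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Models what Pig's built-in FLATTEN does to a bag in FOREACH ... GENERATE:
// one output row per bag element, each repeating the leading fields.
// (Plain-Java illustration; tuples are List<Object>, a bag is a List of them.)
public class FlattenRowsSketch {

    static List<List<Object>> flattenToRows(List<Object> leading, List<List<Object>> bag) {
        List<List<Object>> rows = new ArrayList<>();
        for (List<Object> t : bag) {
            List<Object> row = new ArrayList<>(leading);
            row.addAll(t); // the element's fields take the place of the bag column
            rows.add(row);
        }
        return rows;
    }

    public static void main(String[] args) {
        List<List<Object>> bag = Arrays.asList(
                Arrays.<Object>asList(48567859),
                Arrays.<Object>asList(15996334),
                Arrays.<Object>asList(15897772));
        for (List<Object> row : flattenToRows(Arrays.<Object>asList(98812, 3), bag)) {
            System.out.println(row);
        }
        // [98812, 3, 48567859]
        // [98812, 3, 15996334]
        // [98812, 3, 15897772]
    }
}
```

Three rows come out for a three-element bag, which is the per-element expansion Eli wanted to avoid.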