Re: Flatten a Bag on One Line?
I was going to do this as a Python UDF, but haven't had a chance yet
since other stuff I was working on took priority. As soon as I do write
it, I'll be sure to upload it here. On a related note: is there a
Piggybank for Python UDFs I could contribute it to for posterity?

Eli

On 2/10/12 11:09 AM, pablomar wrote:
> what about something like this?
> (typing on the phone, forgive any mistake)
>
> public class Flat extends EvalFunc<Tuple>
> {
>     public Tuple exec(Tuple input) throws IOException
>     {
>         try
>         {
>             List<Object> list = new LinkedList<Object>();
>             DataBag bag = (DataBag) input.get(0);
>             Iterator<Tuple> it = bag.iterator();
>             while (it.hasNext())
>             {
>                 Tuple t = it.next();
>                 // skip null or empty tuples; keep the first field of the rest
>                 if (t != null && t.size() > 0)
>                     list.add(t.get(0));
>             }
>
>             TupleFactory fac = TupleFactory.getInstance();
>             return fac.newTuple(list);
>         }
>         catch....
>
> On 2/10/12, Brendan Gill <[EMAIL PROTECTED]> wrote:
>> Eli,
>>
>> I'm trying to do exactly this, but am pretty new to Pig.  Any chance you
>> would share what the UDF would look like?  Then I can tailor it to our
>> needs.
>>
>> Much appreciated if possible,
>>
>> Brendan
>>
>>
>>
>> On Thu, Feb 9, 2012 at 9:20 PM, Eli Finkelshteyn <[EMAIL PROTECTED]> wrote:
>>
>>> Thanks. Was hoping/assuming there was a built-in, but I guess UDF it is.
>>>
>>> Eli
>>>
>>>
>>> On 2/9/12 2:14 PM, Yulia Tolskaya wrote:
>>>
>>>> I actually can't think of an easy way to do this without it becoming a
>>>> cross product. You could just write a really simple UDF that takes a bag
>>>> and spits out just the members.
>>>>
>>>> Yulia
>>>>
>>>> On 2/9/12 1:26 PM, "Eli Finkelshteyn" <[EMAIL PROTECTED]> wrote:
>>>>
>>>>> This is probably easy, but my Pig Latin is rusty, and I don't seem to be
>>>>> able to find an answer on Google. If I have a record of the form:
>>>>>
>>>>>      98812   3       {(48567859),(15996334),(15897772)}
>>>>>
>>>>> How can I flatten that bag to leave all members on a single row, i.e.:
>>>>>
>>>>>      98812    3    48567859    15996334    15897772
>>>>>
>>>>> Cheers,
>>>>> Eli
>>>>>
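
The loop in the Flat UDF quoted above can be tried out without a Pig installation. What follows is only a rough sketch under stated assumptions: a "tuple" is modelled as a plain List<Object> and a "bag" as a List of such lists. These are stand-ins, not Pig's Tuple and DataBag classes, and the names BagFlattener, flattenBag and flattenRecord are made up for illustration. It takes the first field of every non-null, non-empty tuple and appends the results to the leading fields of the record, which is the single-row shape asked for in the original question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Stand-in types (an assumption, not Pig's API): a "tuple" is a List<Object>
// and a "bag" is a List of tuples.
final class BagFlattener {

    // Mirrors the loop in the quoted exec(): collect field 0 of every
    // non-null, non-empty tuple, in the order the bag is iterated.
    static List<Object> flattenBag(List<List<Object>> bag) {
        List<Object> out = new ArrayList<>();
        if (bag == null) {
            return out; // treat a missing bag as empty
        }
        for (List<Object> t : bag) {
            if (t != null && !t.isEmpty()) {
                out.add(t.get(0));
            }
        }
        return out;
    }

    // Builds the single output row: leading fields, then the bag members.
    static List<Object> flattenRecord(List<Object> leading,
                                      List<List<Object>> bag) {
        List<Object> row = new ArrayList<>(leading);
        row.addAll(flattenBag(bag));
        return row;
    }

    public static void main(String[] args) {
        // The record from the original question.
        List<List<Object>> bag = Arrays.asList(
            Arrays.<Object>asList(48567859),
            Arrays.<Object>asList(15996334),
            Arrays.<Object>asList(15897772));
        List<Object> row =
            flattenRecord(Arrays.<Object>asList(98812, 3), bag);
        // prints [98812, 3, 48567859, 15996334, 15897772]
        System.out.println(row);
    }
}
```

Two caveats if this logic is carried back into a real UDF. Pig bags are unordered, so the order of the flattened members is not guaranteed unless the bag is sorted first. Also, the number of output fields varies from row to row, which makes a fixed output schema hard to declare.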