Pig >> mail # user >> Flatten a Bag on One Line?


Re: Flatten a Bag on One Line?
Pig doesn't have a piggybank for python udfs, but it makes sense to
create one.
Please attach your udf to a new jira, and we can figure out where to put it.

-Thejas
On 2/10/12 1:14 PM, Eli Finkelshteyn wrote:
> I was going to do this as a python udf, but haven't had a chance yet
> since other stuff I was working on took priority. As soon as I do write
> it, I'll be sure to upload it here. On a related note: is there a
> piggybank for python udfs I could contribute it to for posterity?
>
> Eli
>
> On 2/10/12 11:09 AM, pablomar wrote:
>> what about something like this?
>> (typing on the phone, forgive any mistake)
>>
>> public class Flat extends EvalFunc<Tuple>
>> {
>>     public Tuple exec(Tuple input) throws IOException
>>     {
>>         try
>>         {
>>             List<Object> list = new LinkedList<Object>();
>>             DataBag bag = (DataBag) input.get(0);
>>             Iterator<Tuple> it = bag.iterator();
>>             while (it.hasNext())
>>             {
>>                 Tuple t = it.next();
>>                 if (t != null && t.size() > 0)
>>                     list.add(t.get(0));
>>             }
>>
>>             TupleFactory fac = TupleFactory.getInstance();
>>             return fac.newTuple(list);
>>         }
>>         catch....
>>
>> On 2/10/12, Brendan Gill<[EMAIL PROTECTED]> wrote:
>>> Eli,
>>>
>>> I'm trying to do exactly this, but am pretty new to Pig. Any chance you
>>> would share what the UDF would look like? Then I can tailor it to our
>>> needs.
>>>
>>> Much appreciated if possible,
>>>
>>> Brendan
>>>
>>>
>>>
>>> On Thu, Feb 9, 2012 at 9:20 PM, Eli Finkelshteyn<[EMAIL PROTECTED]>
>>> wrote:
>>>
>>>> Thanks. Was hoping/assuming there was a built-in, but I guess udf it is.
>>>>
>>>> Eli
>>>>
>>>>
>>>> On 2/9/12 2:14 PM, Yulia Tolskaya wrote:
>>>>
>>>>> I actually can't think of an easy way to do this without it becoming a
>>>>> cross product. You could just write a really simple udf that takes a bag
>>>>> and spits out just the members.
>>>>>
>>>>> Yulia
>>>>>
>>>>> On 2/9/12 1:26 PM, "Eli Finkelshteyn" <[EMAIL PROTECTED]> wrote:
>>>>>
>>>>>> This is probably easy, but my PigLatin is rusty, and I don't seem to be
>>>>>> able to find an answer on Google. If I have a record of the form:
>>>>>>
>>>>>> 98812 3 {(48567859),(15996334),(15897772)}
>>>>>>
>>>>>> How can I flatten that bag to leave all members on a single row, ie:
>>>>>>
>>>>>> 98812 3 48567859 15996334 15897772
>>>>>>
>>>>>> Cheers,
>>>>>> Eli
>>>>>>
>
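
For anyone who wants to try the idea in the quoted UDF without Pig on the classpath, here is a minimal sketch of the same loop on plain Java collections. It is an illustration under stated assumptions, not the UDF itself: the class name FlatSketch and the method flattenFirstFields are hypothetical, and a "bag" is modeled as a List of "tuples", each of which is a List of fields, standing in for Pig's DataBag and Tuple. As in the quoted code, only the first field of each tuple is kept, and null or empty tuples are skipped.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class FlatSketch {

    // Hypothetical stand-in for the body of the quoted exec() method.
    // "bag" is a list of "tuples"; each tuple is a list of fields.
    // These are plain Java collections, not Pig's DataBag or Tuple.
    public static List<Object> flattenFirstFields(List<List<Object>> bag) {
        List<Object> row = new ArrayList<>();
        if (bag == null) {
            return row; // assumption: treat a missing bag as an empty one
        }
        for (List<Object> tuple : bag) {
            // Same guard as the quoted UDF: skip null or empty tuples.
            if (tuple != null && !tuple.isEmpty()) {
                row.add(tuple.get(0)); // keep only the first field
            }
        }
        return row;
    }

    public static void main(String[] args) {
        // The bag from the original question: {(48567859),(15996334),(15897772)}
        List<List<Object>> bag = Arrays.asList(
                Arrays.<Object>asList(48567859),
                Arrays.<Object>asList(15996334),
                Arrays.<Object>asList(15897772));

        // prints [48567859, 15996334, 15897772]
        System.out.println(flattenFirstFields(bag));
    }
}
```

In a real UDF the returned list would be wrapped in a tuple, as the quoted code does with TupleFactory, so that all members end up as fields of a single row.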