Pig, mail # user - Conversion


Re: Conversion
Mark 2011-04-01, 14:30
I created the following:

http://pastie.org/1743857

And I'm using it in the following way:

register 'target/pig-1.0-SNAPSHOT.jar';
rows = LOAD 'foo' AS (user:chararray, item:long);
grouped = GROUP rows BY user;
final = FOREACH grouped GENERATE FLATTEN(com.mycompany.pig.udf.Foo(rows.item));

Does that look about right? Is there any particular reason why I need to
flatten at the end? When I try to output a simple tuple from the
EvalFunc it is always a tuple inside a tuple.

Thanks
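
On the FLATTEN question: in Pig, each expression after GENERATE becomes one field of the output record, and the record itself is a tuple. A UDF that returns a Tuple therefore yields a tuple inside a tuple unless FLATTEN promotes its fields. The sketch below is only a plain-Java model of those shapes, not code against the Pig API: the lists stand in for Pig's Tuple and DataBag, and the class and method names (FlattenSketch, bagToTuple, generate, generateFlatten) are hypothetical. The pastie UDF itself is not reproduced here.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java model of the tuple shapes; List stands in for Pig's Tuple and DataBag.
final class FlattenSketch {

    // Models the UDF body: a bag of single-field tuples in, one tuple of all fields out.
    public static List<Long> bagToTuple(List<List<Long>> bag) {
        List<Long> out = new ArrayList<>();
        for (List<Long> tuple : bag) {
            out.add(tuple.get(0)); // each inner tuple is assumed to have exactly one field
        }
        return out;
    }

    // Models "GENERATE Foo(...)": the UDF result becomes a single field of the record.
    public static List<Object> generate(List<Long> udfResult) {
        List<Object> record = new ArrayList<>();
        record.add(udfResult);
        return record;
    }

    // Models "GENERATE FLATTEN(Foo(...))": the result's fields are promoted into the record.
    public static List<Object> generateFlatten(List<Long> udfResult) {
        return new ArrayList<>(udfResult);
    }

    public static void main(String[] args) {
        List<List<Long>> bag = List.of(List.of(174255619L), List.of(201077556L));
        List<Long> result = bagToTuple(bag);
        System.out.println(generate(result));        // nested: one field that is itself a tuple
        System.out.println(generateFlatten(result)); // flat: one field per item
    }
}
```

In this model the first record has a single field holding the whole tuple, which matches the "tuple inside a tuple" output described above, while the second has one field per item.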
On 3/31/11 10:10 AM, Jonathan Coveney wrote:
> You definitely can do this with a UDF. You simply take the Tuples as input
> and then begin concatenating them together. Be wary of memory limitations
> for the intermediate result as it gets large. It may be more practical to let
> the output be a tuple whose elements are the rows.
>
> (199027860,199027860,149167529,203508790,198488630)
>
> then the input to your UDF will be a tuple whose first element is a bag, and
> then the output will be a tuple of all the elements. It is quite easy to
> write something that does this, take a look at the UDF documentation and ask
> if you need any help.
>
> 2011/3/31 Mark <[EMAIL PROTECTED]>
>
>> I have these "rows"
>>
>> ({(155495400)})
>> ({(199027860),(199027860),(149167529),(203508790),(198488630)})
>> ({(174255619),(201077556),(199051606),(198778302)})
>>
>> I believe the correct way to explain them would be each row/tuple is a bag
>> that contains tuples of size 1? Is that right?
>>
>> Anyway, is there something native or UDF I can use to convert them to this
>> format?
>>
>> (155495400)
>> (199027860 199027860 149167529 203508790 198488630)
>> (174255619 201077556 199051606 198778302)
>>
>> Maybe if I explain what we are trying to do it would help.
>>
>> We have logs of users to product views in a tab delimited format.
>>
>> foo\t1234
>> bar\t1234
>> foo\t4423
>> baz\t5563
>>
>> We simply want product views grouped by user and output on one line per user.
>>
>> 1234 4423
>> 1234
>> 5563
>>
>> The above first line would be from the user foo, second bar and third baz.
>>
>> Thanks
>>
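
For reference, the end result asked for in the original message (tab-delimited user/item log lines grouped into one space-separated line per user) can be modeled in plain Java as below. This is a sketch of the intended output only, not a Pig job: the class and method names (GroupViewsSketch, groupByUser, toLines) are hypothetical, and it keeps users in first-seen order, which a Pig GROUP does not promise.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Plain-Java model of "group product views by user, one line per user".
final class GroupViewsSketch {

    // Groups "user<TAB>item" lines by user, preserving first-seen user order.
    public static Map<String, List<String>> groupByUser(List<String> lines) {
        Map<String, List<String>> byUser = new LinkedHashMap<>();
        for (String line : lines) {
            String[] fields = line.split("\t", 2); // assumes exactly one tab per line
            byUser.computeIfAbsent(fields[0], k -> new ArrayList<>()).add(fields[1]);
        }
        return byUser;
    }

    // Renders one space-separated line of items per user.
    public static List<String> toLines(Map<String, List<String>> byUser) {
        return byUser.values().stream()
                .map(items -> String.join(" ", items))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Sample input taken from the original message.
        List<String> log = List.of("foo\t1234", "bar\t1234", "foo\t4423", "baz\t5563");
        toLines(groupByUser(log)).forEach(System.out::println);
    }
}
```

Run on the four sample log lines, this prints the three lines shown in the original message, in the order foo, bar, baz.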