Mark 2011-04-01, 14:30
I created the following:
And I'm using it in the following way:
rows = LOAD 'foo' AS (user:chararray, item:long);
grouped = GROUP rows BY user;
final = FOREACH grouped GENERATE FLATTEN(com.mycompany.pig.udf.Foo(rows.item));
Does that look about right? Is there any particular reason why I need to
flatten at the end? When I try to output a simple tuple from the
EvalFunc it is always a tuple inside a tuple.
On 3/31/11 10:10 AM, Jonathan Coveney wrote:
> You definitely can do this with a UDF. You simply take the Tuples as input
> and then begin concatenating them together. Be wary of memory limitations
> for the intermediate result as it gets large. It may be more practical to
> let the output be a tuple whose elements are the rows.
> then the input to your UDF will be a tuple whose first element is a bag, and
> then the output will be a tuple of all the elements. It is quite easy to
> write something that does this, take a look at the UDF documentation and ask
> if you need any help.
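As a rough illustration of the logic described above (a bag of tuples goes in, one tuple holding all the elements comes out), here is a minimal sketch in plain Java. It deliberately uses no Pig classes: a "bag" is modelled as a list of tuples and a "tuple" as a list of fields. The class name BagToTupleSketch and the method flattenBag are hypothetical names chosen for this example; they are not the UDF from this thread, and a real EvalFunc would read the bag from its input tuple instead.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java model of the UDF logic; no Pig classes are used here.
// Assumption: a "bag" is a list of tuples, and a "tuple" is a list of fields.
final class BagToTupleSketch {

    // Concatenate the fields of every tuple in the bag into one flat tuple.
    // The result is held in memory, so a very large bag makes it large too.
    static List<Object> flattenBag(List<List<Object>> bag) {
        List<Object> result = new ArrayList<>();
        if (bag == null) {
            return result; // assumption: treat a missing bag as empty
        }
        for (List<Object> tuple : bag) {
            result.addAll(tuple);
        }
        return result;
    }

    public static void main(String[] args) {
        // Item IDs from the second example row in the thread,
        // arranged as a bag of size-1 tuples.
        List<List<Object>> bag = Arrays.asList(
            Arrays.<Object>asList(174255619L),
            Arrays.<Object>asList(201077556L),
            Arrays.<Object>asList(199051606L),
            Arrays.<Object>asList(198778302L));
        // prints [174255619, 201077556, 199051606, 198778302]
        System.out.println(flattenBag(bag));
    }
}
```

On the FLATTEN question: in Pig, a UDF that returns a tuple contributes a single field of type tuple to the generated record, which would explain the "tuple inside a tuple" output. FLATTEN promotes the elements of that inner tuple to top-level fields.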
> 2011/3/31 Mark <[EMAIL PROTECTED]>
>> I have these "rows"
>> I believe the correct way to describe them is that each row/tuple is a bag
>> containing tuples of size 1. Is that right?
>> Anyway, is there something native, or a UDF, I can use to convert them to this:
>> (199027860 199027860 149167529 203508790 198488630)
>> (174255619 201077556 199051606 198778302)
>> Maybe if I explain what we are trying to do it would help.
>> We have logs mapping users to product views in a tab-delimited format.
>> We simply want the product views grouped by user and output on one line.
>> 1234 4423
>> The first line above would be from the user foo, the second from bar, and the third from baz.