Pig, mail # user - Conversion


Conversion
Mark 2011-03-31, 15:49
I have these "rows"

({(155495400)})
({(199027860),(199027860),(149167529),(203508790),(198488630)})
({(174255619),(201077556),(199051606),(198778302)})

I believe the correct way to describe them is that each row/tuple is a
bag containing tuples of size 1. Is that right?

Either way, is there a built-in function or UDF I can use to convert them
to this format?

(155495400)
(199027860 199027860 149167529 203508790 198488630)
(174255619 201077556 199051606 198778302)
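The conversion being asked for amounts to pulling the single field out of every inner tuple and laying those values side by side. The sketch below is a minimal Python model, not Pig code. It assumes each row is a tuple holding one bag, and that the bag is a list of one-field tuples. The helper names `flatten_row` and `format_row` are hypothetical.

```python
# Minimal Python model of the rows above (NOT Pig): each row is assumed to be
# a tuple holding one bag, and the bag is a list of 1-field tuples.
rows = [
    ([(155495400,)],),
    ([(199027860,), (199027860,), (149167529,), (203508790,), (198488630,)],),
    ([(174255619,), (201077556,), (199051606,), (198778302,)],),
]

def flatten_row(row):
    """Hypothetical helper: turn ({(a),(b),...}) into the flat tuple (a, b, ...)."""
    (bag,) = row  # unpack the single bag field of the row
    return tuple(field for (field,) in bag)  # each inner tuple has exactly one field

def format_row(values):
    """Hypothetical helper: render a flat tuple as '(a b c)' with spaces."""
    return "(" + " ".join(str(v) for v in values) + ")"

for row in rows:
    print(format_row(flatten_row(row)))
```

Running it prints the three space-separated lines shown above. Inside Pig itself, the equivalent step would be done with a built-in or a UDF operating on the bag. This sketch only illustrates the shape change.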

It may help if I explain what we are trying to do.

We have logs of product views by user, in tab-delimited format (user, then product ID):

foo\t1234
bar\t1234
foo\t4423
baz\t5563

We simply want the product views grouped by user, with each user's views output on one line:

1234 4423
1234
5563

The first line above is for user foo, the second for bar, and the third for baz.
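The end-to-end goal is a group-by on the user column, followed by joining each group's product IDs with spaces. A minimal Python sketch of that logic follows, offered only to pin down the expected behaviour, not as a Pig solution. It assumes each input line is exactly `user<TAB>product`. The function names `views_per_user` and `format_output` are hypothetical.

```python
from collections import OrderedDict

def views_per_user(lines):
    """Hypothetical helper: group 'user\\tproduct' lines by user, in first-seen order."""
    grouped = OrderedDict()
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        user, product = line.split("\t", 1)  # assumes one tab between the fields
        grouped.setdefault(user, []).append(product)
    return grouped

def format_output(grouped):
    """Hypothetical helper: one output line per user, product IDs separated by spaces."""
    return [" ".join(products) for products in grouped.values()]

log = ["foo\t1234", "bar\t1234", "foo\t4423", "baz\t5563"]
for out_line in format_output(views_per_user(log)):
    print(out_line)
```

With the sample log this prints `1234 4423`, `1234` and `5563`, matching the desired output. The sketch keeps users in first-seen order to line up with the example. A Pig GROUP is not expected to guarantee the order of groups or of items within a bag, so the real job's output may be ordered differently.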

Thanks