Pig >> mail # user >> Conversion


Conversion
I have these "rows"

({(155495400)})
({(199027860),(199027860),(149167529),(203508790),(198488630)})
({(174255619),(201077556),(199051606),(198778302)})

I believe the correct way to describe them is that each row is a tuple
holding a single bag, and that bag contains tuples of size 1. Is that right?

Anyway, is there a built-in function or a UDF I can use to convert them to
this format?

(155495400)
(199027860 199027860 149167529 203508790 198488630)
(174255619 201077556 199051606 198778302)

Maybe it would help if I explain what we are trying to do.

We have logs of users' product views in a tab-delimited format (user, then product ID).

foo\t1234
bar\t1234
foo\t4423
baz\t5563

We simply want the product views grouped by user and output on one line per user.

1234 4423
1234
5563

The first line above is from the user foo, the second from bar and the third from baz.
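
To make the target output unambiguous, here is a minimal sketch of the transformation in plain Python rather than Pig. The function name `group_views` and the in-memory `log` list are made up for illustration; it only shows the grouping and space-joining that the output above implies. In Pig itself the analogous steps would presumably be a GROUP ... BY on the user field followed by flattening each bag into a delimited string (as far as I know, recent Pig releases include a BagToString built-in for that).

```python
from collections import defaultdict


def group_views(lines):
    """Group product IDs by user and return one space-joined string per user.

    Users appear in the order they are first seen, and each user's
    products keep the order in which they occur in the log.
    """
    views = defaultdict(list)  # dicts keep insertion order in Python 3.7+
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        user, product = line.split("\t", 1)
        views[user].append(product)
    return [" ".join(products) for products in views.values()]


# The sample log from the message above (hypothetical in-memory copy).
log = ["foo\t1234", "bar\t1234", "foo\t4423", "baz\t5563"]
for row in group_views(log):
    print(row)
```

Note that this sketch preserves input order, whereas a Pig bag is unordered, so the order of products within a line may differ there unless it is sorted explicitly.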

Thanks