Simple word count in pig..
Hi,

I have data that has already been processed into the following form:
(id, {bag of words})
For example:

(foobar, {(foo), (foo),(foobar),(bar)})
(foo,{(bar),(bar)})

and so on.
Running describe on processed gives me:
processed: {id: chararray,tokens: {tuple_of_tokens: (token: chararray)}}
What I want now is to count the number of times each word appears in the
bag for each id, and output the result as:
foobar,foo,2
foobar,foobar,1
foobar,bar,1
foo,bar,2

and so on.

How do I do this in Pig?
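
One possible approach, given here only as a sketch: flatten the bag so that each token becomes its own row, group by the (id, token) pair, and count the rows in each group. It assumes the relation is named processed and has the schema shown above. The aliases flat, grouped, counts, and cnt, and the output path 'wordcounts', are made-up names for illustration.

```pig
-- Assumes 'processed' is already defined with the schema
-- {id: chararray, tokens: {tuple_of_tokens: (token: chararray)}}.

-- One row per (id, token) occurrence.
flat = FOREACH processed GENERATE id, FLATTEN(tokens) AS token;

-- Collect identical (id, token) pairs together.
grouped = GROUP flat BY (id, token);

-- Unpack the group key and count the occurrences in each group.
counts = FOREACH grouped GENERATE FLATTEN(group) AS (id, token), COUNT(flat) AS cnt;

-- Write comma-separated output; 'wordcounts' is a hypothetical path.
STORE counts INTO 'wordcounts' USING PigStorage(',');
```

The order of the output rows is not guaranteed. Adding an ORDER BY on counts before the STORE would be one way to make it deterministic.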