Pig >> mail # user >> Simple word count in pig..


Simple word count in Pig
Hi,

I have data that has already been processed into the following form:
(id, {bag of words})
For example:

(foobar,{(foo),(foo),(foobar),(bar)})
(foo,{(bar),(bar)})

and so on.

DESCRIBE processed gives me:
processed: {id: chararray,tokens: {tuple_of_tokens: (token: chararray)}}

Now I want to count the number of times each word appears in each id's bag
and output the result as:
foobar,foo,2
foobar,foobar,1
foobar,bar,1
foo,bar,2

and so on.

How do I do this in Pig?
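
One possible approach, offered as a minimal sketch rather than a definitive answer: flatten each bag so that every (id, token) pair becomes its own row, group on that pair, and count the rows in each group. The sketch assumes the relation is named processed and has the schema shown by DESCRIBE above; the aliases flat, grouped, and counts and the field name n are made up for illustration.

```pig
-- Assumes: processed: {id: chararray, tokens: {tuple_of_tokens: (token: chararray)}}

-- One output row per (id, token) occurrence in the bag.
flat = FOREACH processed GENERATE id, FLATTEN(tokens) AS token;

-- Collect identical (id, token) pairs into the same group.
grouped = GROUP flat BY (id, token);

-- Unpack the group key and count the occurrences in each group.
counts = FOREACH grouped GENERATE FLATTEN(group) AS (id, token), COUNT(flat) AS n;

DUMP counts;
```

Each row of counts holds an id, a token, and the number of times that token occurs in that id's bag. DUMP does not guarantee row order, so add an ORDER BY step if the order matters.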