Pig >> mail # user >> group by clickstream


group by clickstream
Hi all,
I have a bag, clickstreams: {clickStream: {pageName: chararray}}, in which each row represents the sequence of pages and events in a single session on a website. The interior bag, clickStream, holds this as a sequence of one or more single-element tuples, e.g.,

{(homepage),(pg1),(pg2),...,(pgN)}

I'd like to group by the sequences so I can get counts and ultimately sort to find the most common clickstreams.  A bag can't be a key for grouping, I've discovered, but it seems like it ought to be easy to flatten the clickstream bag into some other form such that the sequences can be used as keys for grouping.  But I can't figure it out.
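
To make the goal concrete, here is a rough sketch of the result being asked for, written in plain Python rather than Pig. It is only an illustration, not a Pig solution: the function name `count_clickstreams` and the sample sessions are made up, and it assumes each session is available as an ordered list of page names. A list cannot be used as a dictionary key, which loosely mirrors the bag-as-key problem, so each sequence is first converted into a hashable tuple.

```python
from collections import Counter


def count_clickstreams(sessions):
    """Count identical page sequences and rank them, most common first.

    `sessions` is assumed to be an iterable of ordered lists of page names.
    """
    # A list is unhashable, so turn each sequence into a tuple before counting.
    counts = Counter(tuple(pages) for pages in sessions)
    # most_common() returns (sequence, count) pairs sorted by count, descending.
    return counts.most_common()


# Hypothetical sample data: three sessions, two of which share a sequence.
sessions = [
    ["homepage", "pg1", "pg2"],
    ["homepage", "pg1"],
    ["homepage", "pg1", "pg2"],
]

ranked = count_clickstreams(sessions)
for sequence, count in ranked:
    print(count, " > ".join(sequence))
```

The same idea in Pig would need some way to turn each interior bag into a value that is valid as a grouping key while preserving page order.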

Any ideas?

Thanks!
Steve
