Pig user mailing list: group by clickstream

RE: group by clickstream
Nope, tried that; it breaks the bag back out into one tuple per record, which is not what I want.
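FLATTEN un-nests a bag into one output row per tuple, which is why it undoes the grouping. What is needed instead is a function that consumes the whole bag and returns one scalar that GROUP BY can use. The following is a minimal sketch of a Jython UDF under several assumptions:

- The file name, the function name `bag_to_key`, and the `|` separator are hypothetical.
- Pig's Jython bridge passes a bag in as a list of tuples. The Pig UDF docs describe this mapping, but it is worth confirming for your version.
- No `pageName` contains the separator.

```python
# Hypothetical Jython UDF file, e.g. clickstream_udfs.py (name is illustrative).
# Inside Pig's Jython runtime, `outputSchema` is provided automatically;
# the fallback below only exists so the file also runs under plain Python.
try:
    outputSchema
except NameError:
    def outputSchema(schema):
        def decorate(func):
            return func
        return decorate

SEPARATOR = "|"  # assumption: never appears inside a pageName


@outputSchema("path:chararray")
def bag_to_key(bag):
    """Collapse a bag of single-element tuples, e.g. {(homepage),(pg1),(pg2)},
    into one chararray such as 'homepage|pg1|pg2' that GROUP BY can use."""
    if bag is None:
        return None
    # Keeps whatever order the tuples arrive in.
    return SEPARATOR.join(t[0] for t in bag)


if __name__ == "__main__":
    print(bag_to_key([("homepage",), ("pg1",), ("pg2",)]))  # homepage|pg1|pg2
```

On the Pig side, the steps would be:

1. REGISTER the file `USING jython`.
2. Call the function inside a FOREACH over `clickstreams`.
3. GROUP on the resulting `path`, COUNT each group, and ORDER by the count DESC.

This assumes the tuples come through in click order. Pig does not formally guarantee bag order, so it is worth checking. Later Pig releases also added built-in `BagToString` and `BagToTuple` functions for this kind of conversion; check whether your version has them before writing a UDF.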

-----Original Message-----
From: Віталій Тимчишин [mailto:[EMAIL PROTECTED]]
Sent: Friday, August 31, 2012 1:49 PM
Subject: Re: group by clickstream


Doesn't FLATTEN do exactly this?

Best regards, Vitalii Tymchyshyn

2012/8/30 Steve Bernstein <[EMAIL PROTECTED]>

> Some clarification on the below. Ignore the outer bag; I'd removed
> some data elements for clarity and simplicity. Basically, I'm trying
> to find a way to go from:
> {(pg),(pg),...,(pg)}
> to
> {(pg,pg,...,pg)}
> for an arbitrary number of "pg" tuples.
> SB
> -----Original Message-----
> From: Steve Bernstein [mailto:[EMAIL PROTECTED]]
> Sent: Wednesday, August 29, 2012 4:28 PM
> Subject: group by clickstream
> Hi all,
> I have a bag, clickstreams: {clickStream: {pageName: chararray}}, for
> which each row represents a sequence of pages and events in a single
> session on a website. The interior bag, clickStream, represents this
> as a sequence of one or more single-element tuples, e.g.,
> {(homepage),(pg1),(pg2),...,(pgN)}
> I'd like to group by the sequences so I can get counts and ultimately
> sort to find the most common clickstreams.  A bag can't be a key for
> grouping, I've discovered, but it seems like it ought to be easy to
> flatten the clickstream bag into some other form such that the
> sequences can be used as keys for grouping.  But I can't figure it out.
> Any ideas?
> Thanks!
> Steve
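The original question boils down to three steps on the whole page sequence: group, count, then sort. Below is a plain-Python model of that pipeline for checking the logic outside Pig. It is not Pig code, and the function names and sample sessions are hypothetical.

- Each bag is collapsed to one tuple, mirroring the {(pg,pg,...,pg)} shape from the clarification.
- A Python tuple is hashable and can serve as a grouping key, while a list cannot. This parallels the bag restriction described above.

```python
from collections import Counter


def collapse(bag):
    """Turn [(pg1,), (pg2,), ...] into (pg1, pg2, ...): one hashable key per session.
    The (page,) unpacking raises ValueError if any tuple is not single-element."""
    return tuple(page for (page,) in bag)


def top_clickstreams(sessions, n=None):
    """Group sessions by their full page sequence and return (sequence, count)
    pairs, most common first; ties keep first-seen order."""
    counts = Counter(collapse(bag) for bag in sessions)
    return counts.most_common(n)


# Hypothetical sample data: each session is a bag of single-element tuples.
sessions = [
    [("homepage",), ("pg1",), ("pg2",)],
    [("homepage",), ("pg1",)],
    [("homepage",), ("pg1",), ("pg2",)],
]

for path, count in top_clickstreams(sessions):
    print(count, " > ".join(path))
```

`Counter.most_common` handles the count and the descending sort in one call. In Pig the same steps would be a GROUP on the collapsed key, a COUNT per group, and an ORDER ... DESC.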