Pig, mail # user - group by clickstream


Steve Bernstein 2012-08-29, 23:27
Steve Bernstein 2012-08-30, 16:06
Steve Bernstein 2012-08-30, 16:22
Віталій Тимчишин 2012-08-31, 20:48
RE: group by clickstream
Steve Bernstein 2012-08-31, 23:14
Nope, I tried that. FLATTEN breaks the bag back into one tuple per record, which is not what I want.
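
For readers following the thread, the difference between the two operations can be modelled outside Pig. The sketch below is plain Python, not Pig Latin: a bag is represented as a list of one-element tuples, and `flatten_bag` and `bag_to_tuple` are hypothetical helper names chosen for illustration. In Pig itself the second operation would probably need a UDF, or a builtin such as BagToTuple where the installed version provides one.

```python
# A Pig bag of single-field tuples, modelled as a list of 1-tuples.
click_stream = [("homepage",), ("pg1",), ("pg2",)]


def flatten_bag(bag):
    """Mimic FLATTEN on a bag: one output record per inner tuple."""
    return [tup for tup in bag]


def bag_to_tuple(bag):
    """Collapse {(pg),(pg),...,(pg)} into the single tuple (pg,pg,...,pg)."""
    return tuple(field for tup in bag for field in tup)


records = flatten_bag(click_stream)     # still one entry per page
collapsed = bag_to_tuple(click_stream)  # one tuple holding the whole sequence

print(records)
print(collapsed)
```

Note that the collapsed tuple is only meaningful as a clickstream if the bag preserves page order; this sketch assumes it does.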

-----Original Message-----
From: Віталій Тимчишин [mailto:[EMAIL PROTECTED]]
Sent: Friday, August 31, 2012 1:49 PM
To: [EMAIL PROTECTED]
Subject: Re: group by clickstream

Hello.

Doesn't FLATTEN do exactly this?

Best regards, Vitalii Tymchyshyn

2012/8/30 Steve Bernstein <[EMAIL PROTECTED]>

> Some clarification on the below.  Ignore the outer bag, I'd removed
> some data elements for clarity and simplicity.  Basically, I'm trying
> to find a way to go from:
>
> {(pg),(pg),...,(pg)}
> to
> {(pg,pg,...,pg)}
>
> For an arbitrary number of "pg" tuples.
>
> SB
>
> -----Original Message-----
> From: Steve Bernstein [mailto:[EMAIL PROTECTED]]
> Sent: Wednesday, August 29, 2012 4:28 PM
> To: [EMAIL PROTECTED]
> Subject: group by clickstream
>
> Hi all,
> I have a bag, clickstreams: {clickStream: {pageName: chararray}}, for
> which each row represents a sequence of pages and events in a single
> session on a website.  The interior bag, clickStream, represents this
> as a sequence of one or more single element tuples, e.g.,
>
> {(homepage),(pg1),(pg2),...,(pgN)}
>
> I'd like to group by the sequences so I can get counts and ultimately
> sort to find the most common clickstreams.  A bag can't be a key for
> grouping, I've discovered, but it seems like it ought to be easy to
> flatten the clickstream bag into some other form such that the
> sequences can be used as keys for grouping.  But I can't figure it out.
>
> Any ideas?
>
> Thanks!
> Steve
>
>
--
Best regards,
 Vitalii Tymchyshyn
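
The end goal stated in the original question (count identical page sequences and rank them) can be sketched as follows. This is plain Python rather than Pig Latin, the sample sessions are invented for illustration, and `sequence_key` is a hypothetical helper. It assumes page names never contain the `>` separator; a Pig equivalent might build the key with a UDF or with BagToString where available, then GROUP on it.

```python
from collections import Counter

# Hypothetical sample data: each session's clickstream as a list of 1-tuples.
sessions = [
    [("homepage",), ("pg1",), ("pg2",)],
    [("homepage",), ("pg1",)],
    [("homepage",), ("pg1",), ("pg2",)],
]


def sequence_key(bag, sep=">"):
    """Join the page names into one string that can serve as a grouping key."""
    return sep.join(tup[0] for tup in bag)


# Count how often each sequence occurs, then rank from most to least common.
counts = Counter(sequence_key(session) for session in sessions)
ranked = counts.most_common()

print(ranked)
```

A string key is used here because a scalar value is the simplest thing to group on; a tuple key would work equally well in Python.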