Pig, mail # user - Design question - parsing clickstream with query parameters


Re: Design question - parsing clickstream with query parameters
Mohit Anchlia 2012-06-15, 19:59
On Fri, Jun 15, 2012 at 9:12 AM, Alan Gates <[EMAIL PROTECTED]> wrote:

> This seems reasonable, except it seems like it would make more sense to
> convert query parameters to maps.  By definition a query parameter is
> key=value.  And a map is easier to work with in general than a bag, since
> there's no need to flatten them.
>

I've never used maps. Is this map format part of Hadoop?

> Alan.
>
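Alan's suggestion of treating query parameters as key/value maps can be illustrated with a minimal sketch. It is written in Python for illustration only (it is not Pig code), and the URL and parameter names (zip, state, os, browser) are hypothetical. It also assumes that when a key repeats, the last value wins.

```python
from typing import Dict
from urllib.parse import urlsplit, parse_qsl


def query_to_map(url: str) -> Dict[str, str]:
    """Turn the query string of a URL into a key -> value map.

    Assumption: if a key appears more than once, the last value wins.
    """
    query = urlsplit(url).query
    # parse_qsl splits on '&' and '=' and percent-decodes keys and values;
    # keep_blank_values=True keeps parameters such as "empty=" as "".
    return dict(parse_qsl(query, keep_blank_values=True))


if __name__ == "__main__":
    # Hypothetical clickstream URL; the parameter names are illustrative only.
    url = "http://example.com/page?zip=92122&state=CA&os=Unix&browser=Firefox"
    print(query_to_map(url))
    # {'zip': '92122', 'state': 'CA', 'os': 'Unix', 'browser': 'Firefox'}
```

Because each parameter is looked up by name, nothing has to be flattened before individual fields are selected, which is the advantage of a map over a bag mentioned above.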
> On Jun 11, 2012, at 10:55 AM, Mohit Anchlia wrote:
>
> > I am looking at how to parse URLs with query parameters to process
> > clickstream data. Are there any examples I can look at? The steps I
> > envision are:
> >
> > 1) Read lines and convert query parameters into bags, where each bag is
> > the group of fields for a particular dimension table. So if Geo is one of
> > the dimensions, group all the geo-related information from that URL as a bag.
> > In the end it would look like {{92122,CA},{Unix,FireFox}}. In this example
> > the first bag is the GEO dimension and the second is the Browser dimension.
> > 2) Load these into OLAP staging database
> > 3) Populate star schema from staging tables
> >
> > I am sure other people are already doing this, so I thought I'd check
> > whether this makes sense.
>
>
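Step 1 above (grouping each URL's parameters per dimension, as in {{92122,CA},{Unix,FireFox}}) might be sketched as follows. Again this is illustrative Python, not Pig; the DIMENSIONS table and all parameter names are hypothetical assumptions, and the real mapping depends on the clickstream format.

```python
from typing import Dict, List, Tuple
from urllib.parse import urlsplit, parse_qsl

# Hypothetical mapping from dimension name to the query parameters that
# belong to it; real parameter names depend on the clickstream format.
DIMENSIONS: Dict[str, List[str]] = {
    "GEO": ["zip", "state"],
    "BROWSER": ["os", "browser"],
}


def group_by_dimension(url: str) -> Dict[str, Tuple[str, ...]]:
    """Split one URL's query parameters into one tuple per dimension.

    Missing parameters become empty strings so every tuple has a fixed
    width, which keeps the rows easy to load into a staging table.
    """
    params = dict(parse_qsl(urlsplit(url).query, keep_blank_values=True))
    return {
        dim: tuple(params.get(field, "") for field in fields)
        for dim, fields in DIMENSIONS.items()
    }


if __name__ == "__main__":
    url = "http://example.com/page?zip=92122&state=CA&os=Unix&browser=Firefox"
    print(group_by_dimension(url))
    # {'GEO': ('92122', 'CA'), 'BROWSER': ('Unix', 'Firefox')}
```

Each dimension's tuple could then become one row in that dimension's staging table (step 2) before the star schema is populated (step 3).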