Pig >> mail # user >> Design question - parsing clickstream with query parameters

Re: Design question - parsing clickstream with query parameters
This seems reasonable, except that it would make more sense to convert the query parameters to maps. By definition a query parameter is a key=value pair, and a map is generally easier to work with than a bag, since there's no need to flatten it.
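
To illustrate the idea outside of Pig, here is a minimal Python sketch that turns a URL's query string into a map. The URL and the parameter names (zip, state, os, browser) are made up for the example, and repeated keys are collapsed so that the last value wins, which may or may not suit real clickstream data.

```python
from urllib.parse import urlsplit, parse_qsl


def query_params_to_map(url):
    """Return the query parameters of `url` as a dict of key -> value.

    If a key is repeated, the last value wins (an assumption made for
    this sketch, not a rule of URLs).
    """
    query = urlsplit(url).query
    # parse_qsl yields (key, value) pairs and URL-decodes both parts.
    return dict(parse_qsl(query, keep_blank_values=True))


# Hypothetical clickstream URL, used only for illustration.
url = "http://example.com/click?zip=92122&state=CA&os=Unix&browser=FireFox"
params = query_params_to_map(url)
print(params["zip"], params["browser"])
```

Each value can then be looked up by key directly, with no flattening step.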


On Jun 11, 2012, at 10:55 AM, Mohit Anchlia wrote:

> I am looking at how to parse URLs with query parameters to process
> clickstream data. Are there any examples I can look at? The steps I
> envision are:
> 1) Read lines and convert the query parameters into bags, where each bag is the
> group of fields for a particular dimension table. So if Geo is one of the
> dimensions, group all the geo-related information from that URL into one bag.
> In the end it would look like {{92122,CA},{Unix,FireFox}}. In this example
> the first bag is the GEO dimension and the second is the Browser dimension.
> 2) Load these into an OLAP staging database
> 3) Populate the star schema from the staging tables
> I am sure other people are already doing this, so I thought I would check
> whether this makes sense.
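
Step 1 of the quoted plan could be sketched roughly as follows, in plain Python rather than Pig. The DIMENSIONS table, the parameter names, and the sample URL are all hypothetical; a parameter that is missing from the URL comes back as None, which is one possible choice among several.

```python
from urllib.parse import urlsplit, parse_qsl

# Hypothetical mapping from dimension name to the query parameters feeding it.
DIMENSIONS = {
    "geo": ("zip", "state"),
    "browser": ("os", "browser"),
}


def split_into_dimensions(url, dimensions=DIMENSIONS):
    """Group a URL's query parameters into one tuple per dimension."""
    params = dict(parse_qsl(urlsplit(url).query, keep_blank_values=True))
    # Missing parameters become None so every tuple keeps a fixed width.
    return {
        dim: tuple(params.get(key) for key in keys)
        for dim, keys in dimensions.items()
    }


# Sample URL invented for this example.
row = split_into_dimensions(
    "http://example.com/click?zip=92122&state=CA&os=Unix&browser=FireFox"
)
print(row["geo"], row["browser"])
```

Keying the result by dimension name means each staging table can be loaded from its own entry without relying on field order.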