Pig, mail # user - Design question - parsing clickstream with query parameters

Design question - parsing clickstream with query parameters
Mohit Anchlia 2012-06-11, 17:55
I am looking at how to parse URLs with query parameters in order to process
clickstream data. Are there any examples I can look at? The steps I
envision are:

1) Read lines and convert the query parameters into bags, where each bag is a
group of fields for a particular dimension table. So if Geo is one of the
dimensions, group all the geo-related information from that URL as a bag.
In the end it would look like {{92122,CA},{Unix,FireFox}}. In this example the
first bag is the Geo dimension and the second is the Browser dimension.
2) Load these into an OLAP staging database
3) Populate the star schema from the staging tables
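
To make step 1 concrete, here is a minimal sketch of the grouping logic. It is written in plain Python rather than Pig, purely to illustrate the idea; in Pig the same logic would presumably live in a UDF. The parameter names (zip, state, os, browser), the DIMENSIONS mapping, and the example URL are all assumptions made up for illustration.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical mapping: dimension name -> query parameters that feed it.
DIMENSIONS = {
    "geo": ("zip", "state"),
    "browser": ("os", "browser"),
}

def parse_click(url):
    """Group a URL's query parameters by dimension."""
    params = parse_qs(urlsplit(url).query)
    record = {}
    for dim, keys in DIMENSIONS.items():
        # Take the first value of each parameter; missing ones become None.
        record[dim] = tuple(params.get(k, [None])[0] for k in keys)
    return record

# Made-up example URL.
url = "http://example.com/page?zip=92122&state=CA&os=Unix&browser=FireFox"
print(parse_click(url))
```

Each dimension ends up as one tuple of values, which could then be written out as one row per dimension for the staging load in step 2.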

I am sure other people are already doing this, so I thought I would check
whether this approach makes sense.