Pig >> mail # user >> Design question - parsing clickstream with query parameters


Re: Design question - parsing clickstream with query parameters
This seems reasonable, except that it would make more sense to convert query parameters to maps.  By definition a query parameter is key=value.  And a map is generally easier to work with than a bag, since there's no need to flatten it.

Alan.

On Jun 11, 2012, at 10:55 AM, Mohit Anchlia wrote:

> I am looking at how to parse URLs with query parameters to process
> clickstream data. Are there any examples I can look at? The steps I
> envision are:
>
> 1) Read lines and convert query parameters into bags, where each bag is a
> group of fields for a particular dimension table. So if Geo is one of the
> dimensions, group all the geo-related information from that URL as a bag.
> In the end it would look like {{92122,CA},{Unix,FireFox}}. In this example
> the first bag is the GEO dimension and the second is the Browser dimension.
> 2) Load these into OLAP staging database
> 3) Populate star schema from staging tables
>
> I am sure other people are already doing this, so I thought I'd check
> whether this makes sense.
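
To make the map-versus-bag suggestion concrete, here is a minimal sketch in Python rather than Pig Latin. It parses a URL's query string into a key=value map and then splits that map by dimension. The parameter names (zip, state, os, browser), the DIMENSIONS table, the example URL, and both function names are assumptions made up for illustration; none of them come from the thread. In Pig, logic like this would most likely sit in a UDF that returns a map.

```python
from urllib.parse import urlsplit, parse_qsl

# Hypothetical mapping from query-parameter name to dimension table.
# These names are illustrative assumptions, not taken from the thread.
DIMENSIONS = {
    "zip": "geo",
    "state": "geo",
    "os": "browser",
    "browser": "browser",
}


def query_to_map(url):
    """Return the URL's query parameters as a dict.

    If a key repeats, the last value wins.
    """
    return dict(parse_qsl(urlsplit(url).query, keep_blank_values=True))


def group_by_dimension(params):
    """Split a flat parameter map into one map per dimension.

    Parameters that are not listed in DIMENSIONS are dropped.
    """
    grouped = {}
    for key, value in params.items():
        dim = DIMENSIONS.get(key)
        if dim is not None:
            grouped.setdefault(dim, {})[key] = value
    return grouped


# Example clickstream URL (made up for illustration).
url = "http://example.com/page?zip=92122&state=CA&os=Unix&browser=FireFox&ref=home"
params = query_to_map(url)
dims = group_by_dimension(params)
```

Because each dimension is a map keyed by field name, a later step can read dims["geo"]["zip"] directly instead of relying on the position of a field inside a bag.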