Pig >> mail # user >> Design question - parsing clickstream with query parameters


Re: Design question - parsing clickstream with query parameters
On Fri, Jun 15, 2012 at 3:34 PM, Jonathan Coveney <[EMAIL PROTECTED]> wrote:

> We just use the Java Map class, with the restriction that the key must be a
> String. There are some helper methods in trunk to work with maps, and you
> can use # to dereference, i.e. map#'key'
>

Thanks! If you don't mind sharing: once you flatten them, do you then load
the result into the star schema in the database?

I think I need to look at maps.
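
For anyone reading this thread later, here is a minimal sketch of the "query parameters as a map" idea. It is written in plain Python rather than Pig, purely as an illustration; the URL, the parameter names, and the helper name query_params_to_map are made up for the example and do not come from anyone's actual pipeline.

```python
from urllib.parse import urlsplit, parse_qsl


def query_params_to_map(url):
    """Return the query parameters of `url` as a dict with string keys."""
    query = urlsplit(url).query
    # parse_qsl decodes percent-escapes; keep_blank_values keeps "key=" pairs.
    # If a key repeats, the last value wins in this sketch.
    return dict(parse_qsl(query, keep_blank_values=True))


# Hypothetical clickstream URL; the parameter names are assumptions.
url = "http://example.com/click?zip=92122&state=CA&os=Unix&browser=FireFox"
params = query_params_to_map(url)

# Looking a value up by key is the analogue of map#'zip' in Pig.
print(params["zip"])
```

With a map, each field is reached by key, so no flatten step is needed just to read a single parameter.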

>
> 2012/6/15 Mohit Anchlia <[EMAIL PROTECTED]>
>
> > On Fri, Jun 15, 2012 at 9:12 AM, Alan Gates <[EMAIL PROTECTED]> wrote:
> >
> > > This seems reasonable, except it seems like it would make more sense to
> > > convert query parameters to maps.  By definition a query parameter is
> > > key=value.  And a map is easier to work with in general than a bag,
> > > since there's no need to flatten them.
> > >
> > > I've never used them. Is this Map format in Hadoop?
> >
> >
> > > Alan.
> > >
> > > On Jun 11, 2012, at 10:55 AM, Mohit Anchlia wrote:
> > >
> > > > I am looking at how to parse URLs with query parameters to process
> > > > clickstream data. Are there any examples I can look at? The steps
> > > > I envision are:
> > > >
> > > > 1) Read lines and convert query parameters into bags, where each bag
> > > > is a group of fields for a particular dimension table. So if Geo is
> > > > one of the dimensions, group all the geo-related information from
> > > > that URL as a bag. In the end it would look like
> > > > {{92122,CA},{Unix,FireFox}}. In this example the first bag is the GEO
> > > > dimension and the second is the Browser dimension.
> > > > 2) Load these into an OLAP staging database
> > > > 3) Populate the star schema from the staging tables
> > > >
> > > > I am sure other people are already doing this, so I thought I would
> > > > check whether this makes sense.
> > >
> > >
> >
>
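
As a rough illustration of step 1 in the original plan, the sketch below groups an already-parsed parameter map into one tuple per dimension, giving the {{92122,CA},{Unix,FireFox}} shape from the example. It is plain Python, not Pig, and the DIMENSIONS table, its key names, and the function name split_into_dimensions are assumptions made up for this example.

```python
# Hypothetical mapping from dimension name to the parameter keys it owns.
DIMENSIONS = {
    "geo": ("zip", "state"),
    "browser": ("os", "browser"),
}


def split_into_dimensions(params, dimensions=DIMENSIONS):
    """Return one tuple of values per dimension; missing keys become None."""
    return {
        name: tuple(params.get(key) for key in keys)
        for name, keys in dimensions.items()
    }


# Parameter map as it might look after parsing one clickstream URL.
params = {"zip": "92122", "state": "CA", "os": "Unix", "browser": "FireFox"}
rows = split_into_dimensions(params)
print(rows["geo"], rows["browser"])
```

Keeping the key-to-dimension mapping in one table means a new dimension can be added without changing the parsing step; whether that suits a given staging schema is a separate design choice.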