Pig, mail # user - help with Map Type


Re: help with Map Type
Mohit Anchlia 2012-06-19, 19:27
On Tue, Jun 19, 2012 at 10:46 AM, Subir S <[EMAIL PROTECTED]> wrote:

> I think the content at the end of this link
>
> http://elasticmapreduce.s3.amazonaws.com/samples/pig-apache/do-reports.pig
> will help you!!

Thanks! I get a 404 when I click on that link.

> On Tue, Jun 19, 2012 at 10:50 PM, Subir S <[EMAIL PROTECTED]> wrote:
>
> > I suggest you load with 2 fields (uri, query), split at the '?' delimiter.
> >
> > Then use REGEX_EXTRACT to extract abc.com and REGEX_EXTRACT_ALL to
> > extract the query parameters.
> >
> > Use FOREACH ... GENERATE to turn the query into a map.
> >
> >
> > On Tue, Jun 19, 2012 at 3:33 AM, Mohit Anchlia <[EMAIL PROTECTED]> wrote:
> >
> >> Sorry, that wasn't a link. It's my input to Pig: basically what's inside
> >> params.dat. When I run those 3 Pig lines I get empty output. What I want
> >> is something like this:
> >>
> >> http://abc.com/?a=v1&b=v2
> >>
> >> broken down into a map, while also preserving abc.com. Otherwise, if
> >> it's complex, I can write UDFs.
> >>
> >>
> >> On Mon, Jun 18, 2012 at 1:04 PM, Subir S <[EMAIL PROTECTED]> wrote:
> >>
> >> > I think the link Mohit mentioned was his input. Not sure if I understood
> >> > correctly.
> >> >
> >> > I suspect something related to the schema.
> >> >
> >> > http://pig.apache.org/docs/r0.9.1/basic.html#map-schema
> >> >
> >> > http://stackoverflow.com/a/8238591
> >> >
> >> > So when you load with the delimiter '&', what will happen to the first
> >> > field? And how will the second field automatically become a map? In
> >> > your schema you mention only one field, not two fields (URL&QUERY).
> >> >
> >> > Thanks, Subir
> >> >
> >> > On Tue, Jun 19, 2012 at 12:20 AM, Jonathan Coveney <[EMAIL PROTECTED]> wrote:
> >> >
> >> > > Your link does not work, I recommend using pastebin.
> >> > >
> >> > > 2012/6/18 Mohit Anchlia <[EMAIL PROTECTED]>
> >> > >
> >> > > > I am trying to parse a URL using Pig's map type. My query string is:
> >> > > >
> >> > > > https://mail.google.com/mail/?tab=wm#drafts/13800c4ea3d11511&mail=123
> >> > > >
> >> > > > My very simple script for testing is this. But when I look at the part
> >> > > > file it returns null.
> >> > > >
> >> > > > A = LOAD '/examples/map/input/params.dat' USING PigStorage('&') AS (M:map[]);
> >> > > >
> >> > > > rmf '/examples/map/output/';
> >> > > >
> >> > > > STORE B INTO '/examples/map/output/';
> >> > > >
> >> > > > I am working on analyzing clickstream data. For this I need to first
> >> > > > parse these strings into files representing dimensions and also do
> >> > > > sessionization on them before loading them into an RDBMS.
> >> > > >
> >> > >
> >> >
> >>
> >
> >
>
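
The approach Subir suggests in the quoted reply (load two fields split at
'?', pull the host out with REGEX_EXTRACT, pull the parameters out with
REGEX_EXTRACT_ALL, then build a map) might look roughly like the sketch
below. It is an illustration, not something posted in the thread. It assumes
every input line has the shape http://abc.com/?a=v1&b=v2, with exactly the
parameters a and b in that order, and that TOMAP is available in the Pig
version in use. The relation and field names (raw, parsed, result, host,
params) are made up for the example.

```pig
-- Sketch only: assumes lines shaped like http://abc.com/?a=v1&b=v2,
-- with exactly the parameters a and b, in that order.

-- Split each line at '?' into the part before it (uri) and after it (query).
raw = LOAD '/examples/map/input/params.dat' USING PigStorage('?')
      AS (uri:chararray, query:chararray);

-- REGEX_EXTRACT returns one capture group (here the host name).
-- REGEX_EXTRACT_ALL must match the whole string and returns all groups
-- as a tuple, which FLATTEN turns into separate fields.
parsed = FOREACH raw GENERATE
    REGEX_EXTRACT(uri, '^https?://([^/]+)', 1) AS host,
    FLATTEN(REGEX_EXTRACT_ALL(query, 'a=([^&]*)&b=([^&]*)')) AS (a:chararray, b:chararray);

-- Build the map from the hardcoded keys and the extracted values.
result = FOREACH parsed GENERATE host, TOMAP('a', a, 'b', b) AS params;

STORE result INTO '/examples/map/output/';
```

Because the keys are hardcoded in the pattern, this only covers a fixed,
known set of parameters. For arbitrary query strings, such as the
mail.google.com URL in the original question, a UDF, as Mohit mentions, is
probably the more practical route.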