Pig, mail # user - reduce continuous sessions


Re: reduce continuous sessions
Prashant Kommireddi 2012-08-30, 08:07
Seems like you are looking to group by "id" and get the MIN and MAX
timestamp for each group?
On Thu, Aug 30, 2012 at 1:00 AM, Marco Cadetg <[EMAIL PROTECTED]> wrote:

> Hi there,
>
> I have some user sessions that look something like the following:
>
> id:chararray, start:long(unix timestamp), end:long(unix timestamp)
> xxx,1,3
> xxx,4,7
> yyy,1,2
> yyy,5,7
> zzz,6,7
> zzz,7,10
>
> I would like to combine the rows which belong to a continuous session.
> In my example, the result should be the following:
> xxx,1,7
> yyy,1,2
> yyy,5,7
> zzz,6,10
>
> I guess there is no way to do this directly in Pig, only by using a
> UDF. Can someone give me a pointer on how you would achieve this?
>
> Thanks,
> -Marco
>
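
For reference, the merge logic described in the quoted message could be
sketched roughly as below. This is only an illustration in plain Python,
not something taken from the thread: the function name merge_sessions and
the max_gap parameter are hypothetical, and the rule "two sessions are
continuous when the next start is at most 1 after the previous end" is an
assumption inferred from the sample rows (xxx 3 -> 4 and zzz 7 -> 7 merge,
yyy 2 -> 5 does not).

```python
from itertools import groupby
from operator import itemgetter


def merge_sessions(rows, max_gap=1):
    """Merge (id, start, end) rows of the same id into continuous sessions.

    Assumption: a session continues the previous one when its start is at
    most `max_gap` after the previous end. `max_gap` is a hypothetical knob.
    """
    merged = []
    # Sorting tuples orders by id first, then start, then end.
    for sid, group in groupby(sorted(rows), key=itemgetter(0)):
        cur_start = cur_end = None
        for _, start, end in group:
            if cur_start is None:
                # First session seen for this id.
                cur_start, cur_end = start, end
            elif start - cur_end <= max_gap:
                # Close enough to the running session: extend it.
                cur_end = max(cur_end, end)
            else:
                # Gap too large: emit the running session, start a new one.
                merged.append((sid, cur_start, cur_end))
                cur_start, cur_end = start, end
        merged.append((sid, cur_start, cur_end))
    return merged


rows = [
    ("xxx", 1, 3), ("xxx", 4, 7),
    ("yyy", 1, 2), ("yyy", 5, 7),
    ("zzz", 6, 7), ("zzz", 7, 10),
]
result = merge_sessions(rows)
for row in result:
    print(",".join(str(field) for field in row))
```

On the sample data this keeps the two yyy sessions separate, whereas a
plain MIN(start)/MAX(end) per id would collapse them into yyy,1,7. Inside
Pig, the same per-id loop could presumably live in a UDF applied to each
group after GROUP BY id, with the bag sorted by start first.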