Pig >> mail # user >> reduce continuous sessions


Marco Cadetg 2012-08-30, 08:00
Re: reduce continuous sessions
Seems like you are looking to group by "id" and get the MIN and MAX
timestamp for each group?
On Thu, Aug 30, 2012 at 1:00 AM, Marco Cadetg <[EMAIL PROTECTED]> wrote:

> Hi there,
>
> I have some user sessions which look something along the following lines:
>
> id:chararray, start:long(unix timestamp), end:long(unix timestamp)
> xxx,1,3
> xxx,4,7
> yyy,1,2
> yyy,5,7
> zzz,6,7
> zzz,7,10
>
> I would like to combine the rows which belong to a continuous session,
> e.g. in my example the result should be the following:
> xxx,1,7
> yyy,1,2
> yyy,5,7
> zzz,6,10
>
> I guess this cannot be done directly in Pig and needs a UDF instead.
> Can someone give me a pointer on how you would achieve this?
>
> Thanks,
> -Marco
>
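
For reference, here is a minimal sketch of the merging logic the question describes, written in plain Python rather than as a tested Pig UDF. It assumes that two sessions of the same id count as continuous when the next start is at most 1 after the previous end, which is the rule the example rows suggest (3 to 4 merges, 2 to 5 does not). The name `merge_sessions` and the `max_gap` parameter are hypothetical and do not come from the thread.

```python
from itertools import groupby
from operator import itemgetter


def merge_sessions(rows, max_gap=1):
    """Merge (id, start, end) rows that form one continuous session.

    Assumed rule: two sessions of the same id are continuous when the
    next start is at most `max_gap` after the current end.
    """
    merged = []
    # Sort by id, then start, so each id's sessions are adjacent and ordered.
    ordered = sorted(rows, key=itemgetter(0, 1))
    for session_id, group in groupby(ordered, key=itemgetter(0)):
        current_start = current_end = None
        for _, start, end in group:
            if current_start is None:
                # First session seen for this id.
                current_start, current_end = start, end
            elif start <= current_end + max_gap:
                # Continuous with the running session: extend its end.
                current_end = max(current_end, end)
            else:
                # Gap too large: emit the running session and start a new one.
                merged.append((session_id, current_start, current_end))
                current_start, current_end = start, end
        merged.append((session_id, current_start, current_end))
    return merged


rows = [
    ("xxx", 1, 3), ("xxx", 4, 7),
    ("yyy", 1, 2), ("yyy", 5, 7),
    ("zzz", 6, 7), ("zzz", 7, 10),
]
for row in merge_sessions(rows):
    print(",".join(str(field) for field in row))
```

On the sample rows this prints xxx,1,7 / yyy,1,2 / yyy,5,7 / zzz,6,10, matching the expected result in the question. A plain MIN/MAX per id would instead collapse the two yyy rows into yyy,1,7, which is why the gap check is needed. In Pig, the same loop would presumably run inside a UDF applied to each id's bag after grouping by id and ordering by start, but that wiring is not shown here.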
Marco Cadetg 2012-08-30, 11:41
Steve Bernstein 2012-08-30, 16:02