Home | About | Sematext search-lucene.com search-hadoop.com
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB

Pig >> mail # user >> Distinct IDs from different time periods


Distinct IDs from different time periods
Hi all,

I'm trying to use Pig to produce a dataset from clickstream logs that does
the following:

   1. Pull data for the past 30 days (the current period).
   2. Classify as Group A the users who had activity in the current period
   but not in the 30 days before the current period.
   3. Classify as Group B all {users in current period} - {Group A}, i.e.
   users who were active in both periods.
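Once each period has been reduced to a set of distinct user IDs, steps 2 and 3 are just set difference and intersection. The sketch below assumes the distinct IDs for each period have already been pulled out of the logs; `classify_users` is a hypothetical helper, not part of Pig. In Pig itself, one possible way to express the same logic is to COGROUP the two periods' distinct IDs by user and test each side with IsEmpty.

```python
def classify_users(current_users, prior_users):
    """Split users active in the current period into Group A and Group B.

    current_users: user IDs seen in the current 30-day period (assumed input).
    prior_users:   user IDs seen in the 30 days before it (assumed input).
    """
    current = set(current_users)
    prior = set(prior_users)
    group_a = current - prior    # active now, not active in the prior period
    group_b = current - group_a  # everyone else in the current period (== current & prior)
    return group_a, group_b


group_a, group_b = classify_users(["u1", "u2", "u3"], ["u2", "u4"])
print(sorted(group_a), sorted(group_b))
```

Users who appear only in the prior period (like `u4` here) fall into neither group, which matches the definitions above.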

To make the example concrete, let's say the end date is July 30, 2013.

So Group A users = anyone who had activity from Jul 1 - Jul 30, 2013 but
did not have activity from Jun 1 - Jun 30.
Group B users = anyone who had activity from Jul 1 - Jul 30, 2013
and also had activity from Jun 1 - Jun 30.
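The date windows in this example follow from counting 30 days back, inclusively, from the end date. The sketch below is a minimal illustration that assumes each log record has been reduced to a `(user_id, date)` pair; `period_bounds` and `classify_by_period` are hypothetical names. It derives both windows from the end date and buckets users in a single pass.

```python
from datetime import date, timedelta


def period_bounds(end, days=30):
    """Return inclusive (start, end) bounds for the current and prior periods."""
    current_start = end - timedelta(days=days - 1)   # Jul 30 -> Jul 1
    prior_end = current_start - timedelta(days=1)    # Jun 30
    prior_start = prior_end - timedelta(days=days - 1)  # Jun 1
    return (current_start, end), (prior_start, prior_end)


def classify_by_period(events, end, days=30):
    """events: iterable of (user_id, date) pairs (assumed record shape)."""
    (cur_start, cur_end), (pri_start, pri_end) = period_bounds(end, days)
    current, prior = set(), set()
    for user_id, day in events:
        if cur_start <= day <= cur_end:
            current.add(user_id)
        elif pri_start <= day <= pri_end:
            prior.add(user_id)
        # Events outside both windows are ignored.
    group_a = current - prior  # new in the current period
    group_b = current & prior  # active in both periods
    return group_a, group_b


print(period_bounds(date(2013, 7, 30)))
```

Because June has 30 days, the 30-day prior window here happens to coincide with the calendar month; for other end dates the windows will not line up with month boundaries.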

I have some initial ideas on how to approach this, but none of them
seem great. Any thoughts from the group?

Mike

--
Mike Sukmanowsky

Product Lead, http://parse.ly
989 Avenue of the Americas, 3rd Floor
New York, NY  10018
p: +1 (416) 953-4248
e: [EMAIL PROTECTED]
Serega Sheypak 2013-08-13, 23:44