Pig, mail # user - Distinct IDs from different time periods


Distinct IDs from different time periods
Mike Sukmanowsky 2013-08-13, 20:32
Hi all,

I'm trying to use Pig to produce some data from clickstream logs, doing the
following:

   1. Pull data for the past 30 days (current period)
   2. Classify Group A as users who had activity in the current period but
   not in the 30 days prior to the current period.
   3. Classify Group B as everyone else, i.e. {users in current period} -
   {Group A}

To make the example concrete, let's say end date is July 30, 2013.

So Group A users =  anyone who had activity from Jul 1 - Jul 30, 2013 but
did not have activity in Jun 1 - Jun 30.
Group B users = anyone who had activity from Jul 1 - Jul 30, 2013
and also had activity in Jun 1 - Jun 30.
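
To pin down the definitions, here is a minimal sketch of the classification
as plain set arithmetic. It is Python rather than Pig, and only illustrates
the intended result, not how to compute it at scale. The function name
classify_users, the (user_id, date) event shape, and the sample user IDs are
all assumptions made for illustration.

```python
from datetime import date, timedelta


def classify_users(events, end_date, period_days=30):
    """Split users active in the current period into Group A and Group B.

    events: iterable of (user_id, date) pairs (assumed shape).
    Current period: the period_days days ending on end_date, inclusive.
    Prior period: the period_days days immediately before that.
    """
    current_start = end_date - timedelta(days=period_days - 1)
    prior_start = current_start - timedelta(days=period_days)

    current, prior = set(), set()
    for user_id, day in events:
        if current_start <= day <= end_date:
            current.add(user_id)
        elif prior_start <= day < current_start:
            prior.add(user_id)

    group_a = current - prior  # active now, no activity in the prior period
    group_b = current & prior  # active in both periods
    return group_a, group_b


# Hypothetical sample data for the July 30, 2013 example.
events = [
    ("u1", date(2013, 7, 5)),                            # July only
    ("u2", date(2013, 6, 10)), ("u2", date(2013, 7, 20)),  # June and July
    ("u3", date(2013, 6, 15)),                           # June only
    ("u4", date(2013, 5, 31)), ("u4", date(2013, 7, 1)),   # May and July
    ("u5", date(2013, 6, 30)), ("u5", date(2013, 7, 30)),  # June and July
]
group_a, group_b = classify_users(events, end_date=date(2013, 7, 30))
```

With this sample, u1 and u4 land in Group A, u2 and u5 land in Group B, and
u3 is in neither group because it has no activity in the current period.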

I've had some initial thoughts for how to approach this but none of them
seem great.  Any thoughts from the group?

Mike

--
Mike Sukmanowsky

Product Lead, http://parse.ly
989 Avenue of the Americas, 3rd Floor
New York, NY  10018
p: +1 (416) 953-4248
e: [EMAIL PROTECTED]
Serega Sheypak 2013-08-13, 23:44