Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Plain View
Hive >> mail # user >> Rolling MAU computation


+
Tom Hubina 2012-10-10, 22:05
+
Roberto Sanabria 2012-10-10, 22:59
+
Tom Hubina 2012-10-10, 23:04
+
MiaoMiao 2012-10-11, 03:50
Copy link to this message
-
Re: Rolling MAU computation
The problem is that "day" is the value in the for loop.

I've tried doing a join with a table that contains the set of days, but the
problem is that you can't do a join on a range ... Hive only support
equality in the join. For example:

INSERT OVERWRITE TABLE mausummary SELECT day, COUNT(DISTINCT(userid))
    FROM days
    JOIN logins ON date_add(logins.t, 30) >= days.day AND logins.t <days.day
    GROUP BY day;

fails because of the range in the join.

Tom
On Wed, Oct 10, 2012 at 8:50 PM, MiaoMiao <[EMAIL PROTECTED]> wrote:

> How about
> SELECT day, COUNT(DISTINCT(userid)) FROM logins WHERE day - logins.day
> < 30 GROUP BY day;
>
> On Thu, Oct 11, 2012 at 6:05 AM, Tom Hubina <[EMAIL PROTECTED]> wrote:
> > I'm trying to compute the number of active users in the previous 30 days
> for
> > each day over a date range. I can't think of any way to do it directly
> > within Hive so I'm wondering if you guys have any ideas.
> >
> > Basically the algorithm is something like:
> >
> > For each day in date range:
> >    SELECT day, COUNT(DISTINCT(userid)) FROM logins WHERE day -
> logins.day <
> > 30;
> >
> > Thanks for your help!
> >
> > Tom
> >
>
+
Igor Tatarinov 2012-10-11, 06:05
+
Tom Hubina 2012-10-12, 20:02
+
Igor Tatarinov 2012-10-12, 20:08
+
Vijay 2012-10-12, 20:42