The problem is that "day" is the value in the for loop.
I've tried doing a join with a table that contains the set of days, but the
problem is that you can't do a join on a range ... Hive only support
equality in the join. For example:
INSERT OVERWRITE TABLE mausummary SELECT day, COUNT(DISTINCT(userid))
JOIN logins ON date_add(logins.t, 30) >= days.day AND logins.t <days.day
GROUP BY day;
fails because of the range in the join.
On Wed, Oct 10, 2012 at 8:50 PM, MiaoMiao <[EMAIL PROTECTED]> wrote:
> How about
> SELECT day, COUNT(DISTINCT(userid)) FROM logins WHERE day - logins.day
> < 30 GROUP BY day;
> On Thu, Oct 11, 2012 at 6:05 AM, Tom Hubina <[EMAIL PROTECTED]> wrote:
> > I'm trying to compute the number of active users in the previous 30 days
> > each day over a date range. I can't think of any way to do it directly
> > within Hive so I'm wondering if you guys have any ideas.
> > Basically the algorithm is something like:
> > For each day in date range:
> > SELECT day, COUNT(DISTINCT(userid)) FROM logins WHERE day -
> logins.day <
> > 30;
> > Thanks for your help!
> > Tom