Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Threaded View
Hive >> mail # user >> Help with last 30 day unique user query


Copy link to this message
-
Re: Help with last 30 day unique user query
Thanks Alex! That is exactly what I thought was the limitation but wanted to
make sure I'm not missing anything.

On Fri, Oct 15, 2010 at 10:51 AM, Alex Boisvert <[EMAIL PROTECTED]> wrote:

> As far as I know, Hive has no built-in support for sliding-window
> analytics.  There is an enhancement request here:
> https://issues.apache.org/jira/browse/HIVE-896
>
> <https://issues.apache.org/jira/browse/HIVE-896>Without such support, the
> brute force way of doing things is,
>
> SELECT COUNT(DISTINCT user_id) FROM events WHERE event_date > start_date
> and event_date <= end_date;
>
> (repeated N times to cover each day of your time window).
>
> alex
>
> On Thu, Oct 14, 2010 at 11:36 PM, Vijay <[EMAIL PROTECTED]> wrote:
>
>> Hi, I need help with this scenario. We have a table of events which has
>> columns date, event (not important for this discussion), and user_id. It is
>> obviously easy to find number of unique users for each day. I also need to
>> find number of unique users in the last 30 days for each day. This is also
>> quite simple to do for one day. However, I cannot figure out how to do this
>> for a range of days. Something like this is pretty straightforward in most
>> RDBMS but with HiveQL has I'm finding this hard. I might be missing
>> something simple though. Any help is appreciated. Ideally the query should
>> also be as optimized as possible as this table could be huge.
>>
>> Thanks,
>> Vijay
>>
>>
>