Home | About | Sematext search-lucene.com search-hadoop.com
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
 Search Hadoop and all its subprojects:

Switch to Threaded View
Hive >> mail # user >> Rolling MAU computation


Copy link to this message
-
Re: Rolling MAU computation
You just need to put the join condition in the WHERE clause. That way Hive
will do a cartesian product followed by a filter.

On Fri, Oct 12, 2012 at 1:02 PM, Tom Hubina <[EMAIL PROTECTED]> wrote:

> I think I see what you're saying about the temp table with start/end dates
> (30x expansion makes sense) and it sounds like it should work. I just need
> to figure out a good way to generate the table. Thanks!
>
> Tom
>
> On Wed, Oct 10, 2012 at 11:05 PM, Igor Tatarinov <[EMAIL PROTECTED]> wrote:
>
>> If you have a lot of data, you might have to write a custom reducer (in
>> python) to keep track of the moving date window.
>>
>> If you don't have that much data, you might want to use a temp table
>> <start_date, end_date> such that datediff(end_date, start_date) < 30. To
>> create such a table, you can self-join a table of unique dates using the
>> above condition. Then, you would join your data with that table on
>> start_date and group by end_date counting distinct user_ids. Hope I got
>> that right :)
>>
>> The latter approach will essentially multiply the number of rows by 30.
>>
>> igor
>> decide.com
>>
>>
>> On Wed, Oct 10, 2012 at 3:05 PM, Tom Hubina <[EMAIL PROTECTED]> wrote:
>>
>>> I'm trying to compute the number of active users in the previous 30 days
>>> for each day over a date range. I can't think of any way to do it directly
>>> within Hive so I'm wondering if you guys have any ideas.
>>>
>>> Basically the algorithm is something like:
>>>
>>> For each day in date range:
>>>    SELECT day, COUNT(DISTINCT(userid)) FROM logins WHERE day -
>>> logins.day < 30;
>>>
>>>  Thanks for your help!
>>>
>>> Tom
>>>
>>>
>>
>
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB