Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Threaded View
Hive, mail # user - how to write hive query to solve this problem?


Copy link to this message
-
Re: how to write hive query to solve this problem?
Stephen Sprague 2013-08-31, 00:22
so this is not particular to Hive is it? You could post this on a DB2,
Oracle, or even Stackflow board i'd imagine.
On Fri, Aug 30, 2013 at 4:34 PM, qiaoresearcher <[EMAIL PROTECTED]>wrote:

>
> I have three tables:
>
> Table 1: record when and who visited gas station or not, this contains all
> the users of interest, name all the users as a set A
> date                         |      user name    |   visited gas station?
> 2013-09-01                         tom                             yes
> 2013-09-02                         tom                             yes
> 2013-09-01                         hanks                          yes
> 2013-09-03                         tomy                            yes
> ....                                           ...
>        ...
>
> Table 2: record when and who visited Bestbuy,   the user in set A appear
> here, but not all users of A will appear in this table, also table 2 has
> users does not belong to set A
> date                         |      user name    |   visited Bestbuy?
> 2013-09-01                         tom                             yes
> 2013-09-02                         jacob                           yes
> 2013-09-01                         hanks                          yes
> 2013-09-03                         michael                       yes
> ....                                           ...
>        ...
>
> Table 3: record when and who arrives one of three destinations: CA, NY and
> DC, the users in table 3 has similar situation as users in table 2
> regarding set A.
> date                         |      user name    |   visited Bestbuy or
> not
> 2013-09-01                         tom                             CA
> 2013-09-02                         tom                             NY
> 2013-09-01                         hanks                          DC
> 2013-09-03                         tomy                            CA
> ....                                           ...
>        ...
>
> Now we want to know,  within a 90 days period, what are the following
> numbers:
> (1) for any given day, for the users in table 1, how many of them has a
> path like:   visited Gas station first, after that date, user went to
> Bestbuy, finally arrive CA
> (2) for any given day, for the users in table 1, how many of them has a
> path like:   visited Gas station first, then did not visit Bestbuy, finally
> arrive CA
>
> (3) for any given day, for the users in table 1, how many of them has a
> path like:   visited Gas station first, then went Bestbuy, finally arrive NY
> (4) for any given day, for the users in table 1, how many of them has a
> path like:   visited Gas station first, then did not visit Bestbuy,
> finally arrive NY
>
> (5) for any given day, for the users in table 1, how many of them has a
> path like:   visited Gas station first, then went Bestbuy, finally arrive DC
> (6) for any given day, for the users in table 1, how many of them has a
> path like:   visited Gas station first, then did not visit Bestbuy,
> finally arrive DC
>
> It is possible in a given day, a user may visit Bestbuy many times for
> table 2, this will be simply regarded as a status 'visited Bestbuy'. One
> user will be counted once in a given day.
>  From the day of user visiting gas station, within next 90 days, this user
> has to arrive one of three final destinations in table 3, and arrives only
> one station. no multiple arrived stations.
>
> For example,
> a user tom may visit gas station on 9/1, then went to Bestbuy on 9/5,
> finally went to CA on 9/30, then this user tom can be counted as 1 on 9/1
> for path (1).
> a user hanks may visit gas station on 9/1, then went to Bestbuy on 9/5 and
> on 9/8, finally went to CA on 9/30, then this user tom count as 1 on 9/1
> for path (1).
> a user ruby may visit gas station on 9/1, and tables 2 does not have
> records to show ruby visit Bestbuy till 11/30, and ruby arrives DC in table
> 3 before 11/30, then ruby contribute 1 for path (6) for the day 9/1.