Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Threaded View
Hive, mail # user - need help on writing hive query


Copy link to this message
-
Re: need help on writing hive query
Matt Tucker 2012-10-31, 21:53
I did a similar query a few months ago.  In short, I left-padded the page name with the time stamp, grouped with collect_set, and then used sort_array().  There was some other cleanup work and converting back to string to remove the time stamps, but it remained in order.

If there's an easier way, please let me know.

Matt Tucker

On Oct 31, 2012, at 5:37 PM, Tom Brown <[EMAIL PROTECTED]> wrote:

> It wouldn't retrieve the user's path in a single string, but you could
> simply select the user id and current page, ordered by the timestamp.
>
> It would require a second step to turn it into the single string path,
> so that might be a deal-breaker.
>
> --Tom
>
> On Wed, Oct 31, 2012 at 3:32 PM, Philip Tromans
> <[EMAIL PROTECTED]> wrote:
>> You could use collect_set() and GROUP BY. That wouldn't preserve order
>> though.
>>
>> Phil.
>>
>> On Oct 31, 2012 9:18 PM, "qiaoresearcher" <[EMAIL PROTECTED]> wrote:
>>>
>>> Hi all,
>>>
>>> here is the question. Assume we have a table like:
>>>
>>> ------------------------------------------------------------------------------------------------------------------------------
>>> user_id    ||  user_visiting_time    ||      user_current_web_page     ||
>>> user_previous_web_page
>>> user 1                 time (1,1)                                   page 1
>>> page 0
>>> user 1                 time (1,2)                                   page 2
>>> page 1
>>> user 1                 time (1,3 )                                  page 3
>>> page 2
>>> .....                          ......
>>> ....                                                ....
>>> user n                 time (n,1)                                   page 1
>>> page 0
>>> user n                 time (n,2)                                   page 2
>>> page 1
>>> user n                 time (n,3)                                   page 3
>>> page 2
>>>
>>> that is, in each row, we know the current web page that user is viewing,
>>> and we know the previous web page the user coming from
>>>
>>> now we want to generate a list for each user that recorded the complete
>>> path the user is taking:
>>> i.e., how can we use hive to generate output like:
>>>
>>> ------------------------------------------------------------------------------------------------
>>> user 1 :      page 1   page 2 page 3  page 4  .......... (till reach the
>>> beginning page of user 1)
>>> user 2:       page 1 page 2 page 3  page 4 page 5  .......  ( till reach
>>> the beginning page of user 2)
>>> the web pages viewed by user 1 and user 2 might be different.
>>>
>>> can we generate this using hive?
>>>
>>> thanks,