Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Plain View
Hive >> mail # user >> how to obtain the latest record for each user in a hive table?


Copy link to this message
-
how to obtain the latest record for each user in a hive table?
The table format is something like:

user_id    visiting_time      visiting_web_page
user1          time11             page_string_11
user1          time12             page_string_12
user1          time13             page_string_13          <-- latest time
stamp
user1          time14             page_string_14          <-- also latest
time stamp, same as the one above
user1          time15             page_string_15
  ...                ...                       .....
user2          time21             page_string_21
user2          time22             page_string_22
user2          time23             page_string_23
user2          time24             page_string_24
user2          time25             page_string_25
   ....              .....                       ....

Assume time13 and time14 are the latest time stamp for user1, time22 and
time25 are the latest time stamp for user2,
how to obtain output like:
user1        [page_string_13, page_string_14]
user2        [page_string_22, page_string_25]

ps: run {   select user, max(visiting_time) group by user  }    WILL return
result like:
user1, time13   (same as time 14)
user2, time22   (same as time 25)

many thanks in advance!
+
Edward Capriolo 2012-11-19, 22:55
+
qiaoresearcher 2012-11-19, 23:30