Home | About | Sematext search-lucene.com search-hadoop.com
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
 Search Hadoop and all its subprojects:

Switch to Threaded View
Hive >> mail # user >> how to obtain the latest record for each user in a hive table?


Copy link to this message
-
how to obtain the latest record for each user in a hive table?
The table format is something like:

user_id    visiting_time      visiting_web_page
user1          time11             page_string_11
user1          time12             page_string_12
user1          time13             page_string_13          <-- latest time
stamp
user1          time14             page_string_14          <-- also latest
time stamp, same as the one above
user1          time15             page_string_15
  ...                ...                       .....
user2          time21             page_string_21
user2          time22             page_string_22
user2          time23             page_string_23
user2          time24             page_string_24
user2          time25             page_string_25
   ....              .....                       ....

Assume time13 and time14 are the latest time stamp for user1, time22 and
time25 are the latest time stamp for user2,
how to obtain output like:
user1        [page_string_13, page_string_14]
user2        [page_string_22, page_string_25]

ps: run {   select user, max(visiting_time) group by user  }    WILL return
result like:
user1, time13   (same as time 14)
user2, time22   (same as time 25)

many thanks in advance!
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB