Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Threaded View
Hive, mail # user - Last value for a column


Copy link to this message
-
Re: Last value for a column
Igor Tatarinov 2012-01-26, 20:08
I don't think there is a better way to implement your query using the
standard SQL/Hive.

A python reducer (or a java UDF) is the way to go.

I don't think clustering would help since there is no way to specify what
you want in HiveQL alone.

igor
decide.com

On Thu, Jan 26, 2012 at 3:23 AM, <[EMAIL PROTECTED]> wrote:

> Dear all,****
>
> I am struggling with a Hive query where I am trying to get the last value
> for a column.****
>
> ** **
>
> Let say I have a table T with three columns: user_id, time, colour and I
> want to know for each user_id what is its last colour value.****
>
> ** **
>
> At the moment I am using the following (naïve) query:****
>
> ** **
>
> SELECT T1.user_id, T1.lastTime, T2.colour FROM (****
>
>               SELECT user_id, max(Time) AS lastTime FROM T GROUP BY
> user_id ) T1****
>
>              JOIN T T2 ON (T1.user_id=T2.user_id AND T2.time=T1.lastTime)*
> ***
>
>                                        ****
>
> I am not happy with this query of course since I have to read the table T
> twice. ****
>
> ** **
>
> So I have three questions:****
>
> **1)      **Is there a way to re-write this query in a more efficient way
> ?****
>
> ** **
>
> **2)      **Wouldn’t that be better to  stream the output of a
> distributed by on user_id into a python script and print out only one row
> per user_id  (assuming  sorted by user_id and time) ?****
>
> ** **
>
> ** **
>
> **3)      **Couldn’t I create the table T with cluster by on user_id and
> sorted by time at the first place and then use this structure to extract
> the rows I am interested in ?****
>
> ** **
>
> ** **
>
> Many thanks for your help,****
>
> Michael****
>
> _________________________________________________________________________________________________________________________
>
> Ce message et ses pieces jointes peuvent contenir des informations confidentielles ou privilegiees et ne doivent donc
> pas etre diffuses, exploites ou copies sans autorisation. Si vous avez recu ce message par erreur, veuillez le signaler
> a l'expediteur et le detruire ainsi que les pieces jointes. Les messages electroniques etant susceptibles d'alteration,
> France Telecom - Orange decline toute responsabilite si ce message a ete altere, deforme ou falsifie. Merci
>
> This message and its attachments may contain confidential or privileged information that may be protected by law;
> they should not be distributed, used or copied without authorization.
> If you have received this email in error, please notify the sender and delete this message and its attachments.
> As emails may be altered, France Telecom - Orange shall not be liable if this message was modified, changed or falsified.
> Thank you.
>
>