Hive, mail # user - First/last in npath

Justin Workman 2013-08-21, 05:43
Harish Butani 2013-08-21, 18:42
Justin Workman 2013-08-21, 19:25
Re: First/last in npath
Harish Butani 2013-08-22, 00:48
Can you try this:

select search_terms, productid, clicks_to_product from npath ( on clicks
                distribute by sessionid sort by timestamp
                -- arg1 was omitted in this message; the pattern below is an assumed reconstruction
                arg1('SEARCH.NOTPRODUCT*.PRODUCT'),
                arg2('SEARCH'), arg3(page = 'SEARCH'),
                arg4('PRODUCT'), arg5(page = 'PRODUCT'),
                arg6('NOTPRODUCT'), arg7(page != 'PRODUCT'),
                arg8('search_terms, (size(tpath)-1) as clicks_to_product, tpath[size(tpath) -1].productid as productid')
                );
- added NOTPRODUCT to capture clicks between SEARCH and PRODUCT
- you don't need first_value for search_terms, because the row you get back is the one at which the pattern match starts.
- to get the last_value, I am hoping this works: tpath[size(tpath) -1].productid
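
To make the intent of the three points above concrete, here is a small Python sketch of what the query is meant to compute for one session. It is only an illustration, not how Hive evaluates npath: the sample rows, the function name first_search_to_product, and the choice to return only the first match are all assumptions made for this example.

```python
from typing import Dict, List, Optional

# Hypothetical click rows; the keys mirror the columns named in the thread.
Row = Dict[str, Optional[str]]


def first_search_to_product(session: List[Row]) -> Optional[dict]:
    """Emulate SEARCH, then any non-PRODUCT rows, then PRODUCT.

    `session` holds the rows of one sessionid, already sorted by timestamp.
    Simplification: only the first matching path is returned.
    """
    for start, row in enumerate(session):
        if row["page"] != "SEARCH":
            continue
        # Extend the path until the first PRODUCT row after the search.
        for end in range(start + 1, len(session)):
            if session[end]["page"] == "PRODUCT":
                tpath = session[start:end + 1]
                return {
                    # The match starts at the SEARCH row, so no first_value is needed.
                    "search_terms": tpath[0]["search_terms"],
                    # Same idea as tpath[size(tpath) -1].productid in the query.
                    "productid": tpath[len(tpath) - 1]["productid"],
                    # Same idea as size(tpath)-1 in the query.
                    "clicks_to_product": len(tpath) - 1,
                }
        return None  # a search that never led to a product view
    return None


session = [
    {"page": "HOME", "search_terms": None, "productid": None},
    {"page": "SEARCH", "search_terms": "red shoes", "productid": None},
    {"page": "CATEGORY", "search_terms": None, "productid": None},
    {"page": "SEARCH", "search_terms": "red boots", "productid": None},
    {"page": "PRODUCT", "search_terms": None, "productid": "p42"},
]

result = first_search_to_product(session)
print(result)
```

For this sample session the path runs from the first SEARCH row to the PRODUCT row, so the search terms come from the first search, the product id comes from the last element of the path, and the click count is the path length minus one.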
On Aug 21, 2013, at 12:25 PM, Justin Workman <[EMAIL PROTECTED]> wrote:

> Assuming clickstream-type data, I want to get the search terms from the first search request and return the product id that was eventually viewed, along with the number of clicks to the product. Something like this:
> select search_terms, productid, clicks_to_product from npath ( on clicks
>                 distribute by sessionid sort by timestamp
>                 arg1('SEARCH.PRODUCT'),
>                 arg2('SEARCH'), arg3(page = 'SEARCH'),
>                 arg4('PRODUCT'), arg5(page = 'PRODUCT'),
>                 arg6('first_value(search_terms) as search_terms, last_value(productid) as productid, (size(tpath)-1) as clicks_to_product')
>                 );
> From what I have seen, I will get the search terms from the first search without first_value; however, it would be nice to be able to use first_value to guarantee that. I cannot get the productid from the last tpath object using this. I did try to get last_value(tpath.productid) in the outer query; however, that returned the productid (and all the nulls leading up to the product-viewed page) from the tpath value of the very last row returned by the inner npath select, i.e. not the last productid for the current row. I can use tpath.productid in place of productid in the outer query, and it returns the nulls for each row in the current tpath, up to the final product view.
Justin Workman 2013-08-22, 01:27
Justin Workman 2013-08-22, 00:58
Edward Capriolo 2013-08-21, 14:46