Home | About | Sematext search-lucene.com search-hadoop.com
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
 Search Hadoop and all its subprojects:

Switch to Threaded View
Hive >> mail # user >> Help in debugging Hive Query


Copy link to this message
-
Help in debugging Hive Query
All,
 
I am trying to determine visits for customer from omniture weblog file using Hive.
 
Table: omniture_web_data
Columns: visid_high,visid_low,evar23,visit_page_num
 
Sample Data:
visid_high,visid_low,evar23,visit_page_num
999,888,1003,10
999,888,1003,14
999,888,1003,6
999,777,1003,12
999,777,1003,20
 
I want to calculate for each Customer Number ( evar23 is  Customer Number ) , total visits. visid_high and visid_low determines a unique visit.
For each distinct visitor, calculate sum of maximum visit_page_num. In above example
 
14 + 20 = 34 should be the total visits for the customer 1003.
 
I am trying to run the following queries - Method 1 is almost the same as Method 2. Except in Method 1 I only choose a particualr customer number 1003. In method 2 , i generalized to all.
 
In Method 1 , I am getting the accurate result. In metnhod 2 , I am not getting the same result as Method 1.
 
Any suggestions on how to trouble shoot. ALso, any alternative approaches.
 
// Method 1
select a.evar23,sum(b.max_visit_page_num) from
(select distinct visid_high,visid_low,evar23 from web.omniture_web_data where evar23='1003') a
JOIN
(select visid_high,visid_low,max(visit_page_num) as max_visit_page_num from omniture_web_data where evar23='1003' group by visid_high,visid_low) b
where a.visid_high=b.visid_high and a.visid_low=b.visid_low
group by a.evar23;
 
///// Result of Method 1
 
1003    34
 
// Method 2

create table temp123 as
select a.evar23,sum(b.max_visit_page_num) from
(select distinct visid_high,visid_low,evar23 from web.omniture_web_data) a
JOIN
(select visid_high,visid_low,max(visit_page_num) as max_visit_page_num from omniture_web_data group by visid_high,visid_low) b
where a.visid_high=b.visid_high and a.visid_low=b.visid_low
group by a.evar23;
 
select * from temp123 where evar23='1003';
 
// The Result of Method 2 is not the same as Method 1. It is showing a different number.
 
 
 
Thanks,
Raj
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB