Home | About | Sematext search-lucene.com search-hadoop.com
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
 Search Hadoop and all its subprojects:

Switch to Threaded View
Hive >> mail # user >> how to make data statistics efficiency in hive?


Copy link to this message
-
Re: how to make data statistics efficiency in hive?
Hi Andy,

I'm not sure if I entirely understood your question, but I think you're
looking for something like this:

select
    concat(date,':',uid),
    sum(1) as total,
    sum(if(a=1,1,0)) AS ca,
    sum(if(b=1,1,0)) AS cb,
    sum(if(c=1,1,0)) AS cc
from mytable
group by uid, date;

Query like this allows you to efficiently compute sums of multiple columns
in a single map-reduce.

Best regards,
Jan
On Wed, Mar 27, 2013 at 4:34 AM, 周梦想 <[EMAIL PROTECTED]> wrote:

> hello,
> about hsql statistics.
>
> table mytable
> date,uid,a,b,c
> --------------------
> 03/13/13       185690475      0       1       1
> 03/13/13       187270278      0       1       0
> 03/13/13       185690475      1       1       0
> 03/13/13       186012530      1       0       1
> 03/13/13       180286243      0       1       0
> 03/13/13       185690475      1       1       0
> 03/13/13       186012530      0       1       0
> 03/13/13       183256782      1       0       0
> 03/14/13       185690475      0       0       1
>
> I want to get one day,each user total count,count a=1 ,count b=1, count c=1
> the out put should like:
>
> key,total, counta, countb, countc
> -----------------------
> 03/13/13:185690475      3      2       3       1
> 03/13/13:187270278      1      0       1       0
> 03/13/13:186012530      2      1       1       1
> 03/13/13:180286243      1      0       1       0
> 03/13/13:183256782      1      1       0       0
> 03/14/13:185690475      1      0       0       1
>
> the hsql i want is:
> select concat(date:uid),count(1),count(a=1),count(b=1),count(c=1) from
> mytable group by uid,date;
>
> but I have to write ugly and inefficiency hsql like :
> select  concat(s1.date:s1.uid),s1.total,s2.ca,s3.cb,s4.cc from
> (
> select date,uid,count(1) total total from mytable group by uid,date) s1
> inner outer join
> (select date,uid,count(1) ca total from mytable where a=1 group by
> uid,date)s2
> inner outer join
> (select date,uid,count(1) cb total from mytable where b=1 group by
> uid,date)s3
> inner outer join
> (select date,uid,count(1) cc total from mytable where c=1 group by
> uid,date)s4
> );
>
> each select sub-clause should run a map-reduce.
>
> if I have to count a very big number of  columns table, this should be a
> very long task.
> some one have any good ideals?
>
> Thank you!
>
> Best Regards,
> Andy Zhou
>
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB