Home | About | Sematext search-lucene.com search-hadoop.com
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
 Search Hadoop and all its subprojects:

Switch to Threaded View
Hive >> mail # user >> How to get percentage of each group?


Copy link to this message
-
Re: How to get percentage of each group?
Yeah I figure it out, this query will create 3 jobs, it seems to do
sub-queries without any optimization.

SELECT
      A.userType
    , A.userType_count/B.global_count
FROM
    (
    SELECT
          userType
        , COUNT(1) as userType_count
    FROM
        some_table
    GROUP BY
        userType
    ) A
JOIN
    (
    SELECT
        COUNT(1) as global_count
    FROM
        some_table
    ) B
ON (1=1);

But if all userType are enumerable, I can get their ratio in one job,
which seems like a more optimized way.

SELECT
      SUM(IF(userType='A',1,0))/count(1) as A_ratio
    , SUM(IF(userType='B',1,0))/count(1) as B_ratio
    , SUM(IF(userType='C',1,0))/count(1) as C_ratio
FROM
    some_table;

On Fri, Sep 7, 2012 at 2:59 PM, Bejoy KS <[EMAIL PROTECTED]> wrote:
> Hi
>
> CROSS JOIN is same as giving JOIN keyword. CROSS JOIN just a new notation in
> later releases of hive. JOIN without ON is same as CROSS JOIN
>
> Regards,
> Bejoy KS
>
> ________________________________
> From: MiaoMiao <[EMAIL PROTECTED]>
> To: [EMAIL PROTECTED]
> Sent: Friday, September 7, 2012 11:46 AM
> Subject: Re: How to get percentage of each group?
>
> You mean this one? Hive 0.8 doesn't have CROSS JOIN, so I don't know
> if this query works.
> SELECT
>       A.userType
>     , A.userType_count/B.global_count
> FROM
>     (
>     SELECT
>           userType
>         , COUNT(1) as userType_count
>     FROM
>         some_table
>     GROUP BY
>         userType
>     ) A
> CROSS JOIN
>     (
>     SELECT
>         COUNT(1) as global_count
>     FROM
>         some_table
>     ) B;
> On Fri, Sep 7, 2012 at 1:34 PM, Bertrand Dechoux <[EMAIL PROTECTED]> wrote:
>> Hi,
>>
>> You could use a cross join.
>> You basically have one table
>>
>> select
>>      userType
>>    , count(1)
>> from
>>    some_table
>> group by
>>    userType
>>
>> and a second one
>>
>> select count(1) from some_table
>>
>> With a cross join you can add the global count to every results in the
>> first
>> table and compute a ratio.
>>
>> Regards
>>
>> Bertrand
>>
>>
>> On Fri, Sep 7, 2012 at 7:23 AM, MiaoMiao <[EMAIL PROTECTED]> wrote:
>>>
>>> I have a table, containing userId and userType.
>>> userId    userType
>>> 1    A
>>> 2    B
>>> 3    C
>>> 4    A
>>> 5    B
>>> 6    B
>>>
>>> I want to get percentage of each userType.
>>> My current solution:
>>> 1. Get count of each group via THRIFT
>>> select
>>>      userType
>>>    , count(1)
>>> from
>>>    some_table
>>> group by
>>>    userType
>>>
>>> 2. Calculate each userType using other programming language like PHP.
>>>
>>> This solution is fine, but I'm just curious, is there a way to do it
>>> in one query?
>>> I know this query works in mysql, but not hive.
>>> select
>>>      userType
>>>    , count(1)/(select count(1) from some_table)
>>> from
>>>    some_table
>>> group by
>>>    userType
>>
>>
>>
>>
>> --
>> Bertrand Dechoux
>
>
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB