Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Threaded View
Hive, mail # user - How to get percentage of each group?


Copy link to this message
-
Re: How to get percentage of each group?
MiaoMiao 2012-09-07, 07:10
Yeah I figure it out, this query will create 3 jobs, it seems to do
sub-queries without any optimization.

SELECT
      A.userType
    , A.userType_count/B.global_count
FROM
    (
    SELECT
          userType
        , COUNT(1) as userType_count
    FROM
        some_table
    GROUP BY
        userType
    ) A
JOIN
    (
    SELECT
        COUNT(1) as global_count
    FROM
        some_table
    ) B
ON (1=1);

But if all userType are enumerable, I can get their ratio in one job,
which seems like a more optimized way.

SELECT
      SUM(IF(userType='A',1,0))/count(1) as A_ratio
    , SUM(IF(userType='B',1,0))/count(1) as B_ratio
    , SUM(IF(userType='C',1,0))/count(1) as C_ratio
FROM
    some_table;

On Fri, Sep 7, 2012 at 2:59 PM, Bejoy KS <[EMAIL PROTECTED]> wrote:
> Hi
>
> CROSS JOIN is same as giving JOIN keyword. CROSS JOIN just a new notation in
> later releases of hive. JOIN without ON is same as CROSS JOIN
>
> Regards,
> Bejoy KS
>
> ________________________________
> From: MiaoMiao <[EMAIL PROTECTED]>
> To: [EMAIL PROTECTED]
> Sent: Friday, September 7, 2012 11:46 AM
> Subject: Re: How to get percentage of each group?
>
> You mean this one? Hive 0.8 doesn't have CROSS JOIN, so I don't know
> if this query works.
> SELECT
>       A.userType
>     , A.userType_count/B.global_count
> FROM
>     (
>     SELECT
>           userType
>         , COUNT(1) as userType_count
>     FROM
>         some_table
>     GROUP BY
>         userType
>     ) A
> CROSS JOIN
>     (
>     SELECT
>         COUNT(1) as global_count
>     FROM
>         some_table
>     ) B;
> On Fri, Sep 7, 2012 at 1:34 PM, Bertrand Dechoux <[EMAIL PROTECTED]> wrote:
>> Hi,
>>
>> You could use a cross join.
>> You basically have one table
>>
>> select
>>      userType
>>    , count(1)
>> from
>>    some_table
>> group by
>>    userType
>>
>> and a second one
>>
>> select count(1) from some_table
>>
>> With a cross join you can add the global count to every results in the
>> first
>> table and compute a ratio.
>>
>> Regards
>>
>> Bertrand
>>
>>
>> On Fri, Sep 7, 2012 at 7:23 AM, MiaoMiao <[EMAIL PROTECTED]> wrote:
>>>
>>> I have a table, containing userId and userType.
>>> userId    userType
>>> 1    A
>>> 2    B
>>> 3    C
>>> 4    A
>>> 5    B
>>> 6    B
>>>
>>> I want to get percentage of each userType.
>>> My current solution:
>>> 1. Get count of each group via THRIFT
>>> select
>>>      userType
>>>    , count(1)
>>> from
>>>    some_table
>>> group by
>>>    userType
>>>
>>> 2. Calculate each userType using other programming language like PHP.
>>>
>>> This solution is fine, but I'm just curious, is there a way to do it
>>> in one query?
>>> I know this query works in mysql, but not hive.
>>> select
>>>      userType
>>>    , count(1)/(select count(1) from some_table)
>>> from
>>>    some_table
>>> group by
>>>    userType
>>
>>
>>
>>
>> --
>> Bertrand Dechoux
>
>