Pig >> mail # user >> Whether this is a bug of count function


centerqi hu 2013-09-14, 07:01
Re: Whether this is a bug of count function
That's actually the documented behavior:
https://pig.apache.org/docs/r0.10.0/func.html#count

There was some discussion about changing this:
https://issues.apache.org/jira/browse/PIG-1014

Patches gratefully accepted.
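
To make the behavior concrete, here is a minimal plain-Java sketch of the counting rule from the COUNT loop quoted below, applied to the taobao2 group of the sample data. It is not Pig's actual implementation: the class name CountNullDemo, the helper methods, and the use of List<Object> in place of Pig's Tuple are all assumptions for illustration. The countStar method reflects my understanding that COUNT_STAR, described in the same function reference, counts tuples regardless of nulls, so it may be worth trying if you want PV to include such rows.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the quoted COUNT logic; tuples are modeled as List<Object>.
class CountNullDemo {

    // Same condition as the quoted loop: a tuple is counted only if it is
    // non-null, non-empty, and its FIRST field is non-null.
    public static long count(List<List<Object>> bag) {
        long cnt = 0;
        for (List<Object> t : bag) {
            if (t != null && t.size() > 0 && t.get(0) != null) {
                cnt++;
            }
        }
        return cnt;
    }

    // Assumed COUNT_STAR-like behavior: every tuple counts, nulls included.
    public static long countStar(List<List<Object>> bag) {
        return bag.size();
    }

    // RR rows in the taobao2 group: the platform field is empty, so it is null.
    public static List<List<Object>> taobao2Rows() {
        return Arrays.asList(Arrays.<Object>asList(null, "u2", "taobao2"));
    }

    // DISTINCT RR.machineID for taobao2: single-field tuples, first field is "u2".
    public static List<List<Object>> taobao2DistinctMachineIds() {
        return Arrays.asList(Arrays.<Object>asList("u2"));
    }

    public static void main(String[] args) {
        System.out.println("UV = " + count(taobao2DistinctMachineIds())); // UV = 1
        System.out.println("PV = " + count(taobao2Rows()));               // PV = 0
        System.out.println("PV (star) = " + countStar(taobao2Rows()));    // PV (star) = 1
    }
}
```

The UV bag holds one-field tuples whose first field is the machineID, so it is counted; the PV bag holds full rows whose first field is the null platform, so it is skipped.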

D
On Sat, Sep 14, 2013 at 12:01 AM, centerqi hu <[EMAIL PROTECTED]> wrote:

> The sample.txt file content:
>
> android,u1,taobao1
> android,u1,taobao1
> ,u2,taobao2
>
> RR = LOAD '/user/www/udc/output/bugfind/sample.txt' USING PigStorage(',')
>     AS (platform, machineID, productID);
> RB = GROUP RR BY (productID);
> RES = FOREACH RB {
>     ITEMUV = DISTINCT RR.machineID;
>     GENERATE FLATTEN(group), COUNT(ITEMUV) AS UV, COUNT(RR) AS PV;
> };
> DUMP RES;
>
> OUTPUT:
>
> (taobao1,1,2)
> (taobao2,1,0)
>
> Why is the PV for taobao2 0 when its UV is 1?
>
> I looked at the source code of the COUNT function.
>
> If the first column is null, cnt is not incremented:
>
>   while (it.hasNext()) {
>       Tuple t = (Tuple) it.next();
>       if (t != null && t.size() > 0 && t.get(0) != null)
>           cnt++;
>   }
>
> --
> [EMAIL PROTECTED]|齐忠
>