Pig >> mail # user >> Whether this is a bug of count function


Is this a bug in the COUNT function?
Contents of sample.txt:

android,u1,taobao1
android,u1,taobao1
,u2,taobao2

RR = LOAD '/user/www/udc/output/bugfind/sample.txt' USING PigStorage(',')
     AS (platform, machineID, productID);
RB = GROUP RR BY (productID);
RES = FOREACH RB {
    ITEMUV = DISTINCT RR.machineID;
    GENERATE FLATTEN(group), COUNT(ITEMUV) AS UV, COUNT(RR) AS PV;
};
DUMP RES;

OUTPUT:

(taobao1,1,2)
(taobao2,1,0)

Why is PV 0 for taobao2 when UV is 1?

I looked at the source code of the COUNT function.

If the first field of a tuple is null, cnt is not incremented:

    while (it.hasNext()) {
        Tuple t = (Tuple) it.next();
        if (t != null && t.size() > 0 && t.get(0) != null)
            cnt++;
    }
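
To make the effect of that check concrete, here is a minimal standalone sketch that applies the same condition to the sample rows. It does not use Pig's classes: plain Java lists stand in for Tuple and DataBag, and the names CountSketch, countLikePig and distinctMachineIds are hypothetical, chosen only for illustration. It also assumes the empty platform field in the third row is loaded as null, which is what the output above suggests.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class CountSketch {

    // Same condition as the quoted loop: a tuple is counted only if it is
    // non-null, non-empty, and its FIRST field is non-null.
    static long countLikePig(List<List<String>> bag) {
        long cnt = 0;
        for (List<String> t : bag) {
            if (t != null && t.size() > 0 && t.get(0) != null) {
                cnt++;
            }
        }
        return cnt;
    }

    // Stand-in for "DISTINCT RR.machineID": one single-field tuple
    // per distinct machineID (field index 1 of each input row).
    static List<List<String>> distinctMachineIds(List<List<String>> bag) {
        Set<String> seen = new LinkedHashSet<>();
        for (List<String> t : bag) {
            seen.add(t.get(1));
        }
        List<List<String>> out = new ArrayList<>();
        for (String id : seen) {
            out.add(Arrays.asList(id));
        }
        return out;
    }

    public static void main(String[] args) {
        // Group taobao1: both rows have a non-null platform.
        List<List<String>> taobao1 = new ArrayList<>();
        taobao1.add(Arrays.asList("android", "u1", "taobao1"));
        taobao1.add(Arrays.asList("android", "u1", "taobao1"));

        // Group taobao2: platform is assumed to be null (empty in the file).
        List<List<String>> taobao2 = new ArrayList<>();
        taobao2.add(Arrays.asList(null, "u2", "taobao2"));

        System.out.println("taobao1 UV=" + countLikePig(distinctMachineIds(taobao1))
                + " PV=" + countLikePig(taobao1));
        System.out.println("taobao2 UV=" + countLikePig(distinctMachineIds(taobao2))
                + " PV=" + countLikePig(taobao2));
    }
}
```

In this sketch the taobao2 row is skipped when the whole row is counted, because its first field (platform) is null, but it is counted in the distinct bag, whose only field is the non-null machineID u2. That matches the (taobao2,1,0) result. If the goal is to count every row regardless of nulls, Pig also provides the built-in COUNT_STAR, which includes tuples whose first field is null.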

--
[EMAIL PROTECTED]|齐忠
Dmitriy Ryaboy 2013-09-24, 06:23