Pig, mail # dev - Whether this is a bug of count function


centerqi hu 2013-09-16, 03:40
Contents of the sample.txt file:

android,u1,taobao1
android,u1,taobao1
,u2,taobao2

RR = LOAD '/user/www/udc/output/bugfind/sample.txt' USING PigStorage(',')
     as (platform, machineID, productID);
RB = GROUP RR BY (productID);
RES = FOREACH RB {
    -- UV: number of distinct machineIDs per productID
    ITEMUV = DISTINCT RR.machineID;
    -- PV: intended to be the number of rows per productID
    GENERATE flatten(group), COUNT(ITEMUV) AS UV, COUNT(RR) AS PV;
};
DUMP RES;

OUTPUT:

(taobao1,1,2)
(taobao2,1,0)
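To see where the 0 comes from, here is a rough Java sketch that replays the script on the three sample rows. It assumes PigStorage loads the empty first field of ",u2,taobao2" as null (the output above suggests it does), and it uses plain List<String> rows instead of Pig's Tuple; the class and method names are made up for illustration.

```java
import java.util.*;

// Hypothetical sketch: replays the Pig script above on sample.txt in plain Java.
// Assumption: the empty platform field in ",u2,taobao2" is loaded as null.
public class UvPvSketch {

    // Same rule as the loop in Pig's COUNT: a tuple is counted only if it is
    // non-null, non-empty, and its first field is non-null.
    public static long pigCount(List<List<String>> bag) {
        long cnt = 0;
        for (List<String> t : bag) {
            if (t != null && !t.isEmpty() && t.get(0) != null) cnt++;
        }
        return cnt;
    }

    // Returns productID -> {UV, PV} for rows of (platform, machineID, productID).
    public static Map<String, long[]> uvPv(List<List<String>> rows) {
        // GROUP RR BY (productID); TreeMap only gives a stable key order
        Map<String, List<List<String>>> groups = new TreeMap<>();
        for (List<String> row : rows) {
            groups.computeIfAbsent(row.get(2), k -> new ArrayList<>()).add(row);
        }
        Map<String, long[]> result = new TreeMap<>();
        for (Map.Entry<String, List<List<String>>> e : groups.entrySet()) {
            // ITEMUV = DISTINCT RR.machineID: one single-field tuple per machineID
            Set<String> distinct = new LinkedHashSet<>();
            for (List<String> row : e.getValue()) distinct.add(row.get(1));
            List<List<String>> itemUv = new ArrayList<>();
            for (String m : distinct) itemUv.add(Collections.singletonList(m));
            // COUNT(ITEMUV) checks machineID; COUNT(RR) checks platform
            result.put(e.getKey(), new long[] {pigCount(itemUv), pigCount(e.getValue())});
        }
        return result;
    }

    public static void main(String[] args) {
        List<List<String>> rows = Arrays.asList(
            Arrays.asList("android", "u1", "taobao1"),
            Arrays.asList("android", "u1", "taobao1"),
            Arrays.asList(null, "u2", "taobao2")); // empty platform -> null
        for (Map.Entry<String, long[]> e : uvPv(rows).entrySet()) {
            System.out.println("(" + e.getKey() + "," + e.getValue()[0] + "," + e.getValue()[1] + ")");
        }
    }
}
```

Running main prints (taobao1,1,2) and (taobao2,1,0), the same as the DUMP. UV is computed from single-field tuples holding machineID, which is never null here, while PV depends on each row's first field, platform, which is null for the taobao2 row.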

Why is PV 0 for taobao2, while UV is 1?

I looked at the source code of the COUNT function.

If the first field of a tuple is null, cnt is not incremented:

while (it.hasNext()) {
    Tuple t = (Tuple) it.next();
    // Only tuples whose first field is non-null are counted
    if (t != null && t.size() > 0 && t.get(0) != null)
        cnt++;
}
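A minimal standalone sketch of that loop makes the effect easy to check. Here Pig's Tuple is replaced by List<Object> (an assumption for illustration; the real code uses org.apache.pig.data.Tuple), and countAll is a hypothetical helper added only for comparison.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Sketch of the quoted COUNT loop, using List<Object> in place of Pig's Tuple.
public class CountSemantics {

    // Same condition as the quoted loop: skip null or empty tuples and
    // tuples whose first field is null.
    public static long countNonNullFirstField(Iterable<List<Object>> bag) {
        long cnt = 0;
        Iterator<List<Object>> it = bag.iterator();
        while (it.hasNext()) {
            List<Object> t = it.next();
            if (t != null && t.size() > 0 && t.get(0) != null)
                cnt++;
        }
        return cnt;
    }

    // Hypothetical comparison helper: counts every tuple regardless of its fields.
    public static long countAll(Iterable<List<Object>> bag) {
        long cnt = 0;
        for (List<Object> ignored : bag) cnt++;
        return cnt;
    }

    public static void main(String[] args) {
        // The taobao2 group: a single tuple whose first field (platform) is null.
        List<List<Object>> taobao2 = Arrays.asList(Arrays.<Object>asList(null, "u2", "taobao2"));
        System.out.println(countNonNullFirstField(taobao2)); // 0
        System.out.println(countAll(taobao2));               // 1
    }
}
```

So for the taobao2 group, a count that follows the quoted condition gives 0 even though the bag holds one tuple.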

--
齐忠