Pig >> mail # dev >> Whether this is a bug of count function


Re: Whether this is a bug of count function
This is documented behavior: COUNT ignores nulls, so a tuple whose first
field is null is not counted. There is another UDF, COUNT_STAR, that
counts those tuples as well.

http://pig.apache.org/docs/r0.11.1/func.html#count
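
To make the difference concrete, here is a minimal plain-Java sketch of the two counting rules applied to the sample data from the original question. It does not use Pig's own classes: the class name CountSketch and its helper methods are hypothetical, tuples are modeled as List<String>, and the empty platform field is assumed to be loaded as null, which is consistent with the output reported in the thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Illustrative only: a stand-in for Pig bags and tuples, not Pig's API.
class CountSketch {

    // Same condition as the check quoted from COUNT.java: a tuple counts only
    // if it is non-null, non-empty, and its first field is non-null.
    static long count(List<List<String>> bag) {
        long cnt = 0;
        for (List<String> t : bag) {
            if (t != null && t.size() > 0 && t.get(0) != null) {
                cnt++;
            }
        }
        return cnt;
    }

    // Assumed COUNT_STAR-like rule: every tuple counts, nulls included.
    static long countStar(List<List<String>> bag) {
        return bag.size();
    }

    // Stand-in for DISTINCT RR.machineID: project one field into
    // single-field tuples and drop duplicates.
    static List<List<String>> distinctField(List<List<String>> bag, int index) {
        Set<List<String>> seen = new LinkedHashSet<>();
        for (List<String> t : bag) {
            seen.add(Arrays.asList(t.get(index)));
        }
        return new ArrayList<>(seen);
    }

    public static void main(String[] args) {
        // The taobao2 group from sample.txt; the empty platform is null here.
        List<List<String>> taobao2 =
                Arrays.asList(Arrays.asList(null, "u2", "taobao2"));

        System.out.println("UV = " + count(distinctField(taobao2, 1)));
        System.out.println("PV with COUNT = " + count(taobao2));
        System.out.println("PV with COUNT_STAR = " + countStar(taobao2));
    }
}
```

Under these assumptions the projected machineID tuple (u2) has a non-null first field, so UV is 1, while the full tuple starts with a null platform, so COUNT(RR) gives 0. A COUNT_STAR-style count of the same bag gives 1.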
On Mon, Sep 16, 2013 at 12:22 PM, Daniel Dai <[EMAIL PROTECTED]> wrote:

> It is COUNT.java:105:
> if (t != null && t.size() > 0 && t.get(0) != null)
>
> It seems we don't count tuples whose first field is null. I'm not sure why
> this happens, but I would consider it a bug.
>
> Thanks,
> Daniel
>
>
> On Sun, Sep 15, 2013 at 8:40 PM, centerqi hu <[EMAIL PROTECTED]> wrote:
>
> > The sample.txt file content:
> >
> > android,u1,taobao1
> > android,u1,taobao1
> > ,u2,taobao2
> >
> > RR = LOAD '/user/www/udc/output/bugfind/sample.txt' USING PigStorage(',')
> >     AS (platform, machineID, productID);
> > RB = GROUP RR BY (productID);
> > RES = FOREACH RB {
> >     ITEMUV = DISTINCT RR.machineID;
> >     GENERATE FLATTEN(group), COUNT(ITEMUV) AS UV, COUNT(RR) AS PV;
> > };
> > DUMP RES;
> >
> > OUTPUT:
> >
> > (taobao1,1,2)
> > (taobao2,1,0)
> >
> > Why is PV 0 for taobao2 while UV is 1?
> >
> > I looked at the source code of the COUNT function.
> >
> > If the first column is null, cnt is not incremented:
> >
> >     while (it.hasNext()) {
> >         Tuple t = (Tuple) it.next();
> >         if (t != null && t.size() > 0 && t.get(0) != null)
> >             cnt++;
> >     }
> >
> > --
> > [EMAIL PROTECTED]|齐忠
> >
>