

Re: Count empty relation after filtering
Try the bincond operator.  Something like this might work:

...
test_count = FOREACH (GROUP test ALL) GENERATE COUNT(test) AS total;
test_count_2 = FOREACH test_count GENERATE (total IS NULL ? 0 : total) AS total;
DUMP test_count_2;
-Peter
________________________________
 From: Marco Brinkmann <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
Sent: Wednesday, May 29, 2013 11:43 AM
Subject: Re: Count empty relation after filtering
 

I tried to explain why, in my basic understanding, an operation in a FOREACH
(COUNT, COUNT_STAR or anything else) will not lead to any success. I would
still appreciate any hints or tricks to achieve the above.
2013/5/29 Shahab Yunus <[EMAIL PROTECTED]>

> So basically this means that we were looking at this from an RDBMS SQL
> perspective, where 'SELECT COUNT(*) FROM TABLE' returns 0 even if there is
> nothing in the result set, and that is why we ignored the possibility that
> FOREACH might not be executed at all (which could be by design)?
>
> -Shahab
>
>
> On Wed, May 29, 2013 at 10:13 AM, Marco Brinkmann
> <[EMAIL PROTECTED]> wrote:
>
> > Thanks, but this does not change anything. My personal guess (and I have
> > only worked with Pig for a few days) is that FOREACH will never be
> > executed, because the relation 'test' is empty.
> >
> >
> > 2013/5/29 Shahab Yunus <[EMAIL PROTECTED]>
> >
> > > Try COUNT_STAR.
> > >
> > > -Shahab
> > >
> > >
> > > On Wed, May 29, 2013 at 9:55 AM, Marco Brinkmann <[EMAIL PROTECTED]> wrote:
> > >
> > > > Hi everybody,
> > > >
> > > > I have a rather simple question and scenario, but still I could not
> > > > find an answer in the documentation or in other resources:
> > > >
> > > > id, valid
> > > > (1, false)
> > > > (2, false)
> > > >
> > > > records = LOAD 'test.csv' USING PigStorage(',') AS (id:long,
> > > > valid:boolean);
> > > >
> > > > test = FILTER records BY valid == true;
> > > > test_count = FOREACH (GROUP test ALL) GENERATE COUNT(test);
> > > >
> > > > DUMP test_count;
> > > >
> > > >
> > > > I would expect that 'test_count' now contains '0'. But the dump is
> > > > completely empty (with 'valid == false' I get '(2)' as expected). I
> > > > use Pig 0.11.1.
> > > >
> > > > Could someone point me in the right direction?
> > > >
> > > > Cheers, Marco
> > > >
> > >
> >
>
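
One possible workaround for the empty-relation case discussed in this thread, offered as a sketch rather than a tested solution: group the unfiltered relation and apply the filter inside a nested FOREACH, so that the single ALL group still exists when no record matches. It assumes that 'records' itself has at least one row and that COUNT returns 0 for an empty bag; the alias 'valid_rows' is made up for illustration.

```pig
-- Same input as in the original question.
records = LOAD 'test.csv' USING PigStorage(',') AS (id:long, valid:boolean);

-- Group everything first; the ALL group exists as long as 'records' is not empty.
all_records = GROUP records ALL;

-- Filter inside the nested block, then count the (possibly empty) bag.
-- 'valid_rows' is a hypothetical alias, not taken from the thread.
test_count = FOREACH all_records {
    valid_rows = FILTER records BY valid == true;
    GENERATE COUNT(valid_rows) AS total;
};

DUMP test_count;
```

The difference from the original script is only the order of operations: there the FILTER runs before the GROUP, so an empty 'test' produces no group and the FOREACH has nothing to iterate over. If 'records' can itself be empty, this sketch would likewise produce no output.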