Pig >> mail # user >> Count empty relation after filtering


Re: Count empty relation after filtering
So basically this means that we were looking at this from an RDBMS SQL
perspective, where 'SELECT COUNT(*) FROM TABLE' returns 0 even if there is
nothing in the result set, and that is why we ignored the possibility that
FOREACH might not be executed at all (which could be by design)?

-Shahab
On Wed, May 29, 2013 at 10:13 AM, Marco Brinkmann <[EMAIL PROTECTED]> wrote:

> Thanks, but this does not change anything. My personal guess (and I have
> only been working with Pig for a few days) is that FOREACH will never be
> executed, because the relation 'test' is empty.
>
>
> 2013/5/29 Shahab Yunus <[EMAIL PROTECTED]>
>
> > Try COUNT_STAR.
> >
> > -Shahab
> >
> >
> > On Wed, May 29, 2013 at 9:55 AM, Marco Brinkmann <[EMAIL PROTECTED]> wrote:
> >
> > > Hi everybody,
> > >
> > > I have a rather simple question and scenario, but I still could not find
> > > an answer in the documentation or in other resources:
> > >
> > > id, valid
> > > (1, false)
> > > (2, false)
> > >
> > > records = LOAD 'test.csv' USING PigStorage(',') AS (id:long,
> > > valid:boolean);
> > >
> > > test = FILTER records BY valid == true;
> > > test_count = FOREACH (GROUP test ALL) GENERATE COUNT(test);
> > >
> > > DUMP test_count;
> > >
> > >
> > > I would expect that 'test_count' now contains '0'. But the dump is
> > > completely empty (with 'valid == false' I get '(2)' as expected). I use
> > > Pig 0.11.1.
> > >
> > > Could someone point me in the right direction?
> > >
> > > Cheers, Marco
> > >
> >
>
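
The difference discussed in this thread can be sketched outside Pig. The following is a plain Python model, not Pig code: `group_all` and `count_per_group` are hypothetical helpers that assume GROUP ... ALL emits one ('all', bag) tuple for a non-empty input and no tuple at all for an empty one, which would match the empty DUMP Marco reports. The SQL side uses an in-memory SQLite table as a stand-in for the RDBMS behaviour Shahab mentions.

```python
import sqlite3

# The two rows from the original post: (id, valid).
records = [(1, False), (2, False)]


def group_all(rows):
    """Hypothetical model of GROUP rel ALL.

    Assumption: one ('all', bag) tuple when rows exist, no tuple at all
    when the input is empty.
    """
    groups = {}
    for row in rows:
        groups.setdefault("all", []).append(row)
    return list(groups.items())


def count_per_group(grouped):
    """Hypothetical model of FOREACH grouped GENERATE COUNT(bag).

    The body runs once per input tuple, so zero groups give zero output rows.
    """
    return [(len(bag),) for _key, bag in grouped]


def sql_count(rows, valid):
    """SELECT COUNT(*) over the same data, using in-memory SQLite."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE test (id INTEGER, valid INTEGER)")
    conn.executemany(
        "INSERT INTO test VALUES (?, ?)",
        [(row_id, int(is_valid)) for row_id, is_valid in rows],
    )
    (n,) = conn.execute(
        "SELECT COUNT(*) FROM test WHERE valid = ?", (int(valid),)
    ).fetchone()
    conn.close()
    return n


# Model of: FILTER records BY valid == true, then GROUP ALL and COUNT.
valid_true = [r for r in records if r[1]]
valid_false = [r for r in records if not r[1]]
pig_like_true = count_per_group(group_all(valid_true))
pig_like_false = count_per_group(group_all(valid_false))

print(pig_like_true)             # [] : no group, so no output row
print(pig_like_false)            # [(2,)]
print(sql_count(records, True))  # 0 : SQL still returns one row holding 0
```

Under this model the aggregate is never evaluated for the empty relation, so there is no row to hold a 0, while the SQL aggregate always returns exactly one row.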