NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
Pig >> mail # user >> issue with IsEmpty UDF


Ojha, Pankaj 2013-06-14, 12:25
Re: issue with IsEmpty UDF
Hi Pankaj,

Which version of Pig are you using? It works fine for me. I get the
following output as expected:

((2,22),{(2,22,222,2222)},{(2,22,bb,bbb)})

I tested Pig 0.9, 0.10, 0.11, and trunk, and the script worked on all of them.

Thanks,
Cheolsoo
On Fri, Jun 14, 2013 at 5:25 AM, Ojha, Pankaj <[EMAIL PROTECTED]> wrote:

> Hi Team,
>
> We are facing an issue when we use the IsEmpty UDF with FILTER.
>
> Scenario:
> We have two input files:
>
> Input file 1: first
> 1|11|111|1111
> 2|22|222|2222
> 3|33|333|3333
> 4|44|444|4444
> 5|55|555|5555
>
> Input file 2: second
> 1|a|aa|aaa
> 2|22|bb|bbb
> 3|c|cc|ccc
> 6|d|dd|ddd
>
>
> Our requirement is that, when the two input files are grouped on their
> first two fields, a key should produce output only when data for it is
> present in both files; otherwise nothing should be printed.
> For the input files above, only the key (2,22) meets this condition, so
> the output should be:
>
> ((2,22),{(2,22,222,2222)},{(2,22,bb,bbb)})
>
> To achieve this, we wrote the following code:
>
> first = LOAD 'first' USING PigStorage('|') as
> (a:chararray,b:chararray,c:chararray,d:chararray);
>
> second = LOAD 'second' USING PigStorage('|') as
> (aa:chararray,bb:chararray,cc:chararray,dd:chararray);
>
> cogroup_join = COGROUP first BY (a,b) , second BY (aa,bb);
>
> cogroup_join_filter = FILTER cogroup_join BY NOT IsEmpty(second) AND NOT
> IsEmpty(first);
>
> dump cogroup_join_filter;
>
> However, the output of cogroup_join_filter is:
> ((1,a),{},{(1,a,aa,aaa)})
> ((2,22),{(2,22,222,2222)},{(2,22,bb,bbb)})
> ((3,c),{},{(3,c,cc,ccc)})
> ((6,d),{},{(6,d,dd,ddd)})
>
> We expected IsEmpty to filter out every key that does not appear in both
> input files, leaving only (2,22), but that is not happening.
> Please take a look and let us know your view.
>
> Thanks & Regards,
> Pankaj Ojha
>
> This message, including any attachments, is the property of Sears Holdings
> Corporation and/or one of its subsidiaries. It is confidential and may
> contain proprietary or legally privileged information. If you are not the
> intended recipient, please delete it without reading the contents. Thank
> you.
>
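To make the expected semantics concrete, here is a small Python sketch that emulates what the COGROUP followed by the NOT IsEmpty filter should return for the sample files in this thread. It is not Pig and makes no claim about Pig internals: the helper names load, cogroup, and filter_both_non_empty are hypothetical, splitting on '|' is only a rough stand-in for PigStorage('|'), and keys are sorted purely for readability since Pig does not promise an output order. Under those assumptions it yields the single (2,22) record Cheolsoo reported.

```python
from collections import defaultdict

# Sample rows copied from the two pipe-delimited input files in the thread.
FIRST = ["1|11|111|1111", "2|22|222|2222", "3|33|333|3333",
         "4|44|444|4444", "5|55|555|5555"]
SECOND = ["1|a|aa|aaa", "2|22|bb|bbb", "3|c|cc|ccc", "6|d|dd|ddd"]


def load(lines, sep="|"):
    """Hypothetical stand-in for LOAD ... USING PigStorage('|'): split each line into a tuple of strings."""
    return [tuple(line.split(sep)) for line in lines]


def cogroup(left, right):
    """Emulate COGROUP left BY (f0, f1), right BY (f0, f1).

    Every key seen in either relation yields one record (key, left_bag, right_bag);
    a relation with no rows for that key contributes an empty bag.
    """
    left_groups, right_groups = defaultdict(list), defaultdict(list)
    for row in left:
        left_groups[row[:2]].append(row)
    for row in right:
        right_groups[row[:2]].append(row)
    # Sorted only so the printed result is stable; Pig gives no ordering guarantee.
    keys = sorted(set(left_groups) | set(right_groups))
    return [(k, left_groups.get(k, []), right_groups.get(k, [])) for k in keys]


def filter_both_non_empty(groups):
    """Counterpart of FILTER ... BY NOT IsEmpty(second) AND NOT IsEmpty(first)."""
    return [g for g in groups if g[1] and g[2]]


groups = cogroup(load(FIRST), load(SECOND))
for key, first_bag, second_bag in filter_both_non_empty(groups):
    # Only the (2,22) group has non-empty bags on both sides.
    print(key, first_bag, second_bag)
```

Before filtering there are eight groups (every key from either file, with an empty bag on one side where the key is missing); the filter should keep only the one whose two bags are both non-empty.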