You can solve this with the DISTINCT operator: it keeps only the unique entries, and then you can count them.
data = LOAD '...' USING PigStorage(',') as (id:int, field1:chararray, field2:chararray);
unique_data = DISTINCT data;
unique_count = FOREACH (GROUP unique_data all) GENERATE COUNT($1);
dump unique_count;

On Tue, Apr 2, 2013 at 2:05 PM, jamal sasha <[EMAIL PROTECTED]> wrote:
> Hi,
> I have data in hdfs like:
>
> id1,field1,field2
> 1,2,3
> 1,2,3
> 1,2,4
> 1,2,5
> I want to find the number of unique entries using pig..
> So here, number of unique entries are 3 ( as 1,2,3 is repeated twice)
>
> How do i find this?
>
> Thanks
>
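
One caveat about the sample data, offered as a sketch and not as part of the original reply: if the header line id1,field1,field2 is actually stored in the file, its id cannot be cast to int and loads as null, so it would be counted as one more distinct entry. Assuming a comma-delimited file at a placeholder path ('input.csv' is hypothetical), one way to drop such rows before counting might look like this:

```pig
-- 'input.csv' is a placeholder path; the ',' delimiter is assumed from the sample rows
data = LOAD 'input.csv' USING PigStorage(',') AS (id:int, field1:chararray, field2:chararray);

-- a non-numeric id (such as a header line) loads as null, so filter those rows out
rows = FILTER data BY id IS NOT NULL;

-- keep one copy of each remaining row, then count them in a single group
unique_rows = DISTINCT rows;
unique_count = FOREACH (GROUP unique_rows ALL) GENERATE COUNT(unique_rows);
DUMP unique_count;
```

For the four data rows quoted above, this should dump (3). The filter also drops any genuine row with a missing id, so skip it if null ids are meaningful in your data.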