Pig >> mail # user >> Distinct question
Re: Distinct question
I have a simple EvalFunc, like so:

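import java.io.IOException;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;
import org.apache.pig.data.TupleFactory;

public class Set extends EvalFunc<Tuple> {
   public Tuple exec(Tuple tuple) throws IOException {
     // Qualify java.util.Set (this class is also named Set); copy to a List so newTuple(List) emits one field per value.
     java.util.Set<Object> unique = new LinkedHashSet<Object>(tuple.getAll());
     return TupleFactory.getInstance().newTuple(new ArrayList<Object>(unique));
   }
}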

How can I apply this to a result set though?  When I try:

rows = LOAD 'foo';
rows = FOREACH rows GENERATE com.mycompany.piggybank.Set(rows);
2011-04-03 09:16:25,423 [main] ERROR org.apache.pig.tools.grunt.Grunt -
ERROR 1000: Error during parsing. Scalars can be only used with projections

I get the above error. Should I be using something other than an EvalFunc?

Thanks
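
For reference, the per-tuple de-duplication asked about in this thread can be sketched in plain Java, without any Pig classes, so it runs on its own. The names DistinctFields and distinct are hypothetical, chosen only for illustration. Inside an EvalFunc, the input list would presumably come from tuple.getAll() and the result would be wrapped with TupleFactory.getInstance().newTuple(List). The parse error itself most likely arises because the relation alias rows is passed as the argument; passing the row's fields instead, for example com.mycompany.piggybank.Set(*), is the usual form, though that should be checked against the Pig version in use.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;

// Hypothetical helper class, not part of Pig or of the original post.
public class DistinctFields {

    // Returns the distinct values of one row, keeping first-occurrence order.
    // LinkedHashSet drops duplicates while remembering insertion order.
    public static List<Object> distinct(List<Object> fields) {
        return new ArrayList<Object>(new LinkedHashSet<Object>(fields));
    }

    public static void main(String[] args) {
        // The example row from the original question.
        List<Object> row = Arrays.<Object>asList(5, 5, 4, 7, 2, 3, 4, 9);
        System.out.println(distinct(row)); // prints [5, 4, 7, 2, 3, 9]
    }
}
```

A LinkedHashSet is used here rather than a HashSet so that the output order matches the example in the question; a plain HashSet would remove duplicates but gives no ordering guarantee.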
On 4/3/11 8:53 AM, Bill Graham wrote:
> You could add all the values to a set in a UDF and then return its contents.
>
> On Sunday, April 3, 2011, Mark<[EMAIL PROTECTED]>  wrote:
>> If I have a tuple of values, is there a way to eliminate duplicate values per tuple?
>>
>> Example:
>> (5,5,4,7,2,3,4,9) = (5,4,7,2,3,9)
>>
>> Thanks