Re: Distinct question
I have a simple EvalFunc as so:

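import java.io.IOException;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;
import org.apache.pig.data.TupleFactory;
public class Set extends EvalFunc<Tuple> {
   public Tuple exec(Tuple tuple) throws IOException {
     // java.util.Set is qualified: the bare name Set would resolve to this class.
     java.util.Set<Object> unique = new LinkedHashSet<Object>(tuple.getAll());
     // Copy into a List so newTuple(List) yields one field per distinct value.
     return TupleFactory.getInstance().newTuple(new ArrayList<Object>(unique));
   }
}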

How can I apply this to a result set though?  When I try:

rows = LOAD 'foo';
rows = FOREACH rows GENERATE com.mycompany.piggybank.Set(rows);
2011-04-03 09:16:25,423 [main] ERROR org.apache.pig.tools.grunt.Grunt -
ERROR 1000: Error during parsing. Scalars can be only used with projections

I get the above error. Should I be using something other than an EvalFunc?

Thanks
On 4/3/11 8:53 AM, Bill Graham wrote:
> You could add all the values to a set in a UDF and then return its contents.
>
> On Sunday, April 3, 2011, Mark<[EMAIL PROTECTED]>  wrote:
>> If I have a tuple of values, is there a way to eliminate duplicate values per tuple?
>>
>> Example:
>> (5,5,4,7,2,3,4,9) = (5,4,7,2,3,9)
>>
>> Thanks
>>
>>
>>
>>
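For reference, the set idea suggested in the quoted reply can be tried outside Pig with plain Java collections. This is only a minimal sketch: the class name DistinctSketch and the method distinctInOrder are made up for illustration, and using LinkedHashSet (rather than HashSet) is an assumption made so that values keep their first-seen order, as in the (5,5,4,7,2,3,4,9) example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;

// Hypothetical helper, not part of Pig: removes duplicates from a list
// of values while keeping the order in which each value first appears.
public class DistinctSketch {
    public static List<Object> distinctInOrder(List<Object> values) {
        // LinkedHashSet drops duplicates and remembers insertion order.
        return new ArrayList<Object>(new LinkedHashSet<Object>(values));
    }

    public static void main(String[] args) {
        List<Object> input = Arrays.<Object>asList(5, 5, 4, 7, 2, 3, 4, 9);
        System.out.println(distinctInOrder(input)); // prints [5, 4, 7, 2, 3, 9]
    }
}
```

Inside an EvalFunc, the same call could be applied to tuple.getAll() and the resulting list handed to TupleFactory. A plain HashSet would also remove the duplicates, but it makes no promise about the order of the values that remain.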