Pig, mail # user - Filter on tuple question, and how to deal with dirty data?

何琦 2013-04-19, 02:57

Q1: I have a question about how to use FILTER on a tuple.
My script is:
REGISTER pig.jar;
raw = LOAD 'data.txt' USING PigStorage('|') AS (phoneNum, tag, flow, duration, count);
sumed = FOREACH (GROUP raw BY (phoneNum, tag)) {
    totalFlow = SUM(raw.flow);
    totalDuration = SUM(raw.duration);
    totalCount = SUM(raw.count);
    GENERATE flatten(group), TOTUPLE(totalFlow, totalDuration, totalCount) AS condition;
};
filtered = FILTER sumed BY com.filter.TagFilter(condition);
DUMP filtered;
But I got an error:
ERROR 1045:
<file reduce.pig, line 9, column 23> Could not infer the matching function for com.filter.TagFilter as multiple or none of them fit. Please use an explicit cast.
Is there anything wrong with my script?
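
The error message asks for an explicit cast. A minimal sketch of what that might look like is below, assuming (this is a guess, since the TagFilter source is not shown) that the UDF declares the argument types it accepts and expects a tuple of long fields. The field types in the schema are assumptions and are not part of the script above.

```
-- Assumed types: with no types declared, every field is a bytearray,
-- and SUM over a bytearray column returns a double.
raw = LOAD 'data.txt' USING PigStorage('|')
      AS (phoneNum:chararray, tag:chararray, flow:long, duration:long, count:long);
sumed = FOREACH (GROUP raw BY (phoneNum, tag)) {
    GENERATE FLATTEN(group),
             TOTUPLE(SUM(raw.flow), SUM(raw.duration), SUM(raw.count)) AS condition;
};
-- condition is now a tuple of three longs instead of three doubles.
filtered = FILTER sumed BY com.filter.TagFilter(condition);
```

If the UDF expects different types, the schema would need to change to match.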

Q2: How do I deal with dirty data?
There is some dirty data in my files, for example:
$ cat data.txt
The third line is dirty data to me, and I want to filter it out. But whether I use SIZE(), COUNT(), or anything else, I can't filter it out.
Is there a function or some other way to solve this?
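
One approach that might apply, sketched under the assumption that a dirty row is one with missing or non-numeric fields (the sample rows are not shown, so this is a guess): declare types in the LOAD schema so that fields which are absent or fail to convert are loaded as null, then filter out rows containing nulls. The types below are assumptions.

```
raw = LOAD 'data.txt' USING PigStorage('|')
      AS (phoneNum:chararray, tag:chararray, flow:long, duration:long, count:long);
-- Fields that are missing or cannot be converted to long come through as null.
clean = FILTER raw BY phoneNum IS NOT NULL
                  AND flow IS NOT NULL
                  AND duration IS NOT NULL
                  AND count IS NOT NULL;
DUMP clean;
```

If the dirty rows differ in some other way, the FILTER condition would have to be adapted to that pattern.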