Filter on tuple question, and how to deal with dirty data?

Hi,
  
Q1: I have a question about how to use FILTER on a tuple.
The code is:
--------------------------------------------------------
REGISTER pig.jar;
raw = LOAD 'data.txt' USING PigStorage('|') AS (phoneNum, tag, flow, duration, count);
sumed = FOREACH (GROUP raw BY (phoneNum, tag)){
    totalFlow = SUM(raw.flow);
    totalDuration = SUM(raw.duration);
    totalCount = SUM(raw.count);
    GENERATE FLATTEN(group), TOTUPLE(totalFlow, totalDuration, totalCount) AS condition;
};
filtered = FILTER sumed BY com.filter.TagFilter(condition);
DUMP filtered;
--------------------------------------------------------
But I got an error:
ERROR 1045:
<file reduce.pig, line 9, column 23> Could not infer the matching function for com.filter.TagFilter as multiple or none of them fit. Please use an explicit cast.
Is there anything wrong with the script?
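
The message suggests that Pig could not match the type of `condition` to any signature that com.filter.TagFilter declares. The implementation of TagFilter is not shown, so the following is only a possible diagnostic step, not a confirmed fix. It assumes the three numeric columns are meant to be `long` (the original script leaves every field untyped) and uses DESCRIBE to print the schema that the UDF would actually receive.

```pig
-- Sketch only: the column types below are assumptions, not taken from the original script.
raw = LOAD 'data.txt' USING PigStorage('|')
      AS (phoneNum:chararray, tag:chararray, flow:long, duration:long, count:long);

sumed = FOREACH (GROUP raw BY (phoneNum, tag)) {
    GENERATE FLATTEN(group),
             TOTUPLE(SUM(raw.flow), SUM(raw.duration), SUM(raw.count)) AS condition;
};

-- Print the schema of sumed, including the field types inside the condition tuple.
DESCRIBE sumed;
```

Comparing the printed tuple schema with the argument types TagFilter expects should show whether an explicit cast is needed.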

Q2: How to deal with dirty data?
There is some dirty data in my files, such as:
$ cat data.txt
1|2|3|4|5|6
2|3|4|5|6|
0|2|3|
7|7||0||
The third line is dirty data for me, and I want to filter it out. But whether I use SIZE(), COUNT(), or anything else, I can't filter it out.
Is there any function or method that solves this problem?
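
One possible approach, sketched here under stated assumptions, is to load each line as a single string and count its fields before applying any schema. Once a schema is declared on LOAD, short rows are likely padded with nulls, which may be why SIZE() and COUNT() do not distinguish them. The threshold of 5 fields is an assumption taken from the five-column schema in Q1, and the aliases `lines` and `clean` are hypothetical names.

```pig
-- Sketch only: load each line whole, then keep the lines that have enough fields.
lines = LOAD 'data.txt' USING TextLoader() AS (line:chararray);

-- STRSPLIT returns a tuple and SIZE counts its fields.
-- The limit -1 keeps trailing empty fields, so '0|2|3|' splits into 4 fields
-- and is dropped, while the other three sample lines split into 6 fields each.
clean = FILTER lines BY SIZE(STRSPLIT(line, '\\|', -1)) >= 5;

DUMP clean;
```

The surviving lines can then be split into columns in a later FOREACH, for example with STRSPLIT again.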