Pig, mail # user - Filter on tuple question, and how to deal with dirty data?


Re: Filter on tuple question, and how to deal with dirty data?
Ruslan Al-Fakikh 2013-04-19, 21:01
Hi,

Q1: Maybe there is something wrong with the UDF itself?
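
One way to narrow this down: error 1045 says Pig could not match the argument you pass to any signature the UDF accepts, so it may help to load with explicit types and DESCRIBE the relation before the FILTER. A rough sketch only; the types (chararray/long) are guesses about your data, the aliases grouped and summed are made up, and I have not seen TagFilter, so the schema it expects is unknown to me:

```pig
-- Declare types up front; with untyped (bytearray) fields SUM returns double.
-- The chararray/long types below are assumptions about the data.
raw = LOAD 'data.txt' USING PigStorage('|')
      AS (phoneNum:chararray, tag:chararray, flow:long, duration:long, count:long);

grouped = GROUP raw BY (phoneNum, tag);
summed = FOREACH grouped GENERATE
             FLATTEN(group) AS (phoneNum, tag),
             TOTUPLE(SUM(raw.flow), SUM(raw.duration), SUM(raw.count)) AS condition;

-- Print the schema and compare the type of 'condition' with what TagFilter accepts.
DESCRIBE summed;
```

If the schema printed for condition differs from what the UDF declares, that mismatch would be the first thing to fix, either in the script or in the UDF.
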
Q2: How do you define the data as dirty? Is one of your 6 fields null?
Then you could drop those rows with something like: FILTER raw BY NOT ($0 IS NULL OR $1 IS NULL ...)
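
If "dirty" instead means a row with too few fields (like 0|2|3|), a null check may not separate it from rows that merely have empty fields, because as far as I know PigStorage loads empty fields as nulls and pads short rows with nulls when a schema is declared. A rough sketch, assuming a clean row always has exactly 6 '|'-separated fields; the aliases are made up, and the STRSPLIT limit of -1 (keep trailing empty fields) is my understanding of the builtin, so please verify it on your Pig version:

```pig
-- Load each line whole so that short rows are not padded to the schema width.
lines = LOAD 'data.txt' USING TextLoader() AS (line:chararray);

-- Split on '|'; a negative limit is assumed to keep trailing empty fields.
split_lines = FOREACH lines GENERATE STRSPLIT(line, '\\|', -1) AS fields;

-- Keep only rows with exactly 6 fields (the assumed width of a clean row).
clean = FILTER split_lines BY SIZE(fields) == 6;

DUMP clean;
```

With the sample data.txt this should keep lines 1, 2 and 4 and drop 0|2|3|, which splits into only 4 fields.
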

Ruslan
On Fri, Apr 19, 2013 at 6:57 AM, 何琦 <[EMAIL PROTECTED]> wrote:

>
> Hi,
>
> Q1: I have a question about how to use FILTER on a tuple.
> The code is:
> --------------------------------------------------------
> REGISTER pig.jar;
> raw = LOAD 'data.txt' USING PigStorage('|') AS (phoneNum, tag, flow,
> duration, count);
> sumed = FOREACH (GROUP raw BY (phoneNum, tag)){
>     totalFlow = SUM(raw.flow);
>     totalDuration = SUM(raw.duration);
>     totalCount = SUM(raw.count);
>     GENERATE flatten(group), TOTUPLE(totalFlow, totalDuration, totalCount)
> AS condition;
> };
> filtered = FILTER sumed BY com.filter.TagFilter(condition);
> DUMP filtered;
> --------------------------------------------------------
> But I got an error:
> ERROR 1045:
> <file reduce.pig, line 9, column 23> Could not infer the matching function
> for com.filter.TagFilter as multiple or none of them fit. Please use an
> explicit cast.
> Is there anything wrong?
>
> Q2: How do I deal with dirty data?
> There is some dirty data in my files, such as:
> $ cat data.txt
> 1|2|3|4|5|6
> 2|3|4|5|6|
> 0|2|3|
> 7|7||0||
> The third line is dirty data for me. I want to filter it out, but whether I
> use SIZE(), COUNT(), or anything else, I can't filter it.
> Is there any function or method that solves this problem?
>