Pig >> mail # user >> UDF FilterFunc and logical OR

Re: UDF FilterFunc and logical OR
Thank you for your quick suggestions!

- I am now using local mode - good point!
- I know about the built-in matches; the CONTAINS filter was just a way to get
into writing UDFs...
- Whatever I do, the problem persists. I tried:
 * turning off all optimizations (-t All): no effect
 * reordering the statements: the output still contains only the tuples
matching the left-hand side of the OR
 * using different data (just in case...): no effect
 * finally, counting how many times the exec() function gets called while
processing the script: exactly *six times*, once for each record!

That last observation leads me to believe that this is a bug. I think the
exec() function should be called at least *ten times*.
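One way to see where a figure of ten could come from is a small stand-alone simulation. It assumes, without confirmation from this thread, that Pig short-circuits OR (the right-hand side is evaluated only when the left-hand side is false). It also uses hypothetical data shaped like the failing test: six records, two matching the left-hand keyword and one more matching the right-hand one. The counting predicate is plain Java standing in for FilterFunc.exec(), and no Pig classes are involved.

```java
import java.util.List;
import java.util.function.Predicate;

public class OrCallCount {
    // Stand-in for the number of times FilterFunc.exec() would be invoked.
    static int calls = 0;

    // Hypothetical "contains" predicate that counts every invocation.
    static Predicate<String> counting(String keyword) {
        return s -> {
            calls++;
            return s.contains(keyword);
        };
    }

    // Returns {number of records kept, number of predicate invocations}.
    static long[] run(List<String> records, String left, String right) {
        calls = 0;
        Predicate<String> lhs = counting(left);
        Predicate<String> rhs = counting(right);
        long kept = 0;
        for (String r : records) {
            // Java's || short-circuits: rhs.test() runs only when lhs.test() is false.
            if (lhs.test(r) || rhs.test(r)) {
                kept++;
            }
        }
        return new long[] {kept, calls};
    }

    public static void main(String[] args) {
        // Hypothetical data: six records, two match "foo", one more matches "bar".
        List<String> records = List.of("foo1", "foo2", "bar1", "baz1", "baz2", "baz3");
        long[] result = run(records, "foo", "bar");
        System.out.println("kept=" + result[0] + " calls=" + result[1]);
    }
}
```

Under these assumptions the program prints `kept=3 calls=10`. That is one call per record for the left-hand side plus one more for each of the four records it rejects. If the right-hand predicate were never invoked, the same loop would make six calls and keep two records, which matches the numbers reported above.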

Do you have any suggestions on how to verify this?

Greetings

On 21.05.2012 19:11, Jonathan Coveney wrote:
> Not sure why it is failing, though I will mention two things: 1) you
> should use local mode if possible, especially just to test UDFs :) 2) you
> could use the built-in matches function to achieve this (i.e. matches
> '.*keyword.*').
>
> Besides that, it is odd indeed, and I'd have to dig in more.
>
> 2012/5/21 Johannes Schwenk <[EMAIL PROTECTED]>
>
>> Hello List,
>>
>> I am using Cloudera's distribution (cdh3u3), which comes with pig-0.8.1.
>>
>> I have written a UDF extending FilterFunc that checks if the provided
>> string is contained within the specified column of the current tuple:
>> http://pastebin.com/Uwje7v1V
>>
>> I have also written some TestCases:
>> http://pastebin.com/uA4LHB4Q
>>
>> The odd thing is that only the test case testFilteringClusterWithOR1 fails,
>> because the result does not have the expected length of 3 but has length 2
>> instead (line 177 in http://pastebin.com/Uwje7v1V). After a lot of
>> investigating I still cannot figure out why testFilteringCluster and
>> testFilteringClusterWithOR2 succeed but testFilteringClusterWithOR1 does not.
>> Is there a special prerequisite for making my FilterFunc usable within an
>> OR? Maybe I have missed something very obvious... Please help me figure
>> this out!
>>
>> Greetings,
>> Johannes Schwenk
>>
>> --
>> Software Developer (Reporting)
>> ________________________________________________________
>>
>> ADITION technologies AG
>> Schwarzwaldstraße 78b
>> 79117 Freiburg
>>
>> http://www.adition.com
>>
>> T +49 / (0)761 / 88147 - 30
>> F +49 / (0)761 / 88147 - 77
>> SUPPORT +49  / (0)1805 - ADITION
>>
>> (Landline rate 14 ct/min; mobile rates at most 42 ct/min)
>>
>> Registered at the Düsseldorf Local Court (Amtsgericht) under HRB 54076
>> Executive Board: Andreas Kleiser, Jörg Klekamp, Tihomir Perkovic, Marcus Schlüter
>> Chairman of the Supervisory Board: attorney Daniel Raimer
>> VAT ID No.: DE 218 858 434
>>
>>
>
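Jonathan's suggestion and the UDF described above both boil down to a "contains" test on one column. The sketch below contrasts a plain substring check with a full-match regular expression of the form `.*keyword.*`. It assumes that Pig's matches follows the whole-string semantics of Java's `Matcher.matches()`; the helper names are hypothetical, and this is not the UDF from the pastebin link. One difference worth knowing: `.` does not match line terminators unless `Pattern.DOTALL` is set, so the two checks can disagree on multi-line values. `Pattern.quote()` keeps regex metacharacters in the keyword from being interpreted.

```java
import java.util.regex.Pattern;

public class ContainsVsMatches {
    // Hypothetical substring test, roughly what a "contains" FilterFunc would check.
    static boolean containsCheck(String value, String keyword) {
        return value != null && value.contains(keyword);
    }

    // Full-match regex in the style of matches '.*keyword.*'.
    // Pattern.quote() escapes regex metacharacters in the keyword.
    static boolean regexCheck(String value, String keyword, boolean dotAll) {
        if (value == null) {
            return false;
        }
        int flags = dotAll ? Pattern.DOTALL : 0;
        return Pattern.compile(".*" + Pattern.quote(keyword) + ".*", flags)
                .matcher(value)
                .matches();
    }

    public static void main(String[] args) {
        String[] values = {"some keyword here", "no match", "first line\nkeyword"};
        for (String v : values) {
            // Print the newline as a literal \n so each value stays on one line.
            System.out.printf("%-25s contains=%b regex=%b regexDotAll=%b%n",
                    v.replace("\n", "\\n"),
                    containsCheck(v, "keyword"),
                    regexCheck(v, "keyword", false),
                    regexCheck(v, "keyword", true));
        }
    }
}
```

For single-line values the three columns agree; only the multi-line value shows `regex=false` while the other two checks return true.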

Johannes Schwenk
