Pig >> mail # user >> UDF FilterFunc and logical OR


Johannes Schwenk 2012-05-21, 16:36
Jonathan Coveney 2012-05-21, 17:11
Re: UDF FilterFunc and logical OR
Thank you for your quick suggestions!

- I am now using local mode - good point!
- I know of the builtin matches; the CONTAINS filter was just a way to get
into programming UDFs...
- Whatever I do, the problem persists. I tried:
 * turning off all optimizations (-t All) : no effect
 * reordering the statements : the output still contains only the
tuples matching the lhs of the OR
 * using different data (just in case...) : no effect
 * finally, counting how many times the exec() function gets called
while processing the script... : exactly *six times*, once for each record!

That last observation leads me to believe that this is a bug!? The exec
function should be called at least *ten times* I think.
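
To make the expected count concrete, here is a minimal sketch in plain Java,
without Pig. Everything in it is an assumption for illustration: the class
name ExecCallCount, the contains() helper standing in for the UDF's exec(),
and the six records, which are made up so that the numbers line up with the
ones above (3 expected results, 10 calls). It also assumes the OR is evaluated
with short-circuiting, as Java's || does; whether Pig 0.8.1 evaluates it that
way is not something this sketch can confirm.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ExecCallCount {
    // Stands in for a counter incremented inside the UDF's exec().
    static int execCalls = 0;

    // Hypothetical stand-in for the UDF: true if the field contains the keyword.
    static boolean contains(String field, String keyword) {
        execCalls++;
        return field != null && field.contains(keyword);
    }

    // Keeps records matching either keyword, using Java's short-circuit ||:
    // the rhs check only runs for records that the lhs check rejected.
    static List<String> filterWithOr(List<String> records, String lhs, String rhs) {
        List<String> kept = new ArrayList<>();
        for (String record : records) {
            if (contains(record, lhs) || contains(record, rhs)) {
                kept.add(record);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Made-up data: 2 records match the lhs, 1 matches the rhs, 3 match neither.
        List<String> records = Arrays.asList(
                "foo-1", "foo-2", "bar-1", "baz-1", "baz-2", "qux-1");
        execCalls = 0;
        List<String> kept = filterWithOr(records, "foo", "bar");
        System.out.println("kept=" + kept.size() + " execCalls=" + execCalls);
    }
}
```

With this data the lhs check runs once per record (6 calls) and the rhs check
runs for the 4 records the lhs rejected, giving 10 calls and 3 kept records.
A total of exactly six calls would mean the rhs was never evaluated at all.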

Do you have any suggestions on how to verify this?

Greetings

On 21.05.2012 19:11, Jonathan Coveney wrote:
> Not sure why it is failing... though I will mention two things. 1) you
> should use local mode if possible, especially just to test UDFs :) 2) you
> could use the builtin matches function to achieve this (ie matches
> '.*keyword.*')
>
> Besides that it is odd indeed, and I'd have to dig in more.
>
> 2012/5/21 Johannes Schwenk <[EMAIL PROTECTED]>
>
>> Hello List,
>>
>> I am using Cloudera's distribution (cdh3u3) which comes with pig-0.8.1.
>>
>> I have written a UDF extending FilterFunc that checks if the provided
>> string is contained within the specified column of the current tuple:
>> http://pastebin.com/Uwje7v1V
>>
>> I have also written some TestCases:
>> http://pastebin.com/uA4LHB4Q
>>
>> The odd thing is that only the test case testFilteringClusterWithOR1 fails,
>> because the result does not have the expected length of 3 but has length 2
>> instead (line 177 in http://pastebin.com/Uwje7v1V). After a lot of
>> investigating I still cannot find out why testFilteringCluster and
>> testFilteringClusterWithOR2 succeed but testFilteringClusterWithOR1 does
>> not. Is there a special prerequisite for making my FilterFunc usable within
>> OR? Maybe I have missed something very obvious... Please help me figure
>> this out!
>>
>> Greetings,
>> Johannes Schwenk
>>
>> --
>> Software Developer (Reporting)
>> ________________________________________________________
>>
>> ADITION technologies AG
>> Schwarzwaldstraße 78b
>> 79117 Freiburg
>>
>> http://www.adition.com
>>
>> T +49 / (0)761 / 88147 - 30
>> F +49 / (0)761 / 88147 - 77
>> SUPPORT +49  / (0)1805 - ADITION
>>
>> (Landline rate 14 ct/min; mobile rates up to 42 ct/min)
>>
>> Registered at the Düsseldorf Local Court (Amtsgericht Düsseldorf) under HRB 54076
>> Executive Board: Andreas Kleiser, Jörg Klekamp, Tihomir Perkovic, Marcus Schlüter
>> Chairman of the Supervisory Board: Daniel Raimer, attorney-at-law
>> VAT ID No.: DE 218 858 434
>>
>>
>

Johannes Schwenk

Jonathan Coveney 2012-05-22, 19:26
Johannes Schwenk 2012-05-23, 09:42
Jonathan Coveney 2012-05-23, 16:20
Johannes Schwenk 2012-05-24, 12:54
Jonathan Coveney 2012-05-24, 16:55
Alan Gates 2012-05-24, 17:15