Pig >> mail # user >> UDF FilterFunc and logical OR

Re: UDF FilterFunc and logical OR
It's always good to file the bug, if only so that people know what land mines are out there and don't have to spend several days figuring out the problem (as Johannes just had the joy of doing).

Whether there will be a 0.8.3 is a separate question.  If some committer feels the need for it and is willing to drive it forward then it will happen.  If not, then not.  If some non-committer feels the need for it and is willing to drive it forward I'm sure one of the committers could be convinced to help.


On May 24, 2012, at 9:55 AM, Jonathan Coveney wrote:

> I think that there are a lot of known issues like that in pig 0.8... I
> don't know that anyone is really actively fixing them. Pig 0.8 is now
> pretty ancient and a ton of big stuff changed since then. I'm all about
> "file a bug for everything," but in this case I don't see us rolling out a
> new 0.8 release any time soon.
> Can any other committers comment on this?
> 2012/5/24 Johannes Schwenk <[EMAIL PROTECTED]>
>> Ok then. We are trying to use pig 0.10.0 now. We hit some errors in
>> running our tests - but see my new mail for that...
>> Should I file a bug for the found issue - just for completeness?
>> Thanks!
>> On 23.05.2012 18:20, Jonathan Coveney wrote:
>>> Thanks for being thorough! It's indeed a bug, but backporting a fix may be
>>> hard. The parser and logical plan changed a lot from 0.8 to 0.9, so if at
>>> all possible, I would try to use 0.10 (the latest release). We use it in
>>> production and it is stable, and has a lot of benefits over 0.8. I will
>>> warn that the parser changed, so if you have many existing jobs, it may be
>>> worth running them on a test cluster with 0.10, but if you don't, it's
>>> definitely better to make the jump now.
>>> 2012/5/23 Johannes Schwenk <[EMAIL PROTECTED]>
>>>> Hi Jonathan,
>>>> thanks again for your help!
>>>> I have cloned the current git head and created this pig script
>>>> http://pastebin.com/Gc9C9ZPS
>>>> TestCONTAINS-testFilteringCluster-input.txt contains
>>>> http://pastebin.com/h5MC695F
>>>> The adition.jar has been built against the cloudera cdh3u3 distribution
>>>> and contains the filter function CONTAINS
>>>> http://pastebin.com/Uwje7v1V
>>>> Output from running my script with both versions of pig:
>>>> pig 0.11.0-SNAPSHOT
>>>> http://pastebin.com/Cr5CkHui
>>>> => Correct results!!
>>>> pig 0.8.1-cdh3u3
>>>> http://pastebin.com/yXY17mXx
>>>> => Incorrect results!!
>>>> It seems like the new logical plan in pig 0.8.1 optimizes the OR
>>>> operator away. So it's a bug, right?
>>>> On 22.05.2012 21:26, Jonathan Coveney wrote:
>>>>> If this is a bug, it's an annoying one, so I definitely appreciate your
>>>>> help in getting to the bottom of it. So let's get to the bottom of it :)
>>>>> First, I would clone the trunk version of pig and run the same tests
>>>>> against it and compare. Always good to test any bugs against trunk to
>>>>> see if it is version specific.
>>>>> Right off the bat, I would say that you should dump the files in your
>>>>> test to a file, make a short script that does exactly what your test
>>>>> does, and paste the EXPLAIN plan generated for your script (ideally in
>>>>> both your version of pig and trunk). We should be able to see if there
>>>>> is something weird going on.
>>>>> Let me know if you need any help with any of that. If it persists I'll
>>>>> try and recreate on my end.
>>>>> 2012/5/22 Johannes Schwenk <[EMAIL PROTECTED]>
>>>>>> Thank you for your quick suggestions!
>>>>>> - I am now using local mode - good point!
>>>>>> - I know of the builtin matches operator; the CONTAINS filter was just
>>>>>> a way to get into programming UDFs...
>>>>>> - Whatever I do the problem persists. I tried:
>>>>>> * turning off all optimizations (-t All) : no effect
>>>>>> * reordering the statements : the outcome contains still only the