Pig, mail # dev - Review Request 18525: PIG-3679: Fix regression of the STATUS_NULL clean-up


Re: Review Request 18525: PIG-3679: Fix regression of the STATUS_NULL clean-up
Cheolsoo Park 2014-02-27, 00:28

This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/18525/

(Updated Feb. 27, 2014, 12:28 a.m.)

Review request for pig, Daniel Dai, Mark Wagner, and Rohini Palaniswamy.

Changes

Incorporate Rohini's comments.

Bugs: PIG-3679
    https://issues.apache.org/jira/browse/PIG-3679
Repository: pig-git

Description

I discovered this regression while debugging the e2e test StreamingPythonUDFs_10 in trunk. To summarize, replacing (STATUS_NULL) with (STATUS_OK + null) has changed how null values are handled in some cases. In particular, some UDFs that used to see no nulls are now called with nulls and fail with an NPE. Since this is a major backward incompatibility, I changed POUserFunc to always filter out nulls. Technically, this still changes the behavior with nulls, but it seems ok that UDFs that used to fail with an NPE no longer fail.

My reasoning is explained in more detail here:
https://issues.apache.org/jira/browse/PIG-3679?focusedCommentId=13892966&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13892966

Alternatively, we could let UDFs handle nulls by themselves. That seems cleaner to me, but backward incompatibility is a concern (i.e. "My UDFs used to work with 0.12, but they no longer work with 0.13").
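
To make the two options concrete, here is a minimal Java sketch. It is not the actual POUserFunc change and uses no Pig classes: NullFilterSketch, callSkippingNulls, callDirectly, and the two toy UDFs are all hypothetical names, and the rule "skip the call if any argument is null" is an assumption about what filtering out nulls means here.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Illustration only: a toy model of the two null-handling options.
// None of these names exist in Pig; they are hypothetical.
final class NullFilterSketch {

    /** Toy UDF written without null checks, like a UDF that never saw nulls before. */
    static final Function<List<Object>, Object> UPPER =
            args -> ((String) args.get(0)).toUpperCase();

    /** Toy UDF that handles nulls by itself. */
    static final Function<List<Object>, Object> NULL_SAFE_UPPER =
            args -> args.get(0) == null ? null : ((String) args.get(0)).toUpperCase();

    /** Option 1 (assumed rule): the caller skips the UDF when any argument is null. */
    static Object callSkippingNulls(Function<List<Object>, Object> udf, List<Object> args) {
        for (Object arg : args) {
            if (arg == null) {
                return null; // the UDF is never invoked with a null argument
            }
        }
        return udf.apply(args);
    }

    /** Option 2: the caller passes everything through and the UDF must cope. */
    static Object callDirectly(Function<List<Object>, Object> udf, List<Object> args) {
        return udf.apply(args);
    }

    public static void main(String[] unused) {
        List<Object> nullArgs = Arrays.asList((Object) null);

        // Option 1: the null-unaware UDF is protected by the caller.
        System.out.println(callSkippingNulls(UPPER, nullArgs));

        // Option 2 with a null-aware UDF works as well.
        System.out.println(callDirectly(NULL_SAFE_UPPER, nullArgs));

        // Option 2 with the null-unaware UDF fails, which is the regression scenario.
        try {
            callDirectly(UPPER, nullArgs);
        } catch (NullPointerException e) {
            System.out.println("NPE from null-unaware UDF");
        }
    }
}
```

With the first option, existing UDFs that were written without null checks keep working. With the second option, every such UDF would need to be updated, which is the backward incompatibility concern above.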

Diffs (updated)

  src/org/apache/pig/PigWarning.java 523cf30
  src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/POUserFunc.java f031b1d

Diff: https://reviews.apache.org/r/18525/diff/

Testing

All e2e tests pass (except Warning_4 PIG-3739).

Thanks,

Cheolsoo Park