Re: Review Request 18525: PIG-3679: Fix regression of the STATUS_NULL clean-up
Cheolsoo Park 2014-02-27, 00:28
(Updated Feb. 27, 2014, 12:28 a.m.)
Review request for pig, Daniel Dai, Mark Wagner, and Rohini Palaniswamy.
Incorporate Rohini's comments.
I discovered this regression while debugging the e2e test StreamingPythonUDFs_10 in trunk. To summarize, replacing STATUS_NULL with STATUS_OK + null (a STATUS_OK result whose value is null) has changed how null values are handled in some cases. In particular, some UDFs that never used to see nulls are now called with nulls and fail with an NPE. Since this is a major backward incompatibility, I changed POUserFunc to always filter out nulls. Technically, this still changes the behavior with nulls, but it seems acceptable that UDFs that used to fail with an NPE no longer fail.
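To make the behavior change concrete, here is a simplified, self-contained Java sketch. It does not use Pig's real classes: the `Status` enum, the `Result` record, the `UPPER` UDF and the `legacy`/`afterCleanup`/`withFilter` helpers are hypothetical stand-ins for POStatus, Pig's operator result and POUserFunc. "Filter out nulls" is modeled here, as an assumption, as skipping the UDF call when the input value is null.

```java
import java.util.Locale;
import java.util.function.Function;

public class NullFilterSketch {
    // Hypothetical stand-in for Pig's status codes; not the real POStatus class.
    enum Status { OK, NULL }

    // Hypothetical stand-in for an operator result: a status plus a value.
    record Result(Status status, Object value) {}

    // Hypothetical UDF that assumes its input is never null.
    static final Function<Object, Object> UPPER =
            v -> ((String) v).toUpperCase(Locale.ROOT);

    // Before the clean-up: a null arrives as STATUS_NULL, so the UDF is never invoked.
    static Result legacy(Result input, Function<Object, Object> udf) {
        if (input.status() == Status.NULL) {
            return input;
        }
        return new Result(Status.OK, udf.apply(input.value()));
    }

    // After the clean-up: a null arrives as STATUS_OK + null, so the UDF receives it.
    static Result afterCleanup(Result input, Function<Object, Object> udf) {
        return new Result(Status.OK, udf.apply(input.value()));
    }

    // With the fix (as modeled here): a null value is filtered out and the UDF is skipped.
    static Result withFilter(Result input, Function<Object, Object> udf) {
        if (input.value() == null) {
            return new Result(Status.OK, null);
        }
        return new Result(Status.OK, udf.apply(input.value()));
    }

    public static void main(String[] args) {
        Result oldNull = new Result(Status.NULL, null);
        Result newNull = new Result(Status.OK, null);

        System.out.println(legacy(oldNull, UPPER).value());     // prints "null": UDF skipped
        System.out.println(withFilter(newNull, UPPER).value()); // prints "null": UDF skipped
        try {
            afterCleanup(newNull, UPPER);
        } catch (NullPointerException e) {
            System.out.println("NPE: the UDF was called with null");
        }
    }
}
```

The filtered path also changes observable behavior: a call that would have thrown an NPE now yields null, which is the trade-off the paragraph above accepts.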
Here is my reasoning in more detail:
Alternatively, we could let UDFs handle nulls themselves. That seems cleaner to me, but backward incompatibility is a concern (e.g. "My UDFs used to work with 0.12, but they no longer work with 0.13").
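For comparison, the alternative above would move the null check into each UDF. A minimal sketch, assuming a hypothetical single-argument `Udf` interface and a hypothetical `NullSafeUpper` UDF rather than Pig's real `EvalFunc` API:

```java
import java.util.Locale;

public class NullSafeUdfSketch {
    // Hypothetical single-argument UDF contract; not Pig's EvalFunc API.
    interface Udf<I, O> {
        O exec(I input);
    }

    // A UDF that handles null itself instead of relying on the engine to filter it out.
    static final class NullSafeUpper implements Udf<String, String> {
        @Override
        public String exec(String input) {
            if (input == null) {
                return null; // propagate null rather than throwing an NPE
            }
            return input.toUpperCase(Locale.ROOT);
        }
    }

    public static void main(String[] args) {
        Udf<String, String> upper = new NullSafeUpper();
        System.out.println(upper.exec("pig")); // PIG
        System.out.println(upper.exec(null));  // null
    }
}
```

The catch is the compatibility concern already described: every existing UDF that assumes non-null input would need a guard like this one.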
All e2e tests pass except Warning_4 (PIG-3739).