What is expected from FLATTEN(null tuple)?
Raghu Angadi 2011-06-27, 20:49
Looks like FLATTEN(tuple) results in a single null when the tuple is null, irrespective of the schema. As a result, that particular row ends up with fewer columns than expected. This can lead to various kinds of problems: runtime exceptions, incorrect values, etc.

E.g.:

    A = load 'x.txt' as (a, t:(b,c), d);
    dump A;
    (1,(2,3),4)
    (5,,8)        -- note NULL for 't'

    B = foreach A generate a, FLATTEN(t), d;
    dump B;
    (1,2,3,4)
    (5,,8)        -- only three fields

The results are unpredictable and never correct. I think the correct output should have been:

    (1,2,3,4)
    (5,,,8)

It is quite hard for a user to figure this out. Pig knows what is expected. Is there a workaround for this? We are thinking of writing a UDF that returns a tuple of NULLs when the input is null. But it looks like UDFContext does not have context for a pure UDF (store and load UDFs do). I will start another thread about that.

Tested with Pig 0.8.1.

Thanks,
Raghu.
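To make the expected behaviour concrete, here is a minimal sketch of the padding logic such a UDF would need. It is a plain-Python model, not Pig code and not the Pig UDF API. The names pad_null_tuple and flatten_row are hypothetical, and the tuple width (2, for t:(b,c)) is passed in by hand as an assumption.

```python
from typing import Any, Optional, Sequence, Tuple


def pad_null_tuple(t: Optional[Sequence[Any]], arity: int) -> Tuple[Any, ...]:
    """Return t as a tuple, or a tuple of `arity` Nones when t is None.

    `arity` stands in for the declared schema width of the tuple field;
    here the caller supplies it (an assumption of this sketch).
    """
    if t is None:
        return (None,) * arity
    return tuple(t)


def flatten_row(row: Sequence[Any], tuple_index: int, arity: int) -> Tuple[Any, ...]:
    """Model of `generate a, FLATTEN(t), d` with null tuples padded first."""
    padded = pad_null_tuple(row[tuple_index], arity)
    # Splice the (possibly padded) tuple fields in place of the tuple column.
    return tuple(row[:tuple_index]) + padded + tuple(row[tuple_index + 1:])


# The two rows from the example: t is (2,3) in the first and null in the second.
rows = [(1, (2, 3), 4), (5, None, 8)]
flattened = [flatten_row(r, tuple_index=1, arity=2) for r in rows]

for r in flattened:
    print(r)
```

With the padding step, both rows come out with four fields, which matches the output proposed above. In this sketch the width is hard-coded at the call site; a real UDF would have to get it from somewhere, which is where the UDFContext question comes in.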