Pig, mail # user - What is expected from FLATTEN(null tuple)?


What is expected from FLATTEN(null tuple)?
Raghu Angadi 2011-06-27, 20:49
It looks like FLATTEN(tuple) produces a single null when the tuple is null,
irrespective of the schema.

As a result, that particular row ends up with fewer columns than expected. This
can lead to various kinds of problems: runtime exceptions, incorrect values,
etc.

E.g.
A = load 'x.txt' as (a, t:(b,c), d);
dump A;
*(1,(2,3),4)*
*(5,,8)*  -- note NULL for 't'.
B = foreach A generate a, FLATTEN(t), d;
dump B;
*(1,2,3,4)*
*(5,,8)*  -- only three fields. results are unpredictable and never correct.

I think the correct output should have been:
(1, 2, 3, 4)
(5,,,8)
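
The difference between the two outputs can be modelled outside Pig. The following is a minimal Python sketch, not Pig's implementation: `flatten_observed` and `flatten_expected` are hypothetical names, and the arity of `t` is passed in by hand as a stand-in for the declared schema `t:(b,c)`.

```python
def flatten_observed(row, idx):
    """Model of the reported behaviour: a null tuple becomes one null field."""
    value = row[idx]
    expanded = (None,) if value is None else tuple(value)
    return row[:idx] + expanded + row[idx + 1:]


def flatten_expected(row, idx, arity):
    """Model of the proposed behaviour: a null tuple becomes `arity` null fields."""
    value = row[idx]
    expanded = (None,) * arity if value is None else tuple(value)
    return row[:idx] + expanded + row[idx + 1:]


# The two rows from the example, with None standing in for the null 't'.
rows = [(1, (2, 3), 4), (5, None, 8)]
observed = [flatten_observed(r, 1) for r in rows]
expected = [flatten_expected(r, 1, arity=2) for r in rows]
print(observed)
print(expected)
```

In the observed model the second row has three fields, so 'd' lands in the position where 'c' should be; in the expected model every row has four fields.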

It is quite hard for a user to figure this out. Pig knows what is expected.
Is there a workaround for this?

We are thinking of writing a UDF that returns a tuple with NULLs when the
input is null. But it looks like UDFContext does not have context for a pure
UDF (store and load UDFs do). I will start another thread about that.
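
The logic of such a UDF is small. Below is a sketch in plain Python, assuming the arity is supplied by the caller because the UDF cannot read the schema itself. `null_to_tuple` is a hypothetical name, and the function is not wired into Pig here.

```python
def null_to_tuple(t, arity):
    """Return t unchanged, or a tuple of `arity` nulls when t is null."""
    if t is None:
        # Assumption: the caller knows how many fields the schema declares.
        return (None,) * arity
    return t


print(null_to_tuple((2, 3), 2))
print(null_to_tuple(None, 2))
```

In the script, a UDF like this would wrap the field before flattening, so that FLATTEN always receives a tuple with the full number of fields.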

Tested with Pig 0.8.1.

Thanks,
Raghu.