Pig, mail # user - FLATTEN disambiguation clause

RE: FLATTEN disambiguation clause
Santhosh Srinivasan 2009-06-29, 15:56
The disambiguation can be dropped if the column name is unique. A workaround for now is to explicitly name your columns when you flatten:

filtered_scores = FOREACH filtered_scores GENERATE FLATTEN(unified_pair_scores) as (dest_id, pairs_tc, scores_group_overlap, source_id);

The following should work (I have not tried it yet). If Pig insists on the disambiguation even when the column name is unique, then it's a bug.

filtered_scores = FOREACH filtered_scores GENERATE FLATTEN(unified_pair_scores);
unique_name = FOREACH filtered_scores GENERATE dest_id;
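
For contrast, here is a sketch of a case where the disambiguation is genuinely needed. It is untested, and the relation, file, and field names are made up for illustration; they are not taken from the script above. When two flattened bags both carry a field called dest_id, the bare name is ambiguous, so the bag::field form is the only way to say which one is meant. A field that appears only once (score here) should still be reachable by its short name.

-- hypothetical input: two bags that share the field name dest_id
A = LOAD 'pairs_input' AS (pairs:bag{t:tuple(dest_id:int, score:double)}, others:bag{t:tuple(dest_id:int, weight:double)});
B = FOREACH A GENERATE FLATTEN(pairs), FLATTEN(others);
-- dest_id occurs twice in B, so it has to be qualified; score is unique
C = FOREACH B GENERATE pairs::dest_id, others::dest_id, score;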


Hi All,

I'm trying to use flatten for some pig scripts, but FLATTEN is insisting on
using the disambiguation clause even when it doesn't need to:

Is there any way to force FLATTEN to NOT use the clause? Why is FLATTEN so
aggressive with this? It's a bit irritating, and is causing problems in our
data flow.


grunt> describe filtered_scores
filtered_scores: {unified_pair_scores: {dest_id: int,pairs_tc: float,scores_group_overlap: double,source_id: int}}

grunt> filtered_scores = FOREACH filtered_scores GENERATE
grunt> describe filtered_scores
filtered_scores: {unified_pair_scores::dest_id:
double,unified_pair_scores::source_id: int}