Pig, mail # user - FLATTEN disambiguation clause

RE: FLATTEN disambiguation clause
Santhosh Srinivasan 2009-06-29, 15:56
The disambiguation prefix can be dropped if the column name is unique. A workaround for now is to name your columns explicitly when you flatten:

filtered_scores = FOREACH filtered_scores GENERATE FLATTEN(unified_pair_scores) as (dest_id, pairs_tc, scores_group_overlap, source_id);

The following should work (I have not tried it yet). If Pig is insisting on the disambiguation even when the column name is unique then it's a bug.

filtered_scores = FOREACH filtered_scores GENERATE FLATTEN(unified_pair_scores);
unique_name = FOREACH filtered_scores GENERATE dest_id;

Santhosh

-----Original Message-----
From: Chris Riccomini [mailto:[EMAIL PROTECTED]]
Sent: Monday, June 29, 2009 8:37 AM
To: [EMAIL PROTECTED]
Subject: FLATTEN disambiguation clause

Hi All,

I'm trying to use FLATTEN in some Pig scripts, but FLATTEN insists on
using the disambiguation clause even when it doesn't need to:

Is there any way to force FLATTEN to NOT use the clause? Why is FLATTEN so
aggressive with this? It's a bit irritating, and it is causing problems in our
data flow.

Thanks!
Chris

grunt> describe filtered_scores
filtered_scores: {unified_pair_scores: {dest_id: int,pairs_tc: float,scores_group_overlap: double,source_id: int}}

grunt> filtered_scores = FOREACH filtered_scores GENERATE FLATTEN(unified_pair_scores);
grunt> describe filtered_scores
filtered_scores: {unified_pair_scores::dest_id: int,unified_pair_scores::pairs_tc: float,unified_pair_scores::scores_group_overlap: double,unified_pair_scores::source_id: int}
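For reference, here is a self-contained Pig Latin sketch of the ways discussed to deal with the `::` prefix that FLATTEN adds. The input path `scores.txt`, its tab-separated layout, and the relation names (`scores`, `grouped`, `flat`, `renamed`, `by_full_name`, `by_short_name`) are hypothetical; the GROUP step exists only to produce a bag shaped like `unified_pair_scores` in the schema above. The short-name reference is the form Santhosh expects to work but had not tried, so treat it as untested on the Pig release discussed in this thread.

```pig
-- Hypothetical input: a tab-separated file with the same four columns as
-- unified_pair_scores. Path and relation names are assumptions.
scores = LOAD 'scores.txt'
    AS (dest_id:int, pairs_tc:float, scores_group_overlap:double, source_id:int);

-- Grouping yields a bag named "scores" for each source_id, analogous to
-- the unified_pair_scores bag in the thread.
grouped = GROUP scores BY source_id;

-- Plain FLATTEN: the output fields carry the bag-name prefix, e.g. scores::dest_id.
flat = FOREACH grouped GENERATE FLATTEN(scores);
DESCRIBE flat;

-- Workaround from the reply: name the columns with an AS clause while flattening.
renamed = FOREACH grouped GENERATE FLATTEN(scores)
    AS (dest_id, pairs_tc, scores_group_overlap, source_id);

-- Refer to the fully qualified (prefixed) name.
by_full_name = FOREACH flat GENERATE scores::dest_id;

-- Refer to the short name; this should resolve because dest_id is unique in
-- flat (untested on the Pig release discussed in this thread).
by_short_name = FOREACH flat GENERATE dest_id;
DESCRIBE by_short_name;
```

The AS clause is the only option that removes the prefix from the schema itself; the other two leave `flat` prefixed and differ only in how downstream statements refer to the field.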