RE: FLATTEN disambiguation clause
The disambiguation prefix can be dropped when referencing a column whose name is unique. A workaround for now is to explicitly name your columns when you flatten.

filtered_scores = FOREACH filtered_scores GENERATE FLATTEN(unified_pair_scores) as (dest_id, pairs_tc, scores_group_overlap, source_id);

The following should work (I have not tried it yet). If Pig insists on the disambiguation even when the column name is unique, then it's a bug.

filtered_scores = FOREACH filtered_scores GENERATE FLATTEN(unified_pair_scores);
unique_name = FOREACH filtered_scores GENERATE dest_id;
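
For reference, here is a minimal sketch of other ways to refer to a flattened column. It is untested against this Pig version; the LOAD path 'filtered_scores' and the alias names (flat, by_full_name, by_position, renamed) are hypothetical, and the schema is copied from the describe output in the original message.

```pig
-- Hypothetical input; the schema mirrors the describe output in this thread.
filtered_scores = LOAD 'filtered_scores' AS (
    unified_pair_scores: bag{t: tuple(dest_id: int, pairs_tc: float,
                                      scores_group_overlap: double, source_id: int)});

flat = FOREACH filtered_scores GENERATE FLATTEN(unified_pair_scores);

-- Fully qualified reference using the prefix that FLATTEN adds.
by_full_name = FOREACH flat GENERATE unified_pair_scores::dest_id;

-- Positional reference; does not depend on field names at all.
by_position = FOREACH flat GENERATE $0;

-- Rename while projecting so the downstream schema carries no prefix.
renamed = FOREACH flat GENERATE unified_pair_scores::dest_id AS dest_id,
                                unified_pair_scores::source_id AS source_id;
```

Running describe on renamed should show whether the prefix is gone from the schema.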

Santhosh

-----Original Message-----
From: Chris Riccomini [mailto:[EMAIL PROTECTED]]
Sent: Monday, June 29, 2009 8:37 AM
To: [EMAIL PROTECTED]
Subject: FLATTEN disambiguation clause

Hi All,

I'm trying to use flatten for some pig scripts, but FLATTEN is insisting on
using the disambiguation clause even when it doesn't need to:

Is there any way to force FLATTEN to NOT use the clause? Why is FLATTEN so
aggressive with this? It's a bit irritating, and is causing problems in our
data flow.

Thanks!
Chris

grunt> describe filtered_scores
filtered_scores: {unified_pair_scores: {dest_id: int,pairs_tc: float,scores_group_overlap: double,source_id: int}}

grunt> filtered_scores = FOREACH filtered_scores GENERATE FLATTEN(unified_pair_scores);
grunt> describe filtered_scores
filtered_scores: {unified_pair_scores::dest_id: int,unified_pair_scores::pairs_tc: float,unified_pair_scores::scores_group_overlap: double,unified_pair_scores::source_id: int}