Pig, mail # user - Using variables generated by FOREACH command


Using variables generated by FOREACH command
Mix Nin 2013-11-14, 21:17
Hi

I have GROUP and FOREACH statements as below:

grouped = GROUP filterdata BY (page_name, web_session_id);
x = foreach grouped {
    distinct_web_cookie_id = DISTINCT filterdata.web_cookie_id;
    distinct_encrypted_customer_id = DISTINCT filterdata.encrypted_customer_id;
    distinct_web_session_id = DISTINCT filterdata.web_session_id;
    distinct_event_time = DISTINCT filterdata.event_time;
    distinct_customer_id = DISTINCT filterdata.customer_id;
    generate flatten(group),
        COUNT_STAR(distinct_web_cookie_id) AS distinct_web_cookie_id,
        COUNT_STAR(distinct_encrypted_customer_id) AS distinct_encrypted_customer_id,
        COUNT_STAR(distinct_customer_id) AS distinct_customer_id,
        COUNT_STAR(distinct_web_session_id) AS distinct_web_session_id,
        COUNT_STAR(filterdata) AS cnt_events;
};
Now I want to group x on Session_id and get the sum of cnt_events, so I
have written the commands below:

grouped2 = GROUP  x BY page_name;
d = foreach grouped2 generate group, COUNT_STAR(cnt_events) tot_events;
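
For comparison, here is a minimal sketch of how a per-group sum of
cnt_events is usually written. After a GROUP, each row holds only the
fields "group" and a bag named after the grouped relation, so the count
column has to be reached as x.cnt_events and summed with SUM rather than
counted with COUNT_STAR. The input path 'events.tsv' and the reduced
three-column schema are assumptions made up for illustration, not part of
the original script, and the sketch does not by itself explain ERROR 1025.

```pig
-- Hypothetical input: the path and the reduced schema are assumptions.
filterdata = LOAD 'events.tsv' USING PigStorage('\t')
    AS (page_name:chararray, web_session_id:chararray, web_cookie_id:chararray);

-- First level: one row per (page_name, web_session_id) with its event count.
grouped = GROUP filterdata BY (page_name, web_session_id);
x = FOREACH grouped GENERATE
        FLATTEN(group) AS (page_name, web_session_id),
        COUNT_STAR(filterdata) AS cnt_events;

-- Second level: add up the per-session counts for each page.
grouped2 = GROUP x BY page_name;
d = FOREACH grouped2 GENERATE
        group AS page_name,
        SUM(x.cnt_events) AS tot_events;

-- Print the resulting schema to confirm the field names.
DESCRIBE d;
```

Naming the flattened key fields explicitly with AS (...) makes the schema
of x easy to check with DESCRIBE before the second GROUP is attempted.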

When I run "grouped2 = GROUP  x BY page_name;", I get the error below:

[main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1025:
<line 31, column 23> Invalid field projection. Projected field [page_name]
does not exist in schema: event_time:chararray.
When I run DESCRIBE x, the output is x: {event_time: chararray}

I am not sure whether the schema from the FOREACH statement is being
applied. How do I solve this problem?

Thanks