Re: Using variables generated by FOREACH command
Have you checked that x really contains all the columns you specified? What does "dump x" give you? When you flatten group in x, try projecting the fields explicitly, as in group.page_name AS page_name, group.web_session_id AS web_session_id. Then you can run grouped2 = GROUP x BY page_name;
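
A minimal sketch of that suggestion, assuming filterdata is already loaded with the fields named in the quoted message. The alias distinct_cookies is made up for illustration, only one of the distinct counts is shown, and SUM is used in the last step because the stated goal is the sum of cnt_events. Treat it as a starting point, not a verified fix for the error in the quoted message:

```pig
-- Assumption: filterdata is defined earlier with page_name, web_session_id
-- and web_cookie_id fields, as in the quoted message.
grouped = GROUP filterdata BY (page_name, web_session_id);

x = FOREACH grouped {
    -- distinct_cookies is a hypothetical alias, chosen so it does not
    -- share a name with an output field
    distinct_cookies = DISTINCT filterdata.web_cookie_id;
    GENERATE
        group.page_name AS page_name,
        group.web_session_id AS web_session_id,
        COUNT_STAR(distinct_cookies) AS distinct_web_cookie_id,
        COUNT_STAR(filterdata) AS cnt_events;
}

-- Check that page_name and cnt_events appear in the schema before grouping again
DESCRIBE x;

grouped2 = GROUP x BY page_name;
-- Inside grouped2, the bag is named x, so the field is referenced as x.cnt_events
d = FOREACH grouped2 GENERATE group AS page_name, SUM(x.cnt_events) AS tot_events;
```

If DESCRIBE x still does not list page_name, the alias x is probably being defined by some other statement in the session.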

On Friday, November 15, 2013 2:17 AM, Mix Nin <[EMAIL PROTECTED]> wrote:

I have GROUP and FOREACH statements as below:

grouped = GROUP filterdata BY (page_name,web_session_id);
x = FOREACH grouped {
    distinct_web_cookie_id = DISTINCT filterdata.web_cookie_id;
    distinct_encrypted_customer_id = DISTINCT filterdata.encrypted_customer_id;
    distinct_web_session_id = DISTINCT filterdata.web_session_id;
    distinct_event_time = DISTINCT filterdata.event_time;
    distinct_customer_id = DISTINCT filterdata.customer_id;
    GENERATE FLATTEN(group),
        COUNT_STAR(distinct_web_cookie_id) AS distinct_web_cookie_id,
        COUNT_STAR(distinct_encrypted_customer_id) AS distinct_encrypted_customer_id,
        COUNT_STAR(distinct_customer_id) AS distinct_customer_id,
        COUNT_STAR(distinct_web_session_id) AS distinct_web_session_id,
        COUNT_STAR(filterdata) AS cnt_events;
}
Now I want to group x on Session_id and get the sum of cnt_events, so I wrote the commands below:

grouped2 = GROUP  x BY page_name;
d = foreach grouped2 generate group, COUNT_STAR(cnt_events) tot_events;

When I run "grouped2 = GROUP x BY page_name;", I get the error below:

[main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1025:
<line 31, column 23> Invalid field projection. Projected field [page_name]
does not exist in schema: event_time:chararray.
When I run describe x, the output is x: {event_time: chararray}

I am not sure whether the schema for the foreach statement is being applied. How do I solve this?