Pig, mail # user - Using variables generated by FOREACH command


Re: Using variables generated by FOREACH command
Adam Kawa 2013-11-27, 23:39
AFAIK, you can also do:

".... generate flatten(group) as (page_name,web_session_id), ...."
2013/11/15 <[EMAIL PROTECTED]>

> Hey,
>    Have you checked that you are really getting all the columns you
> specified in x? Can you tell me what "dump x" gives you? Where you
> flatten group in x, try writing it as group.page_name as page_name,
> group.web_session_id as web_session_id; then you can do
> grouped2 = GROUP x by page_name;
>
>
>
> On Friday, November 15, 2013 2:17 AM, Mix Nin <[EMAIL PROTECTED]> wrote:
>
> Hi,
>
> I have GROUP and FOREACH statements as below:
>
> grouped = GROUP filterdata BY (page_name,web_session_id);
> x = foreach grouped {
> distinct_web_cookie_id= DISTINCT filterdata.web_cookie_id;
> distinct_encrypted_customer_id= DISTINCT filterdata.encrypted_customer_id;
> distinct_web_session_id= DISTINCT filterdata.web_session_id;
> distinct_event_time = DISTINCT filterdata.event_time;
> distinct_customer_id = DISTINCT filterdata.customer_id;
> generate flatten(group), COUNT_STAR(distinct_web_cookie_id) AS
> distinct_web_cookie_id,  COUNT_STAR(distinct_encrypted_customer_id) AS
> distinct_encrypted_customer_id, COUNT_STAR(distinct_customer_id) AS
> distinct_customer_id, COUNT_STAR(distinct_web_session_id) AS
> distinct_web_session_id ,COUNT_STAR(filterdata) AS cnt_events;
> };
>
>
> Now I want to group on Session_id in x and get the sum of cnt_events, so
> I wrote the commands below:
>
> grouped2 = GROUP  x BY page_name;
> d = foreach grouped2 generate group, COUNT_STAR(cnt_events) tot_events;
>
> When I run "grouped2 = GROUP  x BY page_name;", I get the error below:
>
> [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1025:
> <line 31, column 23> Invalid field projection. Projected field [page_name]
> does not exist in schema: event_time:chararray.
>
>
> When I run "describe x", the output is x: {event_time: chararray}
>
> I am not sure whether the schema for the foreach statement works. How do
> I solve this problem?
>
> Thanks
>
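
For reference, the suggestions in the replies above might fit into the original script roughly as follows. This is an untested sketch, not a confirmed fix. It assumes filterdata is already loaded with the fields named in the question. The nested alias uniq_cookies and the shortened GENERATE list are illustrative choices, not part of the original script. The nested alias is deliberately named differently from the output field so the two cannot be confused.

```pig
-- Assumption: filterdata is loaded earlier with page_name, web_session_id
-- and web_cookie_id among its fields.
grouped = GROUP filterdata BY (page_name, web_session_id);

x = FOREACH grouped {
    -- uniq_cookies is a hypothetical alias name for the nested DISTINCT.
    uniq_cookies = DISTINCT filterdata.web_cookie_id;
    GENERATE
        -- Name the flattened group fields explicitly.
        FLATTEN(group) AS (page_name, web_session_id),
        COUNT_STAR(uniq_cookies) AS distinct_web_cookie_id,
        COUNT_STAR(filterdata)   AS cnt_events;
};

-- Check that page_name and cnt_events appear in the schema before grouping.
DESCRIBE x;

grouped2 = GROUP x BY page_name;

-- SUM adds up the per-session counts; COUNT_STAR would only count rows.
d = FOREACH grouped2 GENERATE group AS page_name,
                              SUM(x.cnt_events) AS tot_events;
```

Note that the second FOREACH uses SUM over the bag projection x.cnt_events. The original COUNT_STAR(cnt_events) would count rows per page rather than total the events, and cnt_events has to be referenced through the relation x inside the grouped bag.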