Pig, mail # user - Commands not working properly when stored in pig file


Earlier messages in this thread:
Mix Nin 2013-03-27, 21:58
Johnny Zhang 2013-03-27, 22:16
Mix Nin 2013-03-27, 22:45
Mix Nin 2013-03-27, 23:07
Johnny Zhang 2013-03-28, 03:15
MARCOS MEDRADO RUBINELLI 2013-03-28, 10:08
Re: Commands not working properly when stored in pig file
Johnny Zhang 2013-03-28, 16:19
Thanks for the explanation, Marcos!
On Thu, Mar 28, 2013 at 3:08 AM, MARCOS MEDRADO RUBINELLI <
[EMAIL PROTECTED]> wrote:

>
> Hi, Mix:
> " second map reduce started executing before first one got completed"
> Interesting. Since you only LOAD evnt_dtl, without a DUMP or STORE on it,
> Pig shouldn't do anything, especially before the STORE command completes.
>
> The script below works fine for me, so I think the root cause is something
> else, unless your data is very big?
> a = load 'words_and_numbers' as (f1:chararray, f2:chararray);
> b = filter a by f1 is not null;
> store (foreach (group b all) generate flatten($1)) into 'multipleload/tmp';
> c = load 'multipleload/tmp/part-r-00000' as (f3:chararray, f4:chararray);
> dump c;
>
> Johnny
>
>
>
> It's the multi-query execution optimization. Pig doesn't know it should
> wait for the STORE to finish before the second LOAD, so it tries to run
> the two in parallel. You have three options:
>
> 1. Name the relation you stored and use it instead of loading a new
> relation:
>
> Data = LOAD '/....' as (,,,, );
> NoNullData = FILTER Data by qe is not null;
> exp = foreach (group NoNullData all) generate flatten($1);
> STORE exp into 'exp/$inputDatePig';
>
> evnt_dtl = FOREACH exp GENERATE $0 as cust ...
>
> 2. Use the EXEC keyword to tell Pig to finish the commands up to that
> point before running the rest:
>
> Data = LOAD '/....' as (,,,, );
> NoNullData = FILTER Data by qe is not null;
> STORE (foreach (group NoNullData all) generate flatten($1)) into
> 'exp/$inputDatePig';
> EXEC;
> evnt_dtl = LOAD 'exp/$inputDatePig/part-r-00000' AS (cust,,,,,);
>
> 3. Disable multi-query execution:
> $ pig -no_multiquery x.pig
>
>
> - Marcos
>
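
For reference, here is how option 2 might look when applied to Johnny's test script quoted above. This is an untested sketch, not something posted in the thread: it reuses the paths and field names from that script and only adds the EXEC line. It also assumes, as the original script does, that the output lands in a single file named part-r-00000.

```pig
-- Load and filter the input, as in Johnny's script.
a = load 'words_and_numbers' as (f1:chararray, f2:chararray);
b = filter a by f1 is not null;

-- Write the intermediate result.
store (foreach (group b all) generate flatten($1)) into 'multipleload/tmp';

-- Force everything above to run to completion before Pig plans the rest.
exec;

-- The part file now exists, so it is safe to load it.
c = load 'multipleload/tmp/part-r-00000' as (f3:chararray, f4:chararray);
dump c;
```

Without the exec line, the second load is only implicitly dependent on the store (the paths differ), which is the situation Marcos describes.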