

Re: Commands not working properly when stored in pig file
Thanks for the explanation, Marcos!
On Thu, Mar 28, 2013 at 3:08 AM, MARCOS MEDRADO RUBINELLI <
[EMAIL PROTECTED]> wrote:

>
> Hi, Mix:
> " second map reduce started executing before first one got completed"
> Interesting. Since you only LOAD evnt_dtl, without a DUMP or STORE on it,
> Pig shouldn't do anything with it, especially before the STORE command
> completes.
>
> The script below works fine for me, so I think the root cause is something
> else, unless your data is very big?
> a = load 'words_and_numbers' as (f1:chararray, f2:chararray);
> b = filter a by f1 is not null;
> store (foreach (group b all) generate flatten($1)) into 'multipleload/tmp';
> c = load 'multipleload/tmp/part-r-00000' as (f3:chararray, f4:chararray);
> dump c;
>
> Johnny
>
>
>
> It's the multi-query execution optimization. Pig doesn't know it should
> wait for the STORE before the second LOAD, so it tries to run it in
> parallel. You have three options:
>
> 1. Name the relation you stored and use it instead of loading a new
> relation:
>
> Data = LOAD '/....' as (,,,, );
> NoNullData = FILTER Data by qe is not null;
> exp = foreach (group NoNullData all) generate flatten($1);
> STORE exp into 'exp/$inputDatePig';
>
> evnt_dtl = FOREACH exp GENERATE $0 as cust ...
>
> 2. Use the EXEC keyword to tell Pig to finish the commands up to that
> point before running the rest:
>
> Data = LOAD '/....' as (,,,, );
> NoNullData = FILTER Data by qe is not null;
> STORE (foreach (group NoNullData all) generate flatten($1)) into
>     'exp/$inputDatePig';
> EXEC;
> evnt_dtl = LOAD 'exp/$inputDatePig/part-r-00000' AS (cust,,,,,);
>
> 3. Disable multi-query execution:
> $ pig -no_multiquery x.pig
>
>
> - Marcos
>
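
For reference, option 2 can be sketched end to end using Johnny's test script from earlier in the thread. This is only an illustration, not a tested fix for the original script: the paths and field names are the ones from his example, the relation name "g" is made up here, and reading the whole output directory instead of the single part-r-00000 file is an assumption made so the LOAD does not depend on the part file's name.

```pig
-- Sketch only: Johnny's example script with an EXEC barrier added.
-- 'words_and_numbers' and 'multipleload/tmp' are the example paths from this thread.
a = LOAD 'words_and_numbers' AS (f1:chararray, f2:chararray);
b = FILTER a BY f1 IS NOT NULL;

-- "g" is a hypothetical name for the relation that the thread stores inline.
g = FOREACH (GROUP b ALL) GENERATE FLATTEN($1);
STORE g INTO 'multipleload/tmp';

-- Run everything above to completion before Pig plans the statements below.
EXEC;

-- Assumption: load the output directory rather than one named part file.
c = LOAD 'multipleload/tmp' AS (f3:chararray, f4:chararray);
DUMP c;
```

Without the EXEC line, multi-query execution may plan the STORE and the second LOAD together, which is the behaviour Marcos describes above.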