Pig >> mail # user >> Commands not working properly when stored in pig file


Re: Commands not working properly when stored in pig file

Hi, Mix:
"second map reduce started executing before first one got completed"
Interesting. Since you only LOAD evnt_dtl, without a DUMP or STORE on it,
Pig shouldn't do anything with it, especially before the STORE command completes.

I have the script below and it works fine, so I think the root cause is something
else. Unless your data is very big?
a = load 'words_and_numbers' as (f1:chararray, f2:chararray);
b = filter a by f1 is not null;
store (foreach (group b all) generate flatten($1)) into 'multipleload/tmp';
c = load 'multipleload/tmp/part-r-00000' as (f3:chararray, f4:chararray);
dump c;

Johnny

It's the multi-query execution optimization. Pig doesn't know it should wait for the STORE before the second LOAD, so it tries to run it in parallel. You have three options:

1. Name the relation you stored and use it instead of loading a new relation:

Data = LOAD '/....' as (,,,, )
NoNullData = FILTER Data by qe is not null;
exp = foreach (group NoNullData all) generate flatten($1);
STORE exp into 'exp/$inputDatePig';

evnt_dtl = FOREACH exp GENERATE $0 as cust ...

2. Use the EXEC keyword to tell Pig to finish the commands up to that point before running the rest:

Data = LOAD '/....' as (,,,, )
NoNullData = FILTER Data by qe is not null;
STORE (foreach (group NoNullData all) generate flatten($1)) into 'exp/$inputDatePig';
EXEC;
evnt_dtl = LOAD 'exp/$inputDatePig/part-r-00000' AS (cust,,,,,)

3. Disable multi-query execution:
$ pig -no_multiquery x.pig
- Marcos
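
As a rough sketch, option 2 applied to the small test script from earlier in the thread might look like the following. The alias names `grouped` and `flat` are made up for illustration, and loading the output directory instead of the single part file is an assumption (Pig should read every part file under it). This has not been run against the original poster's data.

```pig
-- Load and filter exactly as in the test script above.
a = LOAD 'words_and_numbers' AS (f1:chararray, f2:chararray);
b = FILTER a BY f1 IS NOT NULL;

-- Group everything into one bag, then flatten it back out into rows.
-- 'grouped' and 'flat' are hypothetical alias names.
grouped = GROUP b ALL;
flat = FOREACH grouped GENERATE FLATTEN($1);
STORE flat INTO 'multipleload/tmp';

-- Barrier: run everything above before parsing what follows,
-- so the LOAD below cannot be scheduled alongside the STORE.
EXEC;

-- Assumption: read the whole output directory rather than part-r-00000 only.
c = LOAD 'multipleload/tmp' AS (f3:chararray, f4:chararray);
DUMP c;
```

If the barrier is not wanted, running the unmodified script with `pig -no_multiquery` (option 3) should give the same ordering, at the cost of losing the multi-query optimization for the whole script.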