Re: Problem with using CROSS in PIG
Looks like a bug.
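
Until that is sorted out, the JOIN workaround mentioned at the end of the original message quoted below could be sketched roughly as follows. This is only an illustration: the names identifier, C_keyed and seq_keyed are made up here, the explicit column list assumes the schema from the original script, and PARALLEL 8 is an arbitrary reducer count, not a recommended value.

```pig
-- Sketch only: C_keyed, seq_keyed and identifier are illustrative names,
-- not taken from the original script.
A = LOAD '/tmp/data' AS (request_datetime: chararray, portal_name: chararray, sku: chararray, product_name: chararray, duration: int);
B = FILTER A BY portal_name == 'portal1';
C = FILTER B BY sku == '4505865';

sequence_numbers = LOAD 'sequence_numbers' USING org.apache.hcatalog.pig.HCatLoader();
sequence_number = FILTER sequence_numbers BY key == '20071224_20071230';
sequence_number = FOREACH sequence_number GENERATE seq AS seq;
sequence_number = LIMIT sequence_number 1;

-- Add the same constant key to both relations, so that an equi-join on it
-- pairs every row of C with the single sequence_number row.
C_keyed = FOREACH C GENERATE 1 AS identifier, request_datetime, portal_name, sku, product_name, duration;
seq_keyed = FOREACH sequence_number GENERATE 1 AS identifier, seq;

-- Join on the constant key instead of using CROSS; the reducer count is set
-- explicitly (8 is an arbitrary choice).
D = JOIN C_keyed BY identifier, seq_keyed BY identifier PARALLEL 8;

-- Drop the helper key and keep the original output columns.
E = FOREACH D GENERATE
    C_keyed::request_datetime AS request_datetime,
    C_keyed::portal_name AS portal_name,
    C_keyed::sku AS sku,
    C_keyed::product_name AS product_name,
    C_keyed::duration AS duration,
    seq_keyed::seq AS seq;

STORE E INTO '/tmp/data/output/' USING PigStorage();
```

Since every row carries the same key, the join should produce the same pairs that CROSS is meant to produce, at the cost of sending all rows to a single reducer.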

On Aug 2, 2013, at 1:51 AM, Simonffy Szilvia <[EMAIL PROTECTED]> wrote:

> Yes, I read about your problem with CROSS.
> But for me it doesn't go away if I use more reducers in the CROSS. (I don't use JOIN!)
>
> Changed:
>
> D = CROSS C, sequence_number parallel 8;
>
> Execution results of five runs:
> 1. Successfully stored 1 records
> 2. Successfully stored 2 records
> 3. Successfully stored 1 records
> 4. Successfully stored 2 records
> 5. Successfully stored 2 records
>
> But if I put STORE statements between the steps to debug, the result was correct every time:
> A = LOAD ...;
> B = FILTER A...;
> C = FILTER B...;
> STORE C INTO '/tmp/data/tmp/step1' using PigStorage();
>
> sequence_numbers = LOAD ...;
> sequence_number = FILTER sequence_numbers ...;
> sequence_number = FOREACH sequence_number GENERATE...;
> sequence_number = LIMIT sequence_number 1;
> STORE sequence_number INTO '/tmp/data/tmp/step1.1' using PigStorage();
>
> D = CROSS C, sequence_number;
> STORE D INTO '/tmp/data/tmp/step1.2' using PigStorage();
> E = FOREACH D GENERATE...;
>
> STORE E INTO '/tmp/data/tmp/step2' using PigStorage();
>
> Execution results of five runs:
> 1. Successfully stored 6 records
> 2. Successfully stored 6 records
> 3. Successfully stored 6 records
> 4. Successfully stored 6 records
> 5. Successfully stored 6 records
>
> br,
> Szilvi
>> I had the same problem. You can search the mailing list to find out more about it. In a nutshell, this happens only when Pig calculates the number of reducers it needs by itself. It will go away if you specify the number of reducers in the join step. Try it and tell us if that works.
>>
>>
>> ________________________________
>>  From: Simonffy Szilvia <[EMAIL PROTECTED]>
>> To: [EMAIL PROTECTED]
>> Sent: Thursday, August 1, 2013 11:31 PM
>> Subject: Fwd: Problem with using CROSS in PIG
>>  
>> Hi,
>>
>> I wrote a Pig script, and I get inconsistent results when I run the same script several times.
>>
>> pig version: 0.11.1
>> hadoop version: 1.1.2 / 4 nodes
>>
>> pig script:
>> A = LOAD '/tmp/data' AS (request_datetime: chararray, portal_name: chararray, sku: chararray, product_name: chararray, duration: int);
>> B = FILTER A BY portal_name == 'portal1';
>> C = FILTER B BY sku == '4505865';
>>
>> sequence_numbers = LOAD 'sequence_numbers' USING org.apache.hcatalog.pig.HCatLoader();
>> sequence_number = FILTER sequence_numbers BY key == '20071224_20071230';
>> sequence_number = FOREACH sequence_number GENERATE
>>     seq AS seq;
>> sequence_number = LIMIT sequence_number 1;
>>
>> D = CROSS C, sequence_number;
>> E = FOREACH D GENERATE
>>     request_datetime AS request_datetime,
>>     portal_name AS portal_name,
>>     sku AS sku,
>>     product_name AS product_name,
>>     duration AS duration,
>>     seq AS seq;
>>
>> STORE E INTO '/tmp/data/output/' using PigStorage();
>>
>> Execution results of five runs:
>> 1. Successfully stored 3 records
>> 2. Successfully stored 5 records
>> 3. Successfully stored 2 records
>> 4. Successfully stored 3 records
>> 5. Successfully stored 1 records
>>
>> Can anybody tell me what is wrong?
>>
>> P.S.: As a workaround I skipped CROSS and used JOIN instead:
>> D = JOIN C BY identifier, report_sequence_number BY identifier; -- where identifier is a constant number: 1
>> With this change the result is correct every time.
>>
>> data: /tmp/data/data.tsv
>> 2013-03-14T10:07:14    portal1    4505865    Julsång (Cantique de Noël) (1997 Digital Remaster)    304
>> 2013-03-14T22:55:49    portal1    4505865    Julsång (Cantique de Noël) (1997 Digital Remaster)    304
>> 2013-03-19T09:11:03    portal1    4505865    Julsång (Cantique de Noël) (1997 Digital Remaster)    304
>> 2013-03-19T09:23:49    portal1    4505865    Julsång (Cantique de Noël) (1997 Digital Remaster)    304
>> 2013-03-19T09:23:49    portal1    4505865    Julsång (Cantique de Noël) (1997 Digital Remaster)    304
>> 2013-03-17T13:36:15    portal1    4505865    Julsång (Cantique de Noël) (1997 Digital Remaster)    304