Pig, mail # user - Wrong output with Multiquery optimizer


Wrong output with Multiquery optimizer
Vivek Padmanabhan 2013-09-18, 14:21
Hi,
I have a script that executes multiple jobs, and a considerable amount
of multiquery optimization is applied to it.
However, the script appears to generate wrong output when multiquery is
enabled. The output is fine when multiquery is disabled with the -M option.

I have attached a trimmed-down version of the actual script. The data
gets messed up in the nested foreach, which is defined inside a macro.
The UDF aaa.RANKING() adds a simple rank over the ordered data.
A sample of the expected output (without multiquery) is shown below:

1,3,1,1378339200,9779,http:///www.abc12345.com/JQueryAddUserControl.aspx,68445,3333,6,99999,6,0
1,3,2,1378339200,9779,http:///www.abc12345.com/EN/IN/Home.aspx,113961,3333,3,99999,0,0
1,3,3,1378339200,9779,http:///images.abc12345.com/Img/Tabs/servicestab_expandshadow.gif,2686,3333,2,99999,0,0
1,3,4,1378339200,9779,http:///www.abc12345.com/Images/Rent_a_Car_414x207.jpg,30616,3333,2,99999,0,0
1,3,5,1378339200,9779,http:///images.abc12345.com/Img/Tabs/servicestabon_linehide.gif,2203,3333,2,99999,0,0
1,3,6,1378339200,9779,http:///images.abc12345.com/Img/Common/dottedlinehr.gif,2108,3333,2,99999,0,0
1,3,7,1378339200,9779,http:///www.abc12345.com/WebResource.axd,2688,3333,2,99999,0,0
1,3,8,1378339200,9779,http:///www.abc12345.com/Scripts/Button/mouseoverbutton.js,2526,3333,2,99999,0,0

But with multiquery on, the output looks like this:

*1,3,52*,1378339200,*9779*,http:///www.abc12345.com/Images/UAE_Visa_Marhaba_Services_382x208_New.jpg,1228,3333,1,99999,0,0
1,3,18,1378339200,9779,http:///images.abc12345.com/Img/TooltipYellow/TooltipYellowArrowBottom.png,1695,3333,1,99999,0,0
1,3,56,1378339200,9779,http:///www.abc12345.com/App_Themes/Default/Img/Common/arrowblue_right.gif,1226,3333,1,99999,0,0
1,3,90,1378339200,9779,http:///www.abc12345.com/Scripts/PNRStatus.js,1205,3333,1,99999,0,0
1,3,51,1378339200,9779,http:///images.abc12345.com/Img/Obe/obe_bg.gif,1081,3333,1,99999,0,0
*1,3,52*,1378339200,*9779*,http:///static.abc12345.com/Scripts/Obe/Obe.js,1076,3333,1,99999,0,0

Note: the ordering is lost, and two rows end up with the same key
(1,3,52). This happens in both Pig 0.11.1 and 0.10.
Turning off the optimizer rules with -t All did not help either.
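
The two symptoms noted above (duplicate keys and lost ordering) can be checked mechanically on an output sample. The following is only a minimal sketch in Python, not part of the attached script. It assumes that the first three comma-separated fields form the key and that the third field is the rank; the function name and its parameters are hypothetical.

```python
import csv
from collections import Counter


def check_ranked_output(lines, key_fields=3, rank_index=2):
    """Return (duplicate_keys, ranks_in_order) for comma-separated rows.

    Assumption: the first `key_fields` fields form the key and the field
    at `rank_index` holds the rank. Adjust both for other layouts.
    """
    rows = [row for row in csv.reader(lines) if row]
    keys = [tuple(row[:key_fields]) for row in rows]
    # Keys that occur on more than one row.
    duplicates = sorted(k for k, n in Counter(keys).items() if n > 1)
    # Ranks should be strictly increasing from one row to the next.
    ranks = [int(row[rank_index]) for row in rows]
    in_order = all(a < b for a, b in zip(ranks, ranks[1:]))
    return duplicates, in_order


# Rows abbreviated to their first five fields, taken from the samples above.
expected = ["1,3,1,1378339200,9779", "1,3,2,1378339200,9779", "1,3,3,1378339200,9779"]
observed = ["1,3,52,1378339200,9779", "1,3,18,1378339200,9779", "1,3,52,1378339200,9779"]

print(check_ranked_output(expected))  # ([], True)
print(check_ranked_output(observed))  # ([('1', '3', '52')], False)
```

Running the same check on the full output of both runs (with and without -M) would show whether every group is affected or only some of them.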

I would like to understand whether I am doing something wrong in the
script that causes this behavior. So far I have not found a workaround
other than disabling multiquery.

Thanks
Vivek