Daniel Eklund 2011-05-17, 15:32
Re: Question about immediately projecting on a strsplit() return tuple...
Thejas M Nair 2011-05-17, 18:39
Are you using the 0.8.1 release? It has several bug fixes.

The new logical plan was introduced in 0.8 to make it easier to write optimization rules. The error seems to be caused by a bug in the code related to the new logical plan, which is why disabling the new logical plan gets it working. Can you try 0.8.1 and, if it fails, send the entire stack trace from the pig log file? It would be even better if you could open a Pig JIRA ticket.

Thanks,
Thejas

On 5/17/11 8:32 AM, "Daniel Eklund" <[EMAIL PROTECTED]> wrote:

Hey all,

I have one file A with a 'day' column like "2011/3/2" and another B with a column 'timestamp' like "2011/3/2 12:32" ... I want to join on these two fields in these records.

I do something like this:

    A_and_B = JOIN A by (tracking_id, day) LEFT OUTER, B by (tracking_id, STRSPLIT(timestamp, ' ', 1).$0)

where you can see I am projecting out the first element of the tuple returned by strsplit...

When I run this I get an error of the form:

    org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: HASH_JOIN
    ERROR 2042: Error in new logical plan. Try -Dpig.usenewlogicalplan=false.

Putting the environment variable before the "-x local" I see that the join appears to be working. Yay.

I am happy that things seem to be working, though I would appreciate some feedback from those in the know as to why the environment variable fixes this and if there is a more canonical way of doing this join.

thanks,
daniel
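A side note on the join key in the quoted script, offered as a sketch and not as a diagnosis of ERROR 2042. It assumes that STRSPLIT passes its third argument straight through to Java's String.split(regex, limit); nothing in this thread confirms that. Under that assumption, a limit of 1 means the pattern is never applied, so `.$0` would be the whole timestamp and could never equal a 'day' value. The class name `SplitLimitDemo` and the helper `firstField` are hypothetical names used only for illustration.

```java
import java.util.Arrays;

public class SplitLimitDemo {
    // Hypothetical helper, not part of Pig: returns the first piece of
    // the timestamp after splitting on a single space with the given limit.
    static String firstField(String timestamp, int limit) {
        return timestamp.split(" ", limit)[0];
    }

    public static void main(String[] args) {
        String ts = "2011/3/2 12:32";

        // limit 1: the pattern is applied zero times, so the array
        // holds one element, the unsplit input.
        System.out.println(Arrays.toString(ts.split(" ", 1)));

        // limit 2: the pattern is applied at most once, giving the
        // day and the time as separate elements.
        System.out.println(Arrays.toString(ts.split(" ", 2)));

        // Only the limit-2 result can match a 'day' value like "2011/3/2".
        System.out.println(firstField(ts, 2).equals("2011/3/2"));
    }
}
```

If the assumption holds, writing the key as STRSPLIT(timestamp, ' ', 2).$0 would yield just the day portion. This is independent of the logical plan error and is worth checking against the actual join output.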
Daniel Eklund 2011-05-17, 19:20
Thejas M Nair 2011-05-17, 20:41
Daniel Dai 2011-05-17, 21:17