
Daniel Eklund 2011-05-17, 15:32
Re: Question about immediately projecting on a strsplit() return tuple...
Are you using the 0.8.1 release? It has several bug fixes.
The new logical plan was introduced in 0.8 to make it easier to write optimization rules. The error seems to be caused by a bug in the code for the new logical plan,
which is why disabling the new logical plan gets it working.

Can you try 0.8.1 and, if it still fails, send the entire stack trace from the Pig log file? It would be even better if you could open a Pig JIRA ticket.

On 5/17/11 8:32 AM, "Daniel Eklund" <[EMAIL PROTECTED]> wrote:

Hey all,

I have one file A with a 'day' column like "2011/3/2" and another, B, with a
'timestamp' column like "2011/3/2 12:32". I want to join these records on
those two fields.
I do something like this:

A_and_B = JOIN A by (tracking_id, day) LEFT OUTER,
               B by (tracking_id,  STRSPLIT(timestamp, ' ', 1).$0)

where you can see I am projecting out the first element of the tuple
returned by strsplit...

When I run this I get an error of the form:
    org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: HASH_JOIN
    ERROR 2042: Error in new logical plan. Try
Putting the environment variable before the "-x local" I see that the join
appears to be working. Yay.

I am happy that things seem to be working, though I would appreciate some
feedback from those in the know as to why the environment variable fixes
this and whether there is a more canonical way of doing this join.
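
One possible way to avoid projecting on the STRSPLIT() result inside the JOIN key is to derive the day in a separate FOREACH first, so that both join keys are plain fields. The following is only a sketch and has not been run against any Pig release. The file paths, the schemas, and the B_day alias are hypothetical, since the thread does not give them. The limit of 2 is also an assumption: if STRSPLIT hands its limit to Java's String.split, a limit of 1 would return the whole timestamp as a single element rather than just the date.

```pig
-- Hypothetical paths and schemas; the thread does not show the LOAD statements.
A = LOAD 'a.tsv' AS (tracking_id:chararray, day:chararray);
B = LOAD 'b.tsv' AS (tracking_id:chararray, timestamp:chararray);

-- Split "2011/3/2 12:32" on the first space and keep the date part as 'day'.
-- Limit 2 (an assumption, see above) yields at most two pieces: date and time.
B_day = FOREACH B GENERATE tracking_id, timestamp,
            (chararray)(STRSPLIT(timestamp, ' ', 2).$0) AS day;

-- Both sides now join on ordinary fields instead of an expression.
A_and_B = JOIN A BY (tracking_id, day) LEFT OUTER,
               B_day BY (tracking_id, day);
```

Whether this avoids ERROR 2042 on 0.8.0 is not confirmed by the thread. It simply moves the expression out of the JOIN clause.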



Daniel Eklund 2011-05-17, 19:20
Thejas M Nair 2011-05-17, 20:41
Daniel Dai 2011-05-17, 21:17