Pig, mail # user - Can't JOIN self?


Russell Jurney 2012-07-20, 02:34
Russell Jurney 2012-07-20, 02:46
Robert Yerex 2012-07-20, 03:00
Russell Jurney 2012-07-20, 03:39
Bill Graham 2012-07-20, 04:49
Russell Jurney 2012-07-20, 05:10
Bill Graham 2012-07-20, 05:34
Re: Can't JOIN self?
Dmitriy Ryaboy 2012-07-20, 07:53
It's kind of a waste of I/O and mappers. If it's not a bug, it's an optimization opportunity.

On Jul 19, 2012, at 10:34 PM, Bill Graham <[EMAIL PROTECTED]> wrote:

> No, it isn't a bug as I see it. You need to load the two relations
> separately because a join is across two separate data sources.
>
>
> On Thu, Jul 19, 2012 at 10:10 PM, Russell Jurney <[EMAIL PROTECTED]> wrote:
>
>> So it is a bug? Because Pig will not let me self JOIN. I have to LOAD the
>> data twice.
>>
>> On Thu, Jul 19, 2012 at 9:49 PM, Bill Graham <[EMAIL PROTECTED]> wrote:
>>
>>> No, to Pig a self join is just like a regular join across two different
>>> relations. It just happens to be to the same input data.
>>>
>>> On Thu, Jul 19, 2012 at 8:39 PM, Russell Jurney <[EMAIL PROTECTED]> wrote:
>>>
>>>> Is this a bug?
>>>>
>>>> On Thu, Jul 19, 2012 at 8:00 PM, Robert Yerex <[EMAIL PROTECTED]> wrote:
>>>>
>>>>> The only way to get it to work is to load a second copy.
>>>>>
>>>>> On Thu, Jul 19, 2012 at 7:46 PM, Russell Jurney <[EMAIL PROTECTED]> wrote:
>>>>>
>>>>>> Note: this works if I LOAD a new, 2nd relation and do the join.
>>>>>>
>>>>>> On Thu, Jul 19, 2012 at 7:34 PM, Russell Jurney <[EMAIL PROTECTED]> wrote:
>>>>>>
>>>>>>> I have a problem where I can't join a relation to itself on a different
>>>>>>> field.
>>>>>>>
>>>>>>> describe pairs
>>>>>>> pairs: {from: chararray,to: chararray,message_id: chararray,in_reply_to: chararray}
>>>>>>>
>>>>>>> pairs2 = pairs;
>>>>>>>
>>>>>>> with_reply = join pairs by in_reply_to, pairs2 by message_id;
>>>>>>>
>>>>>>>
>>>>>>> I get this error:
>>>>>>>
>>>>>>> 2012-07-19 19:31:16,927 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1200: Pig script failed to parse:
>>>>>>> <line 20, column 6> pig script failed to validate: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 2225: Projection with nothing to reference!
>>>>>>> 2012-07-19 19:31:16,928 [main] ERROR org.apache.pig.tools.grunt.Grunt - Failed to parse: Pig script failed to parse:
>>>>>>> <line 20, column 6> pig script failed to validate: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 2225: Projection with nothing to reference!
>>>>>>> at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:182)
>>>>>>> at org.apache.pig.PigServer$Graph.validateQuery(PigServer.java:1565)
>>>>>>> at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1538)
>>>>>>> at org.apache.pig.PigServer.registerQuery(PigServer.java:540)
>>>>>>> at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:970)
>>>>>>> at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:386)
>>>>>>> at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:189)
>>>>>>> at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:165)
>>>>>>> at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:69)
>>>>>>> at org.apache.pig.Main.run(Main.java:490)
>>>>>>> at org.apache.pig.Main.main(Main.java:111)
>>>>>>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>>>> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>>>>>> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>>>>> at java.lang.reflect.Method.invoke(Method.java:597)
>>>>>>> at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
>>>>>>> Caused by:
>>>>>>> <line 20, column 6> pig script failed to validate: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 2225: Projection with nothing to reference!
>>>>>>> at
>>>>>>>
>>>>>>
>
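
For reference, the workaround described in the quoted replies (loading the same data a second time so the join sees two distinct relations) might look roughly like the sketch below. The schema is taken from the describe output in the original message; the input path '/tmp/pairs' and the tab-delimited PigStorage loader are assumptions, since the thread never shows the original LOAD statement.

```pig
-- Sketch only: '/tmp/pairs' and PigStorage('\t') are hypothetical,
-- chosen for illustration; substitute the real path and loader.
pairs = LOAD '/tmp/pairs' USING PigStorage('\t')
        AS (from:chararray, to:chararray, message_id:chararray, in_reply_to:chararray);

-- Second LOAD of the same data, instead of the alias copy (pairs2 = pairs;)
-- that triggered ERROR 2225 in the original message.
pairs2 = LOAD '/tmp/pairs' USING PigStorage('\t')
         AS (from:chararray, to:chararray, message_id:chararray, in_reply_to:chararray);

-- Join each message to the message it replies to.
with_reply = JOIN pairs BY in_reply_to, pairs2 BY message_id;

-- After the join, fields are disambiguated with the relation name,
-- e.g. pairs::message_id and pairs2::message_id.
DESCRIBE with_reply;
```

As noted at the top of this message, reading the same input twice costs extra I/O and mappers, so this is a workaround rather than an ideal self join.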
Alan Gates 2012-07-20, 16:01
Sean Timm 2012-07-23, 21:36
Russell Jurney 2012-07-23, 21:48
Russell Jurney 2012-07-24, 08:11