Hive, mail # user - Cartesian product detection in the query plan?


Re: Cartesian product detection in the query plan?
David Morel 2013-01-28, 16:45
On 28 Jan 2013, at 14:29, Edward Capriolo wrote:

> IIRC, hive.mapred.mode=strict should prevent this. If not, we should add
> it.

hi Edward,

Yes, that's indeed what the book claims (quoting):

   hive> SELECT * FROM fracture_act JOIN fracture_ads
       > WHERE fracture_act.planner_id = fracture_ads.planner_id;
   FAILED: Error in semantic analysis: In strict mode, cartesian product
   is not allowed. If you really want to perform the operation,
   +set hive.mapred.mode=nonstrict+

I am about to re-enable this setting on my cluster (after fixing all the
queries that it broke, especially all the ORDER BY ones :-) but I hoped
it was visible right there in the query plan, or in some other way. If
Hive can detect it, it should be visible somewhere, right?
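
One thing that stands out in the Stage-7 plan quoted below is that neither
Reduce Output Operator lists any "key expressions:" and both have an empty
"sort order:", even though they feed a Join Operator. A rough way to scan
EXPLAIN output for that pattern could look like the Python sketch below. It
is only a heuristic built on the assumption that a missing key means a
keyless join; the function name keyless_join_stages and the abbreviated PLAN
sample are made up for illustration, and the EXPLAIN text layout may differ
between Hive versions.

```python
import re

# Matches the header of each stage in the "STAGE PLANS" part of EXPLAIN output.
STAGE_RE = re.compile(r"^Stage: (Stage-\d+)$")


def keyless_join_stages(explain_text):
    """Return stages with a Join Operator fed by a key-less Reduce Output Operator.

    Heuristic only: assumes the text layout of the plan quoted in this thread.
    """
    stages = {}     # stage name -> counters for that stage
    current = None  # counters of the stage being read
    for raw in explain_text.splitlines():
        # Drop mail quote markers (">", ">>") and indentation.
        line = raw.lstrip("> \t").strip()
        match = STAGE_RE.match(line)
        if match:
            current = stages.setdefault(
                match.group(1), {"rs": 0, "keyed": 0, "join": False}
            )
        elif current is None:
            continue
        elif line == "Reduce Output Operator":
            current["rs"] += 1
        elif line.startswith("key expressions:"):
            current["keyed"] += 1
        elif line == "Join Operator":
            current["join"] = True
    return [
        name
        for name, c in stages.items()
        if c["join"] and c["rs"] > 0 and c["keyed"] < c["rs"]
    ]


# Abbreviated copy of the Stage-7 plan from this thread.
PLAN = """\
Stage: Stage-7
  Map Reduce
    Alias -> Map Operator Tree:
      $INTNAME
          Reduce Output Operator
            sort order:
            tag: 1
      $INTNAME1
          Reduce Output Operator
            sort order:
            tag: 0
    Reduce Operator Tree:
      Join Operator
        condition map:
             Inner Join 0 to 1
"""

print(keyless_join_stages(PLAN))  # prints ['Stage-7']
```

It only looks at reduce-side joins (the exact "Join Operator" line), so a
map-side join would slip through, and I have not checked it against plans
from other Hive versions.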

Thanks!

david

>
> On Monday, January 28, 2013, David Morel <[EMAIL PROTECTED]> wrote:
>> Hi everyone,
>>
>> I had to kill some queries that were taking forever, and it turns out
>> they were doing cartesian products (missing ON clause on a JOIN).
>>
>> I wonder how I could see that in the EXPLAIN output (which I still find
>> a bit cryptic). Specifically, the stage that it was stuck in was this:
>>
>> Stage: Stage-7
>> Map Reduce
>> Alias -> Map Operator Tree:
>>   $INTNAME
>>       Reduce Output Operator
>>         sort order:
>>         tag: 1
>>         value expressions:
>>               expr: _col1
>>               type: int
>>   $INTNAME1
>>       Reduce Output Operator
>>         sort order:
>>         tag: 0
>>         value expressions:
>>               expr: _col0
>>               type: bigint
>>               expr: _col1
>>               type: string
>> Reduce Operator Tree:
>>   Join Operator
>>     condition map:
>>          Inner Join 0 to 1
>>     condition expressions:
>>       0 {VALUE._col0} {VALUE._col1}
>>       1 {VALUE._col1}
>>     handleSkewJoin: false
>>     outputColumnNames: _col0, _col1, _col3
>>     File Output Operator
>>       compressed: true
>>       GlobalTableId: 0
>>       table:
>>           input format: org.apache.hadoop.mapred.SequenceFileInputFormat
>>           output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
>>
>> Is there anything in there that should have alerted me?
>>
>> I found out by looking at the query, but I wonder if the query plan (if
>> I could read it) would have given me that information.
>>
>> Thanks a lot
>>
>> David Morel
>>