Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Threaded View
Hive, mail # user - Cartesian product detection in the query plan?


Copy link to this message
-
Re: Cartesian product detection in the query plan?
Dean Wampler 2013-01-28, 17:05
By the way, v0.10.0 adds a new CROSS JOIN feature, when you really intended
to do one ;)

SELECT a.x, b.y FROM tablea a CROSS JOIN tableb b;

On Mon, Jan 28, 2013 at 10:58 AM, Edward Capriolo <[EMAIL PROTECTED]>wrote:

> Currently order by is very inefficient in hive. Hopefully you can use sort
> by in most cases.
>
> It should be "visible" in the plan. The plan should be very different if
> you are using the ON clause vs not.  Can it be easily detected is another
> question.
>
>
> On Mon, Jan 28, 2013 at 11:45 AM, David Morel <[EMAIL PROTECTED]> wrote:
>
>> On 28 Jan 2013, at 14:29, Edward Capriolo wrote:
>>
>>  Iirc hive.mapred.mode strict should prevent this. If not we should add
>>> it.
>>>
>>
>> hi Edward,
>>
>> Yes, that's indeed what the book claims (quoting):
>>
>>   hive> SELECT * FROM fracture_act JOIN fracture_ads
>>  > WHERE fracture_act.planner_id = fracture_ads.planner_id;
>>   FAILED: Error in semantic analysis: In strict mode, cartesian product
>> is not allowed. If you really want to perform the operation,
>>   +set hive.mapred.mode=nonstrict+
>>
>> I am about to re-enable this setting on my cluster (after fixing all the
>> queries that it broke, especially all the ORDER BY ones :-) but I hoped
>> it was visible right there in the query plan, or in some other way. If
>> Hive can detect it, it should be visible somewhere, right?
>>
>> Thanks!
>>
>> david
>>
>>
>>> On Monday, January 28, 2013, David Morel <[EMAIL PROTECTED]> wrote:
>>>
>>>> Hi everyone,
>>>>
>>>> I had to kill some queries that were taking forever, and it turns out
>>>> they were doing cartesian products (missing ON clause on a JOIN).
>>>>
>>>> I wonder how I could see that in the EXPLAIN output (which I still find
>>>> a bit cryptic). Specifically, the stage that it was stuck in was this:
>>>>
>>>> Stage: Stage-7
>>>> Map Reduce
>>>> Alias -> Map Operator Tree:
>>>>   $INTNAME
>>>>       Reduce Output Operator
>>>>         sort order:
>>>>         tag: 1
>>>>         value expressions:
>>>>               expr: _col1
>>>>               type: int
>>>>   $INTNAME1
>>>>       Reduce Output Operator
>>>>         sort order:
>>>>         tag: 0
>>>>         value expressions:
>>>>               expr: _col0
>>>>               type: bigint
>>>>               expr: _col1
>>>>               type: string
>>>> Reduce Operator Tree:
>>>>   Join Operator
>>>>     condition map:
>>>>          Inner Join 0 to 1
>>>>     condition expressions:
>>>>       0 {VALUE._col0} {VALUE._col1}
>>>>       1 {VALUE._col1}
>>>>     handleSkewJoin: false
>>>>     outputColumnNames: _col0, _col1, _col3
>>>>     File Output Operator
>>>>       compressed: true
>>>>       GlobalTableId: 0
>>>>       table:
>>>>           input format:
>>>>
>>> org.apache.hadoop.mapred.SequenceFileInputFormat
>>>
>>>>           output format:
>>>>
>>> org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
>>>
>>>>
>>>> Is there anything in there that should have alerted me?
>>>>
>>>> I found out by looking at the query, but I wonder if the query plan (if
>>>> I could read it) would have given me that information.
>>>>
>>>> Thanks a lot
>>>>
>>>> David Morel
>>>>
>>>>
>
--
*Dean Wampler, Ph.D.*
thinkbiganalytics.com
+1-312-339-1330