Cartesian product detection in the query plan?
Hi everyone,

I had to kill some queries that were taking forever, and it turned out
they were doing Cartesian products (a JOIN was missing its ON clause).

I wonder how I could see that in the EXPLAIN output (which I still find
a bit cryptic). Specifically, the stage that it was stuck in was this:

   Stage: Stage-7
     Map Reduce
       Alias -> Map Operator Tree:
         $INTNAME
             Reduce Output Operator
               sort order:
               tag: 1
               value expressions:
                     expr: _col1
                     type: int
         $INTNAME1
             Reduce Output Operator
               sort order:
               tag: 0
               value expressions:
                     expr: _col0
                     type: bigint
                     expr: _col1
                     type: string
       Reduce Operator Tree:
         Join Operator
           condition map:
                Inner Join 0 to 1
           condition expressions:
             0 {VALUE._col0} {VALUE._col1}
             1 {VALUE._col1}
           handleSkewJoin: false
           outputColumnNames: _col0, _col1, _col3
           File Output Operator
             compressed: true
             GlobalTableId: 0
             table:
                 input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                 output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat

Is there anything in there that should have alerted me?

I found out by looking at the query, but I wonder if the query plan (if
I could read it) would have given me that information.

Thanks a lot

David Morel
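
A rough way to check a plan like the one above mechanically, sketched in Python. It rests on an assumption that nothing in this message confirms: that a join with a real ON condition lists `key expressions:` under its Reduce Output Operators, whereas a join with no keys (as in Stage-7, where `sort order:` is empty and no key expressions appear) lists none. The names `split_stages` and `find_keyless_joins` are made up for illustration, and the text layout is assumed to match the EXPLAIN output quoted above.

```python
import re

# Matches the "Stage: Stage-N" headers seen in the quoted EXPLAIN output.
STAGE_HEADER = re.compile(r"^\s*Stage: (Stage-\d+)\s*$")


def split_stages(explain_text):
    """Group the lines of an EXPLAIN dump under the stage header above them."""
    stages = {}
    current = None
    for line in explain_text.splitlines():
        match = STAGE_HEADER.match(line)
        if match:
            current = match.group(1)
            stages[current] = []
        elif current is not None:
            stages[current].append(line.strip())
    return stages


def find_keyless_joins(explain_text):
    """Return stages with a reduce-side Join Operator and no join keys.

    Heuristic only (assumption): a keyed join would show at least one
    'key expressions:' line under its Reduce Output Operators.
    """
    suspects = []
    for name, lines in split_stages(explain_text).items():
        has_join = "Join Operator" in lines
        shuffles = lines.count("Reduce Output Operator")
        keyed = sum(1 for l in lines if l.startswith("key expressions:"))
        if has_join and shuffles > 0 and keyed == 0:
            suspects.append(name)
    return suspects


# Abbreviated copy of the Stage-7 plan from the message above.
PLAN = """\
  Stage: Stage-7
    Map Reduce
      Alias -> Map Operator Tree:
        $INTNAME
            Reduce Output Operator
              sort order:
              tag: 1
        $INTNAME1
            Reduce Output Operator
              sort order:
              tag: 0
      Reduce Operator Tree:
        Join Operator
          condition map:
               Inner Join 0 to 1
"""

print(find_keyless_joins(PLAN))  # → ['Stage-7']
```

This only flags candidates. A flagged stage still has to be checked against the query itself, since the heuristic knows nothing about how the join was written.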
Edward Capriolo 2013-01-28, 13:29
David Morel 2013-01-28, 16:45
Edward Capriolo 2013-01-28, 16:58
David Morel 2013-01-28, 17:16
Dean Wampler 2013-01-28, 17:05