Home | About | Sematext search-lucene.com search-hadoop.com
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
 Search Hadoop and all its subprojects:

Switch to Plain View
Hive >> mail # user >> Hive producing difference outputs


+
Chen Song 2012-11-15, 16:18
Copy link to this message
-
Re: Hive producing difference outputs
Hi Cheng,
It would really help if you could simplify your example so it's easy
to reproduce by one of us.

Also, can you share the value of hive.optimize.ppd property by doing
hive> set hive.optimize.ppd;

Does inverting the property (making it false, if it's true) change anything?

On Thu, Nov 15, 2012 at 8:18 AM, Chen Song <[EMAIL PROTECTED]> wrote:
> Hi Folks
>
> We are getting inconsistent output when running some semantically same Hive
> queries. We are using CDH3u3 with Hive 0.7.1.
>
> The query I am running performs a map-side join of two subqueries (s1 and
> s2)--s1 is a multi-table join and s2 is a union of a table against itself.
> If we add in mapside join to s1 (which is necessary for performance) the
> query produces different (and incorrect) results.
>
> Basically, s1 returns one row and s2 returns 2 rows. By joining them, it
> should give us two rows. See below for output and queries (I abridged the
> queries a bit for readability).
>
> NOTE: I searched online and found anther thread reporting a similar issue as
> what I have seen,
> http://mail-archives.apache.org/mod_mbox/hive-user/201207.mbox/%3CCADejTpz5rSf-Sxi9HNoCTRHVazOk=F+[EMAIL PROTECTED]%3E.
> Unfortunately, this thread never got answered. It looks like a bug in Hive.
> Greatly appreciate if anyone give thoughts on this issue.
>
>
> Q1 output (incorrect, missing one row):
> 2012-08-15 18:56:38 2258929231696355094 1 10 2258929231696355094 2012-08-15
> 18:56:38 1315 0 US 728 90 0 97525 1 0 NULL 1 0 NULL
>
> Q2 output (correct):
> 2012-08-15 18:56:38 2258929231696355094 1 10 2258929231696355094 2012-08-15
> 18:56:38 1315 0 US 728 90 0 97525 1 0 NULL 10 NULL 0
> 2012-08-15 18:56:38 2258929231696355094 1 10 2258929231696355094 2012-08-15
> 18:56:38 1315 0 US 728 90 0 97525 1 0 NULL 10 NULL 1
>
> Q1 and Q2 only differs in the way that s1 in Q1 has a mapside join.
>
> Q1 (incorrect results, highlighted portion is the only difference from Q2):
>
>
> SELECT /*+MAPJOIN(table) */ *
> FROM
> (
>  SELECT /*+ MAPJOIN(tableA,tableB) */
>   id,
>   ...
>  FROM main_table ctg
>  JOIN sample smp
>    ON ctg.id = smp.id
>  LEFT OUTER JOIN tableA
>    ON smp.publisher_id = tableA.id
>  LEFT OUTER JOIN tableB
>    ON smp.inventory_source_id = tableB.id
>  WHERE ctg.date_time >= '2012-08-15 00:00:00' and
>        ctg.date_time <= '2012-08-15 23:59:59' and
>        ctg.dd = '2012-08-15' and
>        smp.date_time >= '2012-08-15 00:00:00' and
>        smp.date_time <= '2012-08-15 23:59:59' and
>        smp.dd = '2012-08-15'
> )sub
> JOIN
> (
>  select
>    id,
>    parent_category_id,
>    cast(0 as tinyint) as is_parent
>  from table_category
>  where is_system = 1
>  union all
>  select
>    id,
>    parent_category_id,
>    cast(1 as tinyint) as is_parent
>  from table_category
>  where is_system = 1
> )table
> ON table.id = sub.id
>
> Q2 (correct results):
>
> SELECT /*+MAPJOIN(table) */ *
> FROM
> (
>  SELECT
>   id,
>   ...
>  FROM main_table ctg
>  JOIN sample smp
>    ON ctg.id = smp.id
>  LEFT OUTER JOIN tableA
>    ON smp.publisher_id = tableA.id
>  LEFT OUTER JOIN tableB
>    ON smp.inventory_source_id = tableB.id
>  WHERE ctg.date_time >= '2012-08-15 00:00:00' and
>        ctg.date_time <= '2012-08-15 23:59:59' and
>        ctg.dd = '2012-08-15' and
>        smp.date_time >= '2012-08-15 00:00:00' and
>        smp.date_time <= '2012-08-15 23:59:59' and
>        smp.dd = '2012-08-15'
> )sub
> JOIN
> (
>  select
>    id,
>    parent_category_id,
>    cast(0 as tinyint) as is_parent
>  from table_category
>  where is_system = 1
>  union all
>  select
>    id,
>    parent_category_id,
>    cast(1 as tinyint) as is_parent
>  from table_category
>  where is_system = 1
> )table
> ON table.id = sub.id
>
> --
> Chen Song
>
>
+
Chen Song 2012-12-19, 16:55
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB