Home | About | Sematext search-lucene.com search-hadoop.com
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
 Search Hadoop and all its subprojects:

Switch to Threaded View
Hive >> mail # user >> Hive producing difference outputs


Copy link to this message
-
Re: Hive producing difference outputs
resent as no one has replied yet :)

On Thu, Nov 15, 2012 at 11:18 AM, Chen Song <[EMAIL PROTECTED]> wrote:

> Hi Folks
>
> We are getting inconsistent output when running some semantically same
> Hive queries. We are using *CDH3u3* with *Hive 0.7.1.*
>
> The query I am running performs a map-side join of two subqueries (s1 and
> s2)--s1 is a multi-table join and s2 is a union of a table against itself.
> If we add in mapside join to s1 (which is necessary for performance) the
> query produces different (and incorrect) results.
>
> Basically, s1 returns one row and s2 returns 2 rows. By joining them, it
> should give us two rows. See below for output and queries (I abridged the
> queries a bit for readability).
>
> NOTE: I searched online and found anther thread reporting a similar issue
> as what I have seen,
> http://mail-archives.apache.org/mod_mbox/hive-user/201207.mbox/%3CCADejTpz5rSf-Sxi9HNoCTRHVazOk=F+[EMAIL PROTECTED]%3E.
> Unfortunately, this thread never got answered. It looks like a bug in Hive.
> Greatly appreciate if anyone give thoughts on this issue.
>
>
> Q1 output (incorrect, missing one row):
> 2012-08-15 18:56:38 2258929231696355094 1 10 2258929231696355094
> 2012-08-15 18:56:38 1315 0 US 728 90 0 97525 1 0 NULL 1 0 NULL
>
> Q2 output (correct):
> 2012-08-15 18:56:38 2258929231696355094 1 10 2258929231696355094
> 2012-08-15 18:56:38 1315 0 US 728 90 0 97525 1 0 NULL 10 NULL 0
> 2012-08-15 18:56:38 2258929231696355094 1 10 2258929231696355094
> 2012-08-15 18:56:38 1315 0 US 728 90 0 97525 1 0 NULL 10 NULL 1
>
> Q1 and Q2 only differs in the way that s1 in Q1 has a mapside join.
>
>    - Q1 (incorrect results, highlighted portion is the only difference
>    from Q2):
>
>
>  SELECT /*+MAPJOIN(table) */ *
> FROM
> (
>   SELECT /*+ MAPJOIN(tableA,tableB) */
>    id,
>    ...
>   FROM main_table ctg
>   JOIN sample smp
>     ON ctg.id = smp.id
>   LEFT OUTER JOIN tableA
>     ON smp.publisher_id = tableA.id
>   LEFT OUTER JOIN tableB
>     ON smp.inventory_source_id = tableB.id
>   WHERE ctg.date_time >= '2012-08-15 00:00:00' and
>         ctg.date_time <= '2012-08-15 23:59:59' and
>         ctg.dd = '2012-08-15' and
>         smp.date_time >= '2012-08-15 00:00:00' and
>         smp.date_time <= '2012-08-15 23:59:59' and
>         smp.dd = '2012-08-15'
> )sub
> JOIN
>  (
>   select
>     id,
>     parent_category_id,
>     cast(0 as tinyint) as is_parent
>   from table_category
>   where is_system = 1
>   union all
>   select
>     id,
>     parent_category_id,
>     cast(1 as tinyint) as is_parent
>   from table_category
>   where is_system = 1
> )table
>  ON table.id = sub.id
>
>
>    - Q2 (correct results):
>
>  SELECT /*+MAPJOIN(table) */ *
> FROM
> (
>   SELECT
>    id,
>    ...
>   FROM main_table ctg
>   JOIN sample smp
>     ON ctg.id = smp.id
>   LEFT OUTER JOIN tableA
>     ON smp.publisher_id = tableA.id
>   LEFT OUTER JOIN tableB
>     ON smp.inventory_source_id = tableB.id
>   WHERE ctg.date_time >= '2012-08-15 00:00:00' and
>         ctg.date_time <= '2012-08-15 23:59:59' and
>         ctg.dd = '2012-08-15' and
>         smp.date_time >= '2012-08-15 00:00:00' and
>         smp.date_time <= '2012-08-15 23:59:59' and
>         smp.dd = '2012-08-15'
> )sub
> JOIN
>  (
>   select
>     id,
>     parent_category_id,
>     cast(0 as tinyint) as is_parent
>   from table_category
>   where is_system = 1
>   union all
>   select
>     id,
>     parent_category_id,
>     cast(1 as tinyint) as is_parent
>   from table_category
>   where is_system = 1
> )table
>  ON table.id = sub.id
>
> --
> Chen Song
>
>
>
--
Chen Song
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB