Hive >> mail # user >> Is this a known Bug: Multi Inserts from partitioned source ignore Where Clauses


Is this a known Bug: Multi Inserts from partitioned source ignore Where Clauses
I ran into an interesting bug. If the FROM () source of a multi-insert is
a partitioned table and its where clause prunes partitions, then every
INSERT ... SELECT * WHERE x=y in the statement ignores its own where
clause. This does not occur if no source partition is specified, but if the
source has where partition = 'x', then the where on each individual insert
is ignored.
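
For illustration, the statement shape described above might look like the following. This is only a sketch, not one of the attached test files: the names src_part, part_col, dest_a, dest_b and the column x are hypothetical placeholders.

```sql
-- Hypothetical names throughout: src_part is assumed to be partitioned
-- by part_col; dest_a and dest_b are assumed to be unpartitioned tables.
FROM (
  SELECT * FROM src_part
  WHERE part_col = 'x'          -- partition-pruning predicate on the source
) s
INSERT OVERWRITE TABLE dest_a
  SELECT * WHERE s.x = 'y'      -- per-insert filter, reported as ignored
INSERT OVERWRITE TABLE dest_b
  SELECT * WHERE s.x = 'z';     -- per-insert filter, reported as ignored
```

Dropping the part_col predicate from the subquery gives the variant where, per the description above, the per-insert filters are honored.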

I've included some files here:

testdata.tsv - Tab delimited data to prove the issue
create_tables.hive - Creates a database and tables as well as loads the
data from the TSV

Test Cases:
I created these test case files so that each case contains three types of
insert: 1. load all data from the initial statement, 2. load partial data
(using a limiting clause such as where day >= '2013-01-05'), and 3. load
NO data from the initial statement (where 1 = 0).
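
As a rough sketch of the three insert types (not the contents of the attached files), a flat-to-flat case could be written as below. The source table name multi_bug_src is a hypothetical placeholder, and the pairing of each insert type with a destination table is an assumption inferred from the row counts reported below.

```sql
-- multi_bug_src is a hypothetical source table name.
-- Destination-to-case pairing is assumed from the reported row counts.
FROM (
  SELECT * FROM multi_bug_src
) s
INSERT OVERWRITE TABLE multi_bug_flat
  SELECT *                               -- 1. all data from the initial statement
INSERT OVERWRITE TABLE multi_bug_flat2
  SELECT * WHERE s.day >= '2013-01-05'   -- 2. partial data
INSERT OVERWRITE TABLE multi_bug_flat3
  SELECT * WHERE 1 = 0;                  -- 3. no data
```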

These tests were all run on Hive 0.9.

multi-flat-flat.hive - The source table and the dest tables are not
partitioned. The where clauses work as expected:

19 Rows loaded to multi_bug_flat
0 Rows loaded to multi_bug_flat3
15 Rows loaded to multi_bug_flat2

multi-part-part.hive - The source table and the dest tables are
partitioned. The where clauses are not honored.

9 Rows loaded to multi_bug_part3
9 Rows loaded to multi_bug_part2
9 Rows loaded to multi_bug_part

multi-flat-part.hive - The source table is flat, the dest table is
partitioned - The where clauses work as expected:

0 Rows loaded to multi_bug_part3
15 Rows loaded to multi_bug_part2
19 Rows loaded to multi_bug_part

multi-part-flat.hive - The source table is partitioned, the dest table is
flat - The where clauses are not honored:

9 Rows loaded to multi_bug_flat
9 Rows loaded to multi_bug_flat3
9 Rows loaded to multi_bug_flat2

multi-part-specified.hive - The source and dest are partitioned, but there
is no partition pruning statement in the from (). This works as expected:

0 Rows loaded to multi_bug_part3
15 Rows loaded to multi_bug_part2
19 Rows loaded to multi_bug_part
Thoughts?
Philip Tromans 2013-01-26, 15:20
John Omernik 2013-01-26, 15:27
Navis류승우 2013-01-28, 00:53