Hive >> mail # user >> Is this a known Bug: Multi Inserts from partitioned source ignore Where Clauses


Re: Is this a known Bug: Multi Inserts from partitioned source ignore Where Clauses
For problems with INSERT INTO, there are HIVE-3465 and HIVE-3676.

2013/1/27 John Omernik <[EMAIL PROTECTED]>:
> I am not a code expert. This looks very much like the bug I posted, but my
> case does not use INSERT OVERWRITE (just INSERT INTO) and I am not doing any
> GROUP BY (probably not an issue).
>
> Just to be clear, this is probably the same issue as mine, but if someone
> with more knowledge of the underlying structures were to see the OVERWRITE
> vs INTO they may see something different.
>
>
> On Sat, Jan 26, 2013 at 9:20 AM, Philip Tromans <[EMAIL PROTECTED]>
> wrote:
>>
>> This is a known (recently fixed) bug:
>>
>> https://issues.apache.org/jira/browse/HIVE-3699
>>
>> Phil.
>>
>>
>> On 26 January 2013 15:17, John Omernik <[EMAIL PROTECTED]> wrote:
>>>
>>> I ran into an interesting bug. Basically, if your FROM () source is a
>>> partitioned table and you use a WHERE clause that prunes partitions, every
>>> INSERT ... SELECT * WHERE x=y branch ignores its own WHERE clause. This does
>>> not occur if the source partition is not specified, but if the source has
>>> WHERE partition = 'x', then the WHERE on each individual insert is ignored...
>>>
>>> I've included some files here
>>>
>>> testdata.tsv - Tab delimited data to prove the issue
>>> create_tables.hive - Creates a database and tables as well as loads the
>>> data from the TSV
>>>
>>> Test Cases:
>>> I created these test case files so that each case has three types of
>>> insert: 1. Load all data from the initial statement, 2. Load partial data
>>> (using a limiting clause such as where day >= '2013-01-05'), and 3. Load NO
>>> data from the initial statement (where 1 = 0).
>>>
>>> These tests were all run on Hive 0.9.
>>>
>>> multi-flat-flat.hive - The source table and the dest tables are not
>>> partitioned, the where clauses work as expected:
>>>
>>> 19 Rows loaded to multi_bug_flat
>>> 0 Rows loaded to multi_bug_flat3
>>> 15 Rows loaded to multi_bug_flat2
>>>
>>> multi-part-part.hive - The source table and the dest tables are
>>> partitioned. The where clauses are not honored.
>>>
>>> 9 Rows loaded to multi_bug_part3
>>> 9 Rows loaded to multi_bug_part2
>>> 9 Rows loaded to multi_bug_part
>>>
>>> multi-flat-part.hive - The source table is flat, the dest table is
>>> partitioned - The where clauses work as expected:
>>>
>>> 0 Rows loaded to multi_bug_part3
>>> 15 Rows loaded to multi_bug_part2
>>> 19 Rows loaded to multi_bug_part
>>>
>>> multi-part-flat.hive - The source table is partitioned, the dest table is
>>> flat - The where clauses are not honored:
>>>
>>> 9 Rows loaded to multi_bug_flat
>>> 9 Rows loaded to multi_bug_flat3
>>> 9 Rows loaded to multi_bug_flat2
>>>
>>> multi-part-specified.hive - The source and dest are partitioned, but
>>> there is no partition pruning statement in the FROM (). This works as
>>> expected:
>>>
>>> 0 Rows loaded to multi_bug_part3
>>> 15 Rows loaded to multi_bug_part2
>>> 19 Rows loaded to multi_bug_part
>>>
>>>
>>> Thoughts?
>>
>>
>
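
For readers without the attached files, the failing pattern described in the original report (multi-part-flat.hive) might look roughly like the HiveQL below. This is only a sketch: the source table name multi_bug_src_part, the partition column part, and the column id are hypothetical, since the attachments are not part of this thread. Only the destination table names, the day filter, and the 1 = 0 filter come from the report.

```sql
-- Hypothetical reconstruction; the source table and column names are assumptions.
-- The source is partitioned, and the FROM () subquery prunes to one partition.
FROM (
  SELECT id, day
  FROM multi_bug_src_part      -- hypothetical partitioned source table
  WHERE part = 'x'             -- partition-pruning predicate (hypothetical column)
) src
-- 1. Load all rows from the pruned source.
INSERT INTO TABLE multi_bug_flat
  SELECT src.id, src.day
-- 2. Load only the rows that match a limiting clause.
INSERT INTO TABLE multi_bug_flat2
  SELECT src.id, src.day WHERE src.day >= '2013-01-05'
-- 3. Load no rows at all.
INSERT INTO TABLE multi_bug_flat3
  SELECT src.id, src.day WHERE 1 = 0;
```

With correct behavior, the three destinations should receive different row counts. In the report above, all three received 9 rows on Hive 0.9, which is what suggests the per-insert WHERE clauses were dropped.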