Hive >> mail # user >> Is this a known Bug: Multi Inserts from partitioned source ignore Where Clauses


John Omernik 2013-01-26, 15:17
Philip Tromans 2013-01-26, 15:20
John Omernik 2013-01-26, 15:27
Re: Is this a known Bug: Multi Inserts from partitioned source ignore Where Clauses
For problems with INSERT INTO, there are HIVE-3465 and HIVE-3676.

2013/1/27 John Omernik <[EMAIL PROTECTED]>:
> I am not a code expert. This looks very much like the bug I posted, but my
> case does not use INSERT OVERWRITE (just INSERT INTO), and I am not doing
> any GROUP BY (probably not an issue).
>
> Just to be clear, this is probably the same issue as mine, but if someone
> with more knowledge of the underlying structures were to see the OVERWRITE
> vs INTO they may see something different.
>
>
> On Sat, Jan 26, 2013 at 9:20 AM, Philip Tromans <[EMAIL PROTECTED]>
> wrote:
>>
>> This is a known (recently fixed) bug:
>>
>> https://issues.apache.org/jira/browse/HIVE-3699
>>
>> Phil.
>>
>>
>> On 26 January 2013 15:17, John Omernik <[EMAIL PROTECTED]> wrote:
>>>
>>> I ran into an interesting bug. Basically, if your FROM () source is a
>>> partitioned table and its WHERE clause prunes partitions, then each
>>> INSERT ... SELECT * WHERE x=y ignores its own WHERE clause. This does not
>>> occur if no source partition is specified, but if the source has WHERE
>>> partition = 'x', then the WHERE on each individual insert is ignored...
>>>
>>> I've included some files here:
>>>
>>> testdata.tsv - Tab delimited data to prove the issue
>>> create_tables.hive - Creates a database and tables as well as loads the
>>> data from the TSV
>>>
>>> Test Cases:
>>> I created these test case files so that each case has three types of
>>> insert: 1. Load all data from the initial statement, 2. Load partial data
>>> (using a limiting clause such as WHERE day >= '2013-01-05'), and 3. Load
>>> NO data from the initial statement (WHERE 1 = 0).
>>>
>>> These tests were all run on Hive 0.9.
>>>
>>> multi-flat-flat.hive - The source table and the dest tables are not
>>> partitioned, the where clauses work as expected:
>>>
>>> 19 Rows loaded to multi_bug_flat
>>> 0 Rows loaded to multi_bug_flat3
>>> 15 Rows loaded to multi_bug_flat2
>>>
>>> multi-part-part.hive - The source table and the dest tables are
>>> partitioned. The where clauses are not honored.
>>>
>>> 9 Rows loaded to multi_bug_part3
>>> 9 Rows loaded to multi_bug_part2
>>> 9 Rows loaded to multi_bug_part
>>>
>>> multi-flat-part.hive - The source table is flat, the dest table is
>>> partitioned - The where clauses work as expected:
>>>
>>> 0 Rows loaded to multi_bug_part3
>>> 15 Rows loaded to multi_bug_part2
>>> 19 Rows loaded to multi_bug_part
>>>
>>> multi-part-flat.hive - The source table is partitioned, the dest table is
>>> flat - The where clauses are not honored:
>>>
>>> 9 Rows loaded to multi_bug_flat
>>> 9 Rows loaded to multi_bug_flat3
>>> 9 Rows loaded to multi_bug_flat2
>>>
>>> multi-part-specified.hive - The source and dest are partitioned, but
>>> there is no partition pruning statement in the FROM (). This works as
>>> expected:
>>>
>>> 0 Rows loaded to multi_bug_part3
>>> 15 Rows loaded to multi_bug_part2
>>> 19 Rows loaded to multi_bug_part
>>>
>>>
>>> Thoughts?
>>
>>
>
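
For readers without the attachments, the failing multi-part-flat case from the
original report might look roughly like the HiveQL below. This is only a sketch
reconstructed from the description, not the attached script: the source table
name multi_bug_src_part and the partition column part_col are hypothetical.
Only the destination table names, the day filter, and the 1 = 0 filter are
taken from the message above.

```sql
-- Sketch only: multi_bug_src_part and part_col are assumed names.
-- The FROM () subquery prunes the partitioned source to one partition;
-- each INSERT then applies (or should apply) its own WHERE clause.
FROM (
  SELECT *
  FROM multi_bug_src_part
  WHERE part_col = 'x'          -- partition-pruning predicate on the source
) src
-- 1. Load all data from the initial statement
INSERT INTO TABLE multi_bug_flat
  SELECT src.*
-- 2. Load partial data
INSERT INTO TABLE multi_bug_flat2
  SELECT src.* WHERE src.day >= '2013-01-05'
-- 3. Load no data
INSERT INTO TABLE multi_bug_flat3
  SELECT src.* WHERE 1 = 0;
```

According to the report, on Hive 0.9 each of the three destinations received
the same 9 rows in this case, which suggests the per-insert WHERE clauses were
dropped once the source subquery pruned partitions.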