Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Threaded View
Hive, mail # user - Multi-group-by select always scans entire table


Copy link to this message
-
Re: Multi-group-by select always scans entire table
Jan Dolinár 2012-06-06, 07:15
Hi Mark,

Thanks for all your help. I tried to run a series of test with various
settings of hive.optimize.ppd and various queries ( see it here
http://pastebin.com/E89p9Ubx ) and now I'm even more confused than
before. In all cases, regardless if the WHERE clause asks about
partitioned or regular column, the result with ppd=true and ppd=false
differ only in file paths, but the structure is the same. Even if I
run the query without the LATERAL VIEW...

Either there is something terribly wrong with hive and/or my setup
and/or I'm completely dumb. Do I understand it right that ppd should
push common criteria from where clauses into earlier stage, so that
there is less data and processing in the following stages? I'm quite
convinced it doesn't really happen here...

Jan

On 6/6/12, Mark Grover <[EMAIL PROTECTED]> wrote:
> Hi Jan,
> The quick answer is I don't know but may be someone else on the mailing
> list does:-)
>
> Looking at the wiki page for Lateral view(
> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+LateralView),
> there was a problem related to predicate pushdown on UDTF's (
> https://issues.apache.org/jira/browse/HIVE-1056). However, that seemed to
> have been fixed in Hive 0.6.0 so it shouldn't have any impact on you.
>
> The fix for above ticket introduced a unit test (at
> ql/src/test/results/clientpositive/lateral_view_ppd.q) that tests predicate
> pushdown on UDTF's. Now, all the subsequent releases should have had that
> test pass (otherwise they wouldn't have been released, I hope). The test
> checks for a non-partition column for predicate pushdown. I wonder if it
> makes a difference with a partition column being used.
>
> Can you verify if your query with predicate pushdown enabled works as
> expected with a non-partition column in the where clause? In that case, the
> explain/explain extended output should be different from when predicate
> pushdown is disabled. If predicate pushdown works for non-partition columns
> but not for partition columns, please create a JIRA stating that predicate
> pushdown on UDTF's doesn't work with partition columns.
>
> If it doesn't work for both partition and non-partition columns, then
> obviously Hive-1056 is not working for you. We can take it up on the
> mailing list from there.
>
> Thanks for your input, Jan.
>
> Mark
>
> On Tue, Jun 5, 2012 at 1:19 AM, Jan Dolinár <[EMAIL PROTECTED]> wrote:
>
>>
>>
>> On Mon, Jun 4, 2012 at 7:20 PM, Mark Grover <[EMAIL PROTECTED]> wrote:
>>
>>> Hi Jan,
>>> Glad you found something workable.
>>>
>>> What version of Hive are you using? Could you also please check what the
>>> value of the property hive.optimize.ppd is for you?
>>>
>>> Thanks,
>>> Mark
>>>
>>>
>> Hi Mark,
>>
>> Thanks for reply. I'm using hive 0.7.1 distributed from Cloudera as
>> cdh3u4. The property hive.optimize.ppd is set true, but I have tried to
>> turn it off and it doesn't effect the behavior of the problematic query
>> at
>> all. Any other ideas? :-)
>>
>> Also could some of you good guys try to check this on hadoop 0.8 or
>> newer?
>> It would be nice to know if it is worth to go through all the hassle of
>> upgrading or if it won't help. Also, if it is not fixed already, it might
>> be good idea to report it as a bug.
>>
>> Jan
>>
>