Hive, mail # user - Multi-group-by select always scans entire table


Re: Multi-group-by select always scans entire table
Mark Grover 2012-06-06, 01:20
Hi Jan,
The quick answer is that I don't know, but maybe someone else on the mailing
list does :-)

Looking at the wiki page for Lateral View (
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+LateralView),
there was a problem related to predicate pushdown on UDTFs (
https://issues.apache.org/jira/browse/HIVE-1056). However, that seems to
have been fixed in Hive 0.6.0, so it shouldn't have any impact on you.

The fix for the above ticket introduced a unit test (at
ql/src/test/results/clientpositive/lateral_view_ppd.q) that tests predicate
pushdown on UDTFs. All subsequent releases should have passed that test
(otherwise they wouldn't have been released, I hope). The test only checks
predicate pushdown on a non-partition column, so I wonder whether it makes
a difference when a partition column is used instead.

Can you verify whether your query, with predicate pushdown enabled, works as
expected with a non-partition column in the where clause? If it does, the
explain/explain extended output should differ from the output you get when
predicate pushdown is disabled. If predicate pushdown works for non-partition
columns but not for partition columns, please create a JIRA stating that
predicate pushdown on UDTFs doesn't work with partition columns.
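
One way to make that comparison less error-prone is to save the explain
output from each run and diff the two files rather than reading them side by
side. The following is only a minimal sketch in Python: the function names
are made up for illustration, and the sample plan strings are placeholders,
not real Hive output. It assumes you have captured the two plans yourself,
once with hive.optimize.ppd=true and once with hive.optimize.ppd=false.

```python
import difflib
from typing import List


def plan_diff(plan_ppd_off: str, plan_ppd_on: str) -> List[str]:
    """Return unified-diff lines between two saved EXPLAIN outputs.

    An empty list means the two plans are textually identical, i.e.
    toggling predicate pushdown made no visible difference to the plan.
    """
    return list(
        difflib.unified_diff(
            plan_ppd_off.splitlines(),
            plan_ppd_on.splitlines(),
            fromfile="ppd_off",
            tofile="ppd_on",
            lineterm="",
        )
    )


def pushdown_changed_plan(plan_ppd_off: str, plan_ppd_on: str) -> bool:
    """True if the plan text differs between the two settings."""
    return bool(plan_diff(plan_ppd_off, plan_ppd_on))


if __name__ == "__main__":
    # Placeholder text standing in for captured EXPLAIN output;
    # replace these with the contents of your own saved files.
    off_plan = "scan\nlateral view\ngroup by"
    on_plan = "scan\nfilter\nlateral view\ngroup by"
    for line in plan_diff(off_plan, on_plan):
        print(line)
```

If the diff comes back empty for the partition-column query but not for the
non-partition-column one, that would support the theory above.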

If it works for neither partition nor non-partition columns, then the
HIVE-1056 fix is evidently not working for you. We can take it up on the
mailing list from there.

Thanks for your input, Jan.

Mark

On Tue, Jun 5, 2012 at 1:19 AM, Jan Dolinár <[EMAIL PROTECTED]> wrote:

>
>
> On Mon, Jun 4, 2012 at 7:20 PM, Mark Grover <[EMAIL PROTECTED]> wrote:
>
>> Hi Jan,
>> Glad you found something workable.
>>
>> What version of Hive are you using? Could you also please check what the
>> value of the property hive.optimize.ppd is for you?
>>
>> Thanks,
>> Mark
>>
>>
> Hi Mark,
>
> Thanks for the reply. I'm using Hive 0.7.1 as distributed by Cloudera in
> cdh3u4. The property hive.optimize.ppd is set to true, but I have tried
> turning it off and it doesn't affect the behavior of the problematic query
> at all. Any other ideas? :-)
>
> Also, could some of you good guys try to check this on Hive 0.8 or newer?
> It would be nice to know whether it is worth going through all the hassle
> of upgrading or whether it won't help. Also, if it is not fixed already, it
> might be a good idea to report it as a bug.
>
> Jan
>