Hive >> mail # user >> Multi-group-by select always scans entire table


Re: Multi-group-by select always scans entire table
Hi Jan,
The quick answer is that I don't know, but maybe someone else on the
mailing list does :-)

Looking at the wiki page for LATERAL VIEW (
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+LateralView),
there was a problem related to predicate pushdown on UDTFs (
https://issues.apache.org/jira/browse/HIVE-1056). However, that seems to
have been fixed in Hive 0.6.0, so it shouldn't have any impact on you.

The fix for the above ticket introduced a unit test (at
ql/src/test/results/clientpositive/lateral_view_ppd.q) that tests predicate
pushdown on UDTFs. All subsequent releases should have passed that
test (otherwise they wouldn't have been released, I hope). The test
checks predicate pushdown only for a non-partition column. I wonder
whether it makes a difference when a partition column is used instead.

Can you verify whether your query with predicate pushdown enabled works as
expected with a non-partition column in the where clause? In that case, the
explain/explain extended output should be different from when predicate
pushdown is disabled. If predicate pushdown works for non-partition columns
but not for partition columns, please create a JIRA stating that predicate
pushdown on UDTFs doesn't work with partition columns.
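
A rough sketch of the comparison I have in mind. The table `events`, its
partition column `dt`, the non-partition column `user_id` and the array
column `items` are all made-up names, so substitute your own query:

```sql
-- Hypothetical table: events(user_id INT, items ARRAY<STRING>)
-- partitioned by (dt STRING). All names here are placeholders.

-- 1) Plan with predicate pushdown enabled, filtering on a
--    non-partition column.
SET hive.optimize.ppd=true;
EXPLAIN EXTENDED
SELECT item, COUNT(*)
FROM events LATERAL VIEW explode(items) t AS item
WHERE user_id = 42
GROUP BY item;

-- 2) Same query with predicate pushdown disabled.
SET hive.optimize.ppd=false;
EXPLAIN EXTENDED
SELECT item, COUNT(*)
FROM events LATERAL VIEW explode(items) t AS item
WHERE user_id = 42
GROUP BY item;

-- 3) Repeat steps 1 and 2 with the filter on the partition column,
--    e.g. WHERE dt = '2012-06-01', and compare the plans again.
```

If the two plans in steps 1 and 2 differ in where the filter is applied
but the plans in step 3 do not, that would point at partition columns
specifically.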

If it doesn't work for either partition or non-partition columns, then
obviously HIVE-1056 is not working for you. We can take it up on the
mailing list from there.

Thanks for your input, Jan.

Mark

On Tue, Jun 5, 2012 at 1:19 AM, Jan Dolinár <[EMAIL PROTECTED]> wrote:

>
>
> On Mon, Jun 4, 2012 at 7:20 PM, Mark Grover <[EMAIL PROTECTED]> wrote:
>
>> Hi Jan,
>> Glad you found something workable.
>>
>> What version of Hive are you using? Could you also please check what the
>> value of the property hive.optimize.ppd is for you?
>>
>> Thanks,
>> Mark
>>
>>
> Hi Mark,
>
> Thanks for the reply. I'm using hive 0.7.1 distributed from Cloudera as
> cdh3u4. The property hive.optimize.ppd is set to true, but I have tried to
> turn it off and it doesn't affect the behavior of the problematic query at
> all. Any other ideas? :-)
>
> Also could some of you good guys try to check this on hadoop 0.8 or newer?
> It would be nice to know if it is worth going through all the hassle of
> upgrading or if it won't help. Also, if it is not fixed already, it might
> be a good idea to report it as a bug.
>
> Jan
>