Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Threaded View
Hive >> mail # user >> Multi-group-by select always scans entire table


Copy link to this message
-
Re: Multi-group-by select always scans entire table
Hi Jan,
Thanks for the analysis.
Yes, it's true that optimize ppd will push predicates to be evaluated
earlier. The only catch there is that predicates cannot be pushed across
constructs that change the data in the query. An example of this is having
a predicate (say of the form 'where Col is not NULL') on the right table in
left outer join query. Such a predicate will lead to different results when
executed (on the right table) before or after the left join. Therefore,
predicate push down wouldn't make sense in such a case. However, if you
have an inner join on the same tables, predicates can be pushed down since
executing the predicate before or after the inner join leads to the same
results.

Can you please check if predicate push down enabled changes the explain
plan on a simple inner join query like:

select a.* from a inner join b on(a.key=b.key) where a.some_col=blah;

Thank you for your input, Jan!

Mark
On Jun 6, 2012 3:15 AM, "Jan Dolinár" <[EMAIL PROTECTED]> wrote:

> Hi Mark,
>
> Thanks for all your help. I tried to run a series of test with various
> settings of hive.optimize.ppd and various queries ( see it here
> http://pastebin.com/E89p9Ubx ) and now I'm even more confused than
> before. In all cases, regardless if the WHERE clause asks about
> partitioned or regular column, the result with ppd=true and ppd=false
> differ only in file paths, but the structure is the same. Even if I
> run the query without the LATERAL VIEW...
>
> Either there is something terribly wrong with hive and/or my setup
> and/or I'm completely dumb. Do I understand it right that ppd should
> push common criteria from where clauses into earlier stage, so that
> there is less data and processing in the following stages? I'm quite
> convinced it doesn't really happen here...
>
> Jan
>
> On 6/6/12, Mark Grover <[EMAIL PROTECTED]> wrote:
> > Hi Jan,
> > The quick answer is I don't know but may be someone else on the mailing
> > list does:-)
> >
> > Looking at the wiki page for Lateral view(
> >
> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+LateralView
> ),
> > there was a problem related to predicate pushdown on UDTF's (
> > https://issues.apache.org/jira/browse/HIVE-1056). However, that seemed
> to
> > have been fixed in Hive 0.6.0 so it shouldn't have any impact on you.
> >
> > The fix for above ticket introduced a unit test (at
> > ql/src/test/results/clientpositive/lateral_view_ppd.q) that tests
> predicate
> > pushdown on UDTF's. Now, all the subsequent releases should have had that
> > test pass (otherwise they wouldn't have been released, I hope). The test
> > checks for a non-partition column for predicate pushdown. I wonder if it
> > makes a difference with a partition column being used.
> >
> > Can you verify if your query with predicate pushdown enabled works as
> > expected with a non-partition column in the where clause? In that case,
> the
> > explain/explain extended output should be different from when predicate
> > pushdown is disabled. If predicate pushdown works for non-partition
> columns
> > but not for partition columns, please create a JIRA stating that
> predicate
> > pushdown on UDTF's doesn't work with partition columns.
> >
> > If it doesn't work for both partition and non-partition columns, then
> > obviously Hive-1056 is not working for you. We can take it up on the
> > mailing list from there.
> >
> > Thanks for your input, Jan.
> >
> > Mark
> >
> > On Tue, Jun 5, 2012 at 1:19 AM, Jan Dolinár <[EMAIL PROTECTED]> wrote:
> >
> >>
> >>
> >> On Mon, Jun 4, 2012 at 7:20 PM, Mark Grover <[EMAIL PROTECTED]> wrote:
> >>
> >>> Hi Jan,
> >>> Glad you found something workable.
> >>>
> >>> What version of Hive are you using? Could you also please check what
> the
> >>> value of the property hive.optimize.ppd is for you?
> >>>
> >>> Thanks,
> >>> Mark
> >>>
> >>>
> >> Hi Mark,
> >>
> >> Thanks for reply. I'm using hive 0.7.1 distributed from Cloudera as
> >> cdh3u4. The property hive.optimize.ppd is set true, but I have tried to