Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Threaded View
Hive, mail # user - Override COUNT() function


Copy link to this message
-
Re: Override COUNT() function
Navis류승우 2013-07-02, 07:50
MetadataOnlyOptimizer changes GBY on partition columns to simple
TableScan with one line dummy.

I think similar things can be done with stats.

2013/6/28 Peter Marron <[EMAIL PROTECTED]>:
> Hi,
>
>
>
> I feel sure that someone has asked for this before, but here goes…
>
>
>
> In the case where I have the query
>
>
>
>                 SELECT COUNT(*) FROM table;
>
>
>
> There are many cases where I can determine the count immediately.
>
> (For example if I have run something like:
>
>
>
> ANALYZE TABLE tablename [PARTITION(partcol1[=val1], partcol2[=val2], ...)]
> COMPUTE STATISTICS [noscan];
>
>
>
> then there seems to be a table property “numRows” which holds a count of the
> number of rows.
>
> Now I know that the COUNT function can’t always be determined easily.
>
> If the query is more complicated, like
>
>
>
>                 SELECT COUNT(*) FROM table GROUP BY column;
>
>
>
> then obviously a simple scalar count is of no real use. But is there some
> way
>
> to intercept the simple case and avoid running a table scan?
>
>
>
> One problem that I see is that the COUNT function is a UDAF and I am
>
> assuming that the presence of any aggregate function like this is enough
>
> to force the query planner to require a Map/Reduce. Is there anyway
>
> to make the function look like a simple UDF for some queries? Or
>
> just for some tables? I guess that I’d be prepared to sacrifice the full
>
> generality of the normal COUNT function for one which
>
> only functions correctly for the simple query on my tables.
>
>
>
> So is it possible to have a different COUNT function only on certain tables?
>
>
>
> Regards,
>
>
>
> Z
>
>