Hive, mail # user - Override COUNT() function


RE: Override COUNT() function
Peter Marron 2013-07-02, 09:10
Thanks Navis,

This is a very interesting class, and I feel pretty sure that I would never have found it myself.
Are there any descriptions, motivations, documentation or examples anywhere?
I suspect that there's nothing other than the source itself, but I had to ask.

Regards,

Z
-----Original Message-----
From: Navis류승우 [mailto:[EMAIL PROTECTED]]
Sent: 02 July 2013 08:50
To: [EMAIL PROTECTED]
Subject: Re: Override COUNT() function

MetadataOnlyOptimizer rewrites a GROUP BY (GBY) on partition columns into a simple TableScan over a one-row dummy input.

I think similar things can be done with stats.
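
A rough sketch of both points, for illustration only. The table names `page_views` and `events` and the partition column `dt` are hypothetical, and the exact behaviour depends on the Hive version in use. The first query has the shape that the metadata-only optimization targets; the remaining statements gather statistics and then read the stored row count back from the table metadata instead of running a scan-based COUNT.

```sql
-- Hypothetical tables: page_views (partitioned by dt) and events (unpartitioned).

-- The metadata-only optimization is controlled by this setting.
SET hive.optimize.metadataonly=true;

-- Touches only the partition column, so it is a candidate for being
-- answered from partition metadata rather than from the data files.
SELECT DISTINCT dt FROM page_views;

-- Gather basic statistics for an unpartitioned table.
ANALYZE TABLE events COMPUTE STATISTICS;

-- The table parameters in the output should then include numRows,
-- provided the statistics were gathered successfully.
DESCRIBE FORMATTED events;
```

Note that numRows is only as fresh as the last ANALYZE run, so it can be stale if the table has been loaded since. Later Hive releases added a `hive.compute.query.using.stats` setting aimed at answering simple aggregates such as COUNT(*) from stored statistics; whether it is available depends on the version.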

2013/6/28 Peter Marron <[EMAIL PROTECTED]>:
> Hi,
>
> I feel sure that someone has asked for this before, but here goes…
>
> In the case where I have the query
>
>                 SELECT COUNT(*) FROM table;
>
> there are many cases where I can determine the count immediately.
> (For example, if I have run something like
>
> ANALYZE TABLE tablename [PARTITION(partcol1[=val1], partcol2[=val2],
> ...)] COMPUTE STATISTICS [noscan];
>
> then there seems to be a table property “numRows” which holds a count
> of the number of rows.)
>
> Now I know that the count can’t always be determined easily.
> If the query is more complicated, like
>
>                 SELECT COUNT(*) FROM table GROUP BY column;
>
> then obviously a simple scalar count is of no real use. But is there
> some way to intercept the simple case and avoid running a table scan?
>
> One problem that I see is that the COUNT function is a UDAF, and I am
> assuming that the presence of any aggregate function like this is
> enough to force the query planner to require a Map/Reduce job. Is there
> any way to make the function look like a simple UDF for some queries? Or
> just for some tables? I guess that I’d be prepared to sacrifice the
> full generality of the normal COUNT function for one which
> only works correctly for the simple query on my tables.
>
> So is it possible to have a different COUNT function only on certain tables?
>
> Regards,
>
> Z